Testing AI Tools: Consensus

With the "Deus Ex Machina? — Testing AI Tools" series we want to show you different tools that aim to simplify writing, design and research by using Artificial Intelligence. More on the "Deus Ex Machina?" series can be found here.

Overview

It sounds like the dream of many students and scientists: simply typing in a question and being told within seconds how scientific research would answer it, without having to work your way through long, highly complex papers. This dream seems to have come true, at least if you believe the developers of the AI tool Consensus. Consensus claims to be just that: an AI-based search engine that makes the world's academic knowledge more accessible to its users.

The theory behind the tool is simple: you type in a question, just as in other search engines, and receive answers in natural language. However, these are not generated from arbitrary sources, such as the most clicked websites on a topic, but only from the best answers that science currently has to offer. Consensus claims to cover over 80 different scientific fields.

The basis for the generated results is the Semantic Scholar database, which contains over 200 million scientific papers, both studies and theoretical texts. According to Consensus, the database is constantly being expanded. Thanks to this foundation and its own LLMs and search technologies, Consensus, unlike ChatGPT for example, does not provide its users with the most likely answer but with the most scientifically correct one.

The answers are presented in natural language and based on specific papers, which can also be viewed and from which direct quotes are displayed. The 20 most suitable papers are used to answer each question.

The basic function of Consensus

So much for the basic principle of the tool. In addition, the search can be simplified and refined by numerous add-ons. Consensus, for example, offers a summary of the ten best papers that answer the question posed. For questions that allow clear yes/no answers, the so-called Consensus Meter can be used to see quickly whether there is a scientific consensus on a topic or whether it remains controversial. The Consensus Copilot breaks the question down into individual core elements or core topics and answers these, with references, on the basis of the papers used.

The summary and Consensus Meter
The Consensus Copilot

Consensus also offers a so-called study snapshot. It presents central elements of the study design, such as the methods, sample size and investigated group, at a glance, enabling quick conclusions about the significance of a study. In our test, however, the study snapshot only provided information for around half of the results; for the rest, this information was missing.

The study snapshot

You can also refine your search in Consensus with specific filters that restrict the output to selected papers. For example, you can filter by sample size, methodology, open-access publications, study design and more. Consensus also attaches its own quality indicators to the responses, allowing a focus on the best papers. For instance, the number of citations, the quality of the journal in which the paper was published and the study type are evaluated.

These functionalities already make it clear that Consensus focuses primarily on the natural sciences and empirical research and is adapted to this scientific system accordingly. The company does state that it also covers other subjects, and in our test we received answers to humanities questions as well. However, the less empirically a discipline works, the sooner the tool reaches its limits. Consensus is trained to provide precisely formulated answers, which humanities research can rarely supply. For smaller research areas, such as rhetoric, the results are even less meaningful; this is where the gaps in the database become apparent. The same applies to subjects that do not primarily publish in English, such as national philologies.

Consensus can be used free of charge after registering via a Google account or email address, but the in-depth search options are then very limited and users only have access to a small number of questions answered with the Consensus Meter, Summary and Copilot. If you want permanent access to the full scope, you have to take out a subscription. The company was founded in 2021 by Christian Salem and Eric Olson, both alumni of Northwestern University in Illinois. According to the founders, this is also where the idea for Consensus was born: using AI to make science more accessible to everyone. Consensus was finally launched at the end of 2022, shortly before the release of ChatGPT.

The AI behind the application

Infobox: RAG

Retrieval Augmented Generation (RAG) is a process that increases the reliability of Large Language Models (LLMs), makes their results more specific and reduces undesirable side effects such as hallucinations. Usually, an LLM draws the knowledge with which it operates and responds to prompts from the data set on which it was trained. This knowledge is generated implicitly and is usually sufficient to answer general questions. However, if a prompt requires specific knowledge, an LLM may answer unspecifically or invent facts, an effect widely known among researchers as "hallucination". With RAG, additional knowledge sources are attached to the LLM, which it can access so that it is no longer dependent solely on the implicit knowledge from its training data. In this way, RAG allows specific knowledge to be fed into an LLM, making it more reliable.
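The RAG idea can be sketched in a few lines. The retrieval step below is a deliberately simple word-overlap search, and the corpus, function names and prompt wording are invented for illustration; a real system like Consensus would use LLMs and a vector index instead.

```python
# Minimal RAG sketch (hypothetical, stdlib only): retrieve the most
# relevant documents for a question, then prepend them to the prompt
# so the model answers from sources instead of implicit training knowledge.

def score(question: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(corpus, key=lambda d: score(question, d), reverse=True)[:k]

def build_prompt(question: str, corpus: list[str]) -> str:
    """Augment the prompt with retrieved sources (the 'A' in RAG)."""
    context = "\n".join(f"- {s}" for s in retrieve(question, corpus))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

corpus = [
    "A vegan diet lowers cholesterol in most participants.",
    "The study examined sleep patterns in teenagers.",
    "Plant-based vegan diet reduced blood pressure in the trial.",
]
print(build_prompt("Is a vegan diet healthy?", corpus))
```

The generation step is omitted: the assembled prompt would be passed to an LLM, which then answers on the basis of the retrieved sources rather than its memorized training data.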

But how exactly does the tool work? Consensus combines various AI applications. The developers themselves describe Consensus as working like an assembly line: more than 25 different Large Language Models (LLMs) work together at various stages of the process to produce the final results. Consensus also operates with additional vector and keyword searches, which generate specific metadata.

In addition to classic Retrieval Augmented Generation (RAG), which makes Consensus's results more reliable and valid, the company is also pursuing a newer approach. Before specific data is retrieved, additional metadata is first generated that could be useful later in the tool's pipeline. To a certain extent, this is a reversal of RAG, i.e. generation augmented retrieval, as the company itself states.
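In outline, "generation before retrieval" means a generation step enriches the raw question with structured metadata that the later retrieval step can match against. The extraction rules below are crude stand-ins for what Consensus's LLMs presumably produce; the field names and heuristics are invented for illustration.

```python
# Hypothetical sketch of generation augmented retrieval: derive search
# metadata from the question BEFORE any documents are fetched.

def generate_metadata(question: str) -> dict:
    """Stand-in for an LLM call that derives metadata for retrieval."""
    q = question.lower()
    # Keep longer words as topic keywords, stripping punctuation.
    keywords = [w.strip("?.,!") for w in q.split() if len(w.strip("?.,!")) > 4]
    # Yes/no questions are the kind the Consensus Meter can quantify.
    answer_type = "yes/no" if q.startswith(("is ", "are ", "do ", "does ")) else "open"
    return {"keywords": keywords, "answer_type": answer_type}

print(generate_metadata("Does coffee improve concentration?"))
```

A downstream retrieval step could then filter the database by these keywords and route yes/no questions to the Consensus Meter, which is the sense in which generation augments retrieval here.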

This complex interplay of technologies ranks the available sources from the databases and ultimately only uses those considered the best according to specific criteria (study design, publication date, journal, number of citations, etc.). For individual functions, such as summarizing, Consensus uses OpenAI's GPT-4.
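A ranking step over criteria like these can be pictured as a weighted score per paper. The weights, scales and sample records below are invented for illustration and are not Consensus's actual formula.

```python
# Hypothetical ranking sketch: score each candidate paper over the
# quality criteria the article lists (study design, citations, date)
# and keep the best. All weights are made up for illustration.

DESIGN_RANK = {"meta-analysis": 3, "rct": 2, "observational": 1}

def quality_score(paper: dict) -> float:
    return (
        2.0 * DESIGN_RANK.get(paper["design"], 0)        # study design dominates
        + 1.0 * min(paper["citations"], 100) / 100        # cap citation influence
        + 0.5 * (paper["year"] - 2000) / 25               # mild recency bonus
    )

papers = [
    {"title": "A", "design": "observational", "citations": 300, "year": 2010},
    {"title": "B", "design": "meta-analysis", "citations": 40, "year": 2022},
]
print(max(papers, key=quality_score)["title"])  # "B": design outweighs raw citations
```

Capping the citation term illustrates one way a ranker could keep a heavily cited but methodologically weak paper from crowding out a strong recent meta-analysis.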

The rhetorical potential of the tool

Consensus is particularly effective for the aspect of rhetoric known as logos: persuasion through sound, substantive arguments. Arguments are only credible, and therefore effective, if they can be substantiated. Sound reasons can be found in scientific findings and studies, and arguments can thus be built convincingly. Consensus facilitates access to these findings and summarizes the scientific consensus precisely. Taking the Consensus Meter results from the example above, an argument could look like this: “A vegan diet is healthier for humans than an omnivorous diet, as confirmed by more than 70% of current studies on the topic.”

This is much more convincing than a simple: “According to current studies, a vegan diet is healthier for humans than an omnivorous diet.” Figures and data create evidence effects (rhetorically: evidentia) and convince the addressee on a different level, because concrete facts can be referred to.

Of course, you don't necessarily need Consensus to build such arguments and incorporate scientific results into them, but the tool saves a good deal of research time. Finding the arguments, called inventio in rhetoric, is much faster thanks to Consensus. The tool also offers advantages in the elocutio, the formulation of the material, as Consensus already does part of this work by outputting its results in natural language. Here, however, the tool lacks the addressee orientation that is essential in rhetoric and the maintenance of appropriateness (aptum). Consensus always outputs its answers in the same style. Yet communication must be adapted to the target group it is intended to reach. For example, the same content must be formulated very differently depending on whether it is aimed at children, academics, a specialist audience, skeptics, newspaper readers or social media users. Consensus cannot offer this customization, so the final formulation must be done by the users themselves.

Usage in Science Communication

Consensus offers a wide range of applications for science communication. It gives its users a good overview of specific questions in specific fields of research and provides further sources, core theses and a presentation of the prevailing scientific opinions. This enables people from outside the field, such as journalists, to gain deep insights into a research field in a short time and with little research effort. Science communicators, in turn, can generate high-quality content based on scientific facts. Consensus prepares scientific findings in a clear way, which makes it easier for science communicators to understand and pass on the content.

The tool's high standards and its focus on the best and most scientific results, according to specific criteria, also prevent poor-quality research or even misinformation from finding its way into science communication and being disseminated.

However, the specific search settings of the tool also reproduce common biases in the scientific system and reinforce existing power structures. Marginalized groups and minorities thus have less chance of being heard if, for example, their papers are not even used for Consensus's results because they have been cited too rarely. The fact that Consensus operates primarily in English and with English-language sources also plays a major role here. Yet this is a general phenomenon in science, which predominantly takes place in English.

In addition to the focus on the English-speaking world, it is also clearly visible, as already mentioned, that Consensus does not cover all sciences equally, but primarily the natural and engineering sciences. For questions in the humanities, for example, the results are thinner and therefore less reliable. The database only includes digitized knowledge and therefore has numerous blind spots in research.

This makes Consensus suitable for use in science communication, but not without restrictions and not equally for all fields of research.

Wrap Up

Consensus is a remarkable tool that enables its users to have questions answered in a scientifically sound, clear and understandable way within a very short time. Its numerous functions provide a good overview of the scientific consensus and the core content of individual research fields, making it a potential everyday aid for experts and non-specialists alike. However, the tool also has its limitations: it maps some research areas better than others, reinforces prevailing biases and power structures, and is not very flexible. It remains to be seen to what extent the developers will continue to expand Consensus and whether this will lay the foundation for a new kind of scientific work together with AI.