Product category: Laboratory and scientific training and education
News Release from: Indiana University School of Informatics | Subject: Inauthentic Paper Detector
Edited by the Laboratorytalk Editorial Team on 24 April 2006

Method to test for phony technical papers

Authors of bogus technical articles beware: A team of researchers at the Indiana University School of Informatics is designing a tool to distinguish between fake and real papers.

It's called the Inauthentic Paper Detector (IPD), one of the first of its kind anywhere, and it uses compression to determine whether technical texts were generated by a machine, with the intent to deceive, or written by humans.

"This is a potential problem since no existing systems, the web for example, can or do discriminate between content that is meaningful or bogus," says assistant professor Mehmet Dalkilic, a data mining expert.

"We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."

Joining Dalkilic on the IPD project are assistant professor Predrag Radivojac; informatics doctoral student James Costello; and Wyatt Clark, who will graduate in May with a bachelor's degree in informatics.

The IPD system is based on a combination of compression algorithms, computing tools that reduce the size of data to save space or speed transmission time.
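The idea of using compression as a signal can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the choice of compressors (zlib, bz2, lzma from the Python standard library), the function name `compression_ratios`, and the two toy strings are all assumptions made for illustration. The sketch only shows that text with more repeated word strings compresses to a smaller fraction of its original size, which is the kind of measurable property a detector could use as a feature.

```python
import bz2
import lzma
import zlib


def compression_ratios(text: str) -> dict[str, float]:
    """Return compressed size divided by original size for several compressors.

    Hypothetical helper: the article does not say which compressors the IPD uses.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    compressors = {
        "zlib": zlib.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    # A lower ratio means the compressor found more repetition to exploit.
    return {name: len(fn(raw)) / len(raw) for name, fn in compressors.items()}


# Toy inputs (assumptions, not real papers): one highly repetitive string,
# one string in which every token is distinct.
repetitive = "the cell wall protects the cell. " * 200
varied = " ".join(f"token{i * 7919 % 10007}" for i in range(1200))

repetitive_ratios = compression_ratios(repetitive)
varied_ratios = compression_ratios(varied)

for name in repetitive_ratios:
    print(name, round(repetitive_ratios[name], 3), round(varied_ratios[name], 3))
```

In a real detector, ratios like these would be one input among several to a trained classifier; the sketch makes no claim about which way real or machine-generated papers fall.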

To begin their study, the team defined the two kinds of text they would analyse.

An authentic text (or document) is a collection of several hundred or several thousand syntactically correct sentences such that the text as a whole is meaningful.

An inauthentic text (or document) is a collection of several hundred or several thousand syntactically correct sentences that, taken as a whole, have no meaning.

The IU researchers' work is documented in their own (very authentic) paper, Using Compression to Identify Classes of Inauthentic Texts, presented at the Society for Industrial and Applied Mathematics Conference on Data Mining, April 20-22, in Bethesda, Md.

The informatics study was largely inspired by a prank pulled by three Massachusetts Institute of Technology students, who in 2004 developed a computer program that churned out randomly generated fake computer science language, essentially a four-page compilation of gibberish.

They submitted it as a research paper to an international conference on computer science and informatics - and it was accepted without review.

Radivojac, whose research expertise is machine learning, says the IPD easily detected numerous inauthentic technical papers tested, including the MIT students' spurious submission.

"We hypothesized we could build a reliable and fast model that recognizes fake papers automatically," says Radivojac.

"We combined these with machine-learning methods to build a predictor of these kinds of papers."

In general, identifying meaning in a technical document is difficult, Dalkilic says.

"We don't claim we have found a way to distinguish between meaning and nonsense, but we do emphasize that there are many nontrivial classes of inauthentic documents that can be easily distinguished based on compression algorithms."

Costello's and Clark's involvement in the IPD project earned them travel expenses to the SIAM conference, compliments of the Lawrence Livermore National Laboratory in California.