Method to test for phony technical papers
Authors of bogus technical articles beware: A team of researchers at the Indiana University School of Informatics is designing a tool to distinguish between fake and real papers.
It's called the Inauthentic Paper Detector (IPD) - one of the first of its kind anywhere - and it uses compression to determine whether technical texts were written by humans or generated by a machine with the intent to deceive.
"This is a potential problem since no existing systems, the web for example, can or do discriminate between content that is meaningful or bogus," says assistant professor Mehmet Dalkilic, a data mining expert.
"We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."
Joining Dalkilic on the IPD project are assistant professor Predrag Radivojac; informatics doctoral student James Costello; and Wyatt Clark, who will graduate in May with a bachelor's degree in informatics.
The IPD system is based on a combination of compression algorithms, computing tools that reduce the size of data to save space or speed transmission time.
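As a rough illustration of the idea, and not the IPD's actual implementation, the Python sketch below measures how well a text compresses under three standard-library compressors. The function name `compression_ratios` and the choice of zlib, bz2 and lzma are assumptions made for this example; the article does not say which algorithms the team combined.

```python
import bz2
import lzma
import zlib


def compression_ratios(text: str) -> dict[str, float]:
    """Return compressed size divided by original size for each compressor.

    Hypothetical helper: the compressors here are stand-ins, not the
    algorithms used by the IPD.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    return {
        "zlib": len(zlib.compress(raw, 9)) / len(raw),
        "bz2": len(bz2.compress(raw, 9)) / len(raw),
        "lzma": len(lzma.compress(raw)) / len(raw),
    }
```

A lower ratio means the compressor found more repetition to exploit. Which values separate human from machine-generated text would have to be established from labelled examples.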
To begin their study, the team defined two kinds of text they would analyse:
Authentic text (or document) is a collection of several hundred or several thousand syntactically correct sentences such that the text as a whole is meaningful.
Inauthentic text (or document) is a collection of several hundred or several thousand syntactically correct sentences that as a whole have no meaning.
The IU researchers' work is documented in their own (very authentic) paper, Using Compression to Identify Classes of Inauthentic Texts, presented at the Society for Industrial and Applied Mathematics Conference on Data Mining, April 20-22, in Bethesda, Md.
The informatics study was largely inspired by a prank pulled by three Massachusetts Institute of Technology students, who in 2004 developed a computer program that churned out randomly generated fake computer science language, essentially a four-page compilation of gibberish.
They submitted it as a research paper to an international conference on computer science and informatics - and it was accepted without review.
Radivojac, whose research expertise is machine learning, says the IPD easily detected numerous inauthentic technical papers tested, including the MIT students' spurious submission.
"We hypothesized we could build a reliable and fast model that recognizes fake papers automatically," says Radivojac.
"We combined these with machine-learning methods to build a predictor of these kinds of papers."
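One way compression measurements might feed such a predictor is sketched below in Python. This is only an illustration under stated assumptions: the nearest-centroid classifier is a deliberately simple stand-in for the machine-learning methods, which the article does not name, and `features`, `train`, `predict` and the 512-byte chunk size are all hypothetical. The two features loosely mirror the short- and long-range repetition Dalkilic describes: one ratio is computed over small chunks, the other over the whole text.

```python
import zlib
from statistics import mean

CHUNK_SIZE = 512  # assumed size of the "short-range" window, in bytes


def features(text):
    """Return (whole-text ratio, mean per-chunk ratio) using zlib."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    whole = len(zlib.compress(raw, 9)) / len(raw)
    chunks = [raw[i:i + CHUNK_SIZE] for i in range(0, len(raw), CHUNK_SIZE)]
    local = mean(len(zlib.compress(c, 9)) / len(c) for c in chunks)
    return (whole, local)


def train(labeled_texts):
    """Build one centroid per label from (text, label) pairs."""
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).append(features(text))
    # Average each feature column to get the centroid for that label.
    return {
        label: tuple(mean(col) for col in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(model, text):
    """Return the label whose centroid is closest to the text's features."""
    f = features(text)
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, model[label])),
    )
```

A real predictor would need many labelled papers of both kinds and a proper evaluation; this sketch only shows how compression-derived numbers can be handed to a learning step.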
In general, identifying meaning in a technical document is difficult, Dalkilic says.
"We don't claim we have found a way to distinguish between meaning and nonsense, but we do emphasize that there are many nontrivial classes of inauthentic documents that can be easily distinguished based on compression algorithms."
Costello's and Clark's involvement in the IPD project earned them travel expenses to the SIAM conference, courtesy of the Lawrence Livermore National Laboratory in California.

