Authorship Analysis and Cross-Language Grammar Features

Intrinsic Plagiarism Detection and Authorship Analysis

Capturing the essence of an author's writing style is an important research area in natural language processing. It makes it possible to attribute a previously unseen document to its author, perform so-called style change detection (finding the positions at which the author changes within a document), detect plagiarism intrinsically, develop new technology for writing support, and perform forensic analyses.

Detecting variations in writing style remains one of the most difficult and most interesting challenges in authorship analysis. Authorship attribution is particularly challenging when the ground-truth texts are only available in a language different from that of the document under investigation (for instance, for bilingual authors). Moreover, style change detection is the only means of detecting plagiarism in a document when no comparison texts are available.

In our research, we focus on using grammar features for several of the tasks mentioned above. In particular, we have pioneered work on cross-language scenarios, in which authors have written documents in multiple languages. Current research in this field also covers the detection of social media bots, which have become a more pressing matter in recent years.

At DBIS, we are part of PAN, an international group of scientists studying the writing styles and habits of authors. The PAN initiative organizes shared tasks in which researchers from across the world compete to find the best strategies for problems in Authorship Attribution, Author Profiling, and Multi-Author Decomposition. In particular, we are co-organizers of the Style Change Detection task at PAN.

Publications

2014

Michael Tschuggnall and Günther Specht: Automatic Decomposition of Multi-Author Documents Using Grammar Analysis. In Proceedings of the 26th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken, GvDB 2014), October 2014, Ritten, Italy. CEUR-WS.org, Volume 1313, pages 17-22, 2014

Michael Tschuggnall and Günther Specht: What Grammar Tells About Gender and Age of Authors. In Proceedings of the 4th International Conference on Advances in Information Mining and Management (IMMM 2014), July 2014, Paris, France, pages 30-35, 2014

Michael Tschuggnall and Günther Specht: Enhancing Authorship Attribution By Utilizing Syntax Tree Profiles. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), volume 2: Short Papers, April 2014, ACL, Gothenburg, Sweden, pages 195-199, 2014.

2013

Michael Tschuggnall and Günther Specht: Countering Plagiarism by Exposing Irregularities in Authors Grammars. In Proceedings of the European Intelligence and Security Informatics Conference (EISIC 2013), August 12-14, 2013, Uppsala, Sweden, IEEE, pages 15-22, 2013

Michael Tschuggnall and Günther Specht: Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents. In Proceedings of the 18th International Conference of Natural Language Processing and Information Systems (NLDB 2013), Manchester, UK, June 2013, Springer, LNCS Volume 7934, pages 297-302, 2013

Michael Tschuggnall and Günther Specht: Detecting Plagiarism in Text Documents through Grammar-Analysis of Authors. In Proceedings of the 15. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW 2013), March 11-15, 2013, Magdeburg, LNI, pages 241-259, 2013

Michael Tschuggnall and Günther Specht: Plag-Inn: Uncovering Plagiarism by Examining Author’s Grammar Syntax. In M. Barden, Alexander Ostermann (eds.): Scientific Computing @ uibk, innsbruck university press, pages 151-152, 2013

2012

Michael Tschuggnall and Günther Specht: Plag-Inn: Intrinsic Plagiarism Detection Using Grammar Trees. In Proceedings of the 17th International Conference of Natural Language Processing and Information Systems (NLDB 2012), Groningen, The Netherlands, June 2012, Springer, LNCS Volume 7337, pages 284-289, 2012