Authorship Analysis and Cross-Language Grammar Features

Intrinsic Plagiarism Detection and Authorship Analysis

Capturing the essence of an author's writing style is an important research area in natural language processing. It makes it possible to attribute a previously unseen document to its author, perform so-called style change detection (finding the positions at which the author changes within a document), detect plagiarism intrinsically, develop new technology for writing support, or perform forensic analyses.

To date, detecting variations in writing style is among the most difficult and most interesting challenges in authorship analysis. Authorship attribution is particularly challenging in scenarios where ground-truth textual data is only available in different languages (for instance, for bilingual authors). Moreover, style change detection is the only means of detecting plagiarism in a document if no comparison texts are available.

In our research, we focus on utilizing grammar features for several of the above-mentioned tasks. In doing so, we have pioneered work in cross-language scenarios, where authors have written documents in multiple languages. Current research in this field also covers the detection of social media bots, which have become a more pressing matter in recent years.

At DBIS, we are part of PAN, an international group of scientists focusing on the writing styles and habits of authors. The PAN initiative organizes shared tasks in which researchers from across the world compete to find the best strategies for tackling problems in Authorship Attribution, Author Profiling, and Multi-Author-Decomposition. In particular, we are co-organizers of the Style Change Detection task at PAN.

 

Publications

2019

Michael Tschuggnall, Benjamin Murauer and Günther Specht: Reduce & Attribute: Two-Step Authorship Attribution for Large-Scale Problems. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 951-960. Association for Computational Linguistics, 2019

Walter Daelemans, Mike Kestemont, Enrique Manjavacas, Martin Potthast, Francisco M. Rangel Pardo, Paolo Rosso, Günther Specht, Efstathios Stamatatos, Benno Stein, Michael Tschuggnall, Matti Wiegmann and Eva Zangerle: Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-Domain Authorship Attribution and Style Change Detection. In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 10th International Conference of the CLEF Association, CLEF 2019, Lugano, Switzerland, September 9-12, 2019, Proceedings, vol. 11696, pages 402-416. Springer, 2019

Eva Zangerle, Michael Tschuggnall, Günther Specht, Martin Potthast and Benno Stein: Overview of the Style Change Detection Task at PAN 2019. In CLEF 2019 Labs and Workshops, Notebook Papers. CEUR-WS.org, 2019

Benjamin Murauer and Günther Specht: Generating Cross-Domain Text Classification Corpora from Social Media Comments. In 20th Conference and Labs of the Evaluation Forum (CLEF'2019), pages 114-125. Springer International Publishing, 2019

Michael Tschuggnall, Thibault Gerrier and Günther Specht: StyleExplorer: A Toolkit for Textual Writing Style Visualization. In Proceedings of the 41st European Conference on Information Retrieval (ECIR 2019): Advances in Information Retrieval, pages 220-224. Springer International Publishing, 2019

2018

Benjamin Murauer, Michael Tschuggnall and Günther Specht: Dynamic Parameter Search for Cross-Domain Authorship Attribution. In Working Notes of CLEF. 2018

Mike Kestemont, Michael Tschuggnall, Efstathios Stamatatos, Walter Daelemans, Günther Specht, Benno Stein and Martin Potthast: Overview of the Author Identification Task at PAN-2018: Cross-Domain Authorship Attribution and Style Change Detection. In Working Notes Papers of the CLEF. 2018

Efstathios Stamatatos, Francisco Rangel, Michael Tschuggnall, Mike Kestemont, Paolo Rosso, Benno Stein and Martin Potthast: Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. 9th International Conference of the CLEF Initiative (CLEF 18). Springer, Berlin Heidelberg New York (Sep 2018). 2018

Eva Zangerle, Michael Tschuggnall, Stefan Wurzinger and Günther Specht: ALF-200k: Towards Extensive Multimodal Analyses of Music Tracks and Playlists. In Advances in Information Retrieval - 40th European Conference on IR Research (ECIR 2018), pages 584-590. Springer, 2018

Benjamin Murauer, Michael Tschuggnall and Günther Specht: On the Influence of Machine Translation on Language Origin Obfuscation. In Proceedings of the 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2018). 2018 

2017

Michael Tschuggnall: Automatisierte Plagiatserkennung in Textdokumenten: Was der Schreibstil eines Autors über die Echtheit verrät. In S. Mauler, H. Ortner, U. Pfeiffenberger (Eds.): Medien und Glaubwürdigkeit, pages 131-140. Innsbruck University Press, 2017

Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso and Benno Stein: Overview of PAN’17: Author Identification, Author Profiling, and Author Obfuscation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 17). Springer, Berlin Heidelberg New York (Sep 2017). 2017

Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast: Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering. In CEUR Workshop Proceedings, CLEF 2017 Working Notes, Dublin, Ireland, September 11-14, 2017.

2016

Efstathios Stamatatos, Michael Tschuggnall, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast: Clustering by Authorship Within and Across Documents. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings, September 2016. Pages 691-715. CLEF and CEUR-WS.org. ISSN 1613-0073.

Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall and Benno Stein: Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-genre Profiling, Clustering, Diarization, and Obfuscation. In Norbert Fuhr et al, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 7th International Conference of the CLEF Initiative (CLEF 16), Berlin Heidelberg New York, September 2016. Springer. ISBN 978-3-319-44564-9.

Michael Tschuggnall, Günther Specht and Christian Riepl: Algorithmisch unterstützte Literarkritik: Eine grammatikalische Analyse zur Bestimmung von Schreibstilen. In In Memoriam Wolfgang Richter, Hrsg.: H. Rechenmacher, pages 415-428. EOS-Verlag, 2016.

Michael Tschuggnall and Günther Specht: From Plagiarism Detection to Bible Analysis: The Potential of Machine Learning for Grammar-Based Text Analysis. In Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2016), pages 245-248. 2016

2015

Michael Tschuggnall and Günther Specht: On the Potential of Grammar Features for Automated Author Profiling. In International Journal On Advances in Intelligent Systems, Volume 8, Number 3&4, pages 255-265, 2015.

Michael Tschuggnall: Intrinsische Plagiatserkennung und Autorenerkennung mittels Grammatikanalyse. In Ausgezeichnete Informatikdissertationen 2014, Volume D-15, pages 279-288. Bonner Köllen Druck+Verlag, 2015.

2014

Michael Tschuggnall: Intrinsic Plagiarism Detection and Author Analysis By Utilizing Grammar. PhD thesis, University of Innsbruck, Department of Computer Science, 2014.