Publications

2022

Victor Zimmermann and Maja Hoffmann. Absinth: A small world approach to word sense induction. Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022). September 2022, Potsdam, Germany.
Absinth provides a novel unsupervised graph-based approach to word sense induction. This work combines small world coöccurrence networks with a graph propagation algorithm to induce per-word sense assignment vectors over a lexicon that can be aggregated for classification of whole snippets.

2018

Ines Rehbein, Josef Ruppenhofer and Victor Zimmermann. A harmonised testsuite for POS tagging of German social media data. Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018). September 2018, Vienna, Austria.
We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts-of-speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments, showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.