Volume 3, Article 1, 2013

Stagger: an Open-Source Part of Speech Tagger for Swedish

Author: Robert Östling
Affiliation: Stockholm University, Department of Linguistics
DOI: 10.3384/nejlt.2000-1533.1331
Volume: 3
Article No.: 1
Available: 2013-09-16
View Article: PDF; References (HTML)
No. of pages: 18
Pages: 1-18
Abstract: This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.

Publishing host: Linköping University Electronic Press, Linköpings universitet