Nina Tahmasebi, Gerhard Gossen, Nattiya Kanhabua, Helge Holzmann and Thomas Risse
NEER: An Unsupervised Method for Named Entity Evolution Recognition
Published in: COLING 2012
Erratum for experiments: filtering using Machine Learning (added 2013-07-03)
Abstract
High impact events, political changes and new technologies are reflected in our language and lead to constant evolution of terms, expressions and names. Not knowing about names used in the past for referring to a named entity can severely decrease the performance of many computational linguistic algorithms. We propose NEER, an unsupervised method for named entity evolution recognition independent of external knowledge sources. We find time periods with high likelihood of evolution. By analyzing only these time periods using a sliding window co-occurrence method we capture evolving terms in the same context. We thus avoid comparing terms from widely different periods in time and overcome a severe limitation of existing methods for named entity evolution, as shown by the high recall of 90% on the New York Times corpus. We compare several relatedness measures for filtering to improve precision. Furthermore, using machine learning with minimal supervision improves precision to 94%.
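The sliding window co-occurrence step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the change period is already known (NEER detects it; that step is omitted here), that documents arrive as (year, tokens) pairs, and that multi-word names have been joined into single tokens. The function names, the window size and the toy corpus are all hypothetical.

```python
from collections import Counter


def context_counts(tokens, target, window=3):
    """Count terms within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every neighbour in the window except the target position itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def candidates_in_period(docs, target, period, window=3):
    """Aggregate co-occurring terms, using only documents inside the change period.

    docs   -- iterable of (year, tokens) pairs (assumed input format)
    period -- (start_year, end_year), inclusive
    """
    total = Counter()
    for year, tokens in docs:
        if period[0] <= year <= period[1]:
            total += context_counts(tokens, target, window)
    return total


# Toy corpus (hypothetical data, names pre-joined into single tokens).
docs = [
    (2004, ["cardinal_ratzinger", "spoke", "in", "rome"]),
    (2005, ["cardinal_ratzinger", "was", "elected", "pope_benedict", "today"]),
]

result = candidates_in_period(docs, "cardinal_ratzinger", (2005, 2005), window=3)
```

Restricting the count to the change period keeps terms from other years, such as "rome" in the 2004 document, out of the candidate set. The filtering step mentioned in the abstract would then rank or prune these candidates.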
German abstract (English translation)
Important events, political changes and new technologies are reflected in our language and lead to a constant evolution of terms, expressions and names. Lacking knowledge of earlier names of an entity can markedly reduce the performance of many computational linguistic methods. In this paper we present NEER, our unsupervised method for recognizing name evolution, which works independently of external data sources. By examining time periods with an elevated likelihood of evolution using a co-occurrence method based on sliding windows, we capture evolving terms in the same context. We thereby avoid comparing terms from widely separated time periods and so circumvent a severe limitation of existing methods. This is shown by a measured recall of 90% on the New York Times corpus. To increase precision, we compare several similarity measures for filtering. Using machine learning with minimal supervision, we improve precision to 94%.
BibTeX
@INPROCEEDINGS{neer2012,
  title = {NEER: An Unsupervised Method for Named Entity Evolution Recognition},
  author = {Nina Tahmasebi and Gerhard Gossen and Nattiya Kanhabua and Helge Holzmann and Thomas Risse},
  year = 2012,
  month = dec,
  editor = {Martin Kay and Christian Boitet},
  booktitle = {Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012)},
  pages = {2553--2568},
  address = {Mumbai, India},
  publisher = {Indian Institute of Technology Bombay},
  url = {http://tahmasebi.se/project/neer/}
}
Resources
Please cite this paper if you use the resources provided below.
Test set: Entities with name changes and change periods. Version 1, 2012-12-08, sha1, README.
A collection of named entity pairs in which each entity's name has changed over time, e.g., (Pope Benedict XVI, Cardinal Joseph Ratzinger). All terms occur at least 5 times in the New York Times Annotated Corpus during the year in which the name change occurred. Also included is a file of term-year pairs, where the year is the one in which the name changed.
Extended test set. Version 1, 2012-12-08, sha1, README.
Extended version of the test set given above. This set also contains named entity pairs that were excluded from the experiments because the change occurred outside the time span of the corpus or because the names occur only a few times in the text.
Source code (added 2013-07-03)
The source code of NEER is now publicly available.
t historical text as well as researchers that wish to track concepts over time without manually finding and accounting for language change.