The Secret of Long-Living Words: Predicting the lexical age of neologisms with big data

Shu-Kai Hsieh

In recent decades, massive and diverse textual data have become increasingly available on the web, and the heavy use of social media in particular has greatly accelerated linguistic creativity and language change. The emergence of neologisms poses a challenge for lexicographers, who must decide when newly coined words should be considered institutionalized and deserve a place in a normative dictionary.

In recent years, corpus-based computational lexicography has made progress in helping lexicographers extract novel words and senses and automatically parse their morphological patterns. In practice, however, authenticating the vocabulary remains one of the lexicographer's most tedious tasks. In this talk, I will present our recent work on understanding the mechanism underlying the life cycle of a word, quantitatively exploring why some neologisms sustain themselves while others are a flash in the pan. I will further demonstrate that, in the language ecosystem, words cooperate as well as compete for the attention of language users, a dynamic that can best be explained in terms of their social behaviours across time. Beyond the frequency factors heavily relied upon in previous research, I will introduce our proposed distributional social network model for predicting lexical age, which incorporates a synthesis of lexical-ontological analysis and draws on a large volume of diachronic corpus data. We believe that the proposed estimates could serve as a toolkit to map the new contours of lexicography.