

Research on the Chinese language is becoming an important theme in psycholinguistics. Not only is Chinese one of the most widely spoken languages in the world, it also differs in interesting ways from the alphabetic writing systems used in the Western world. For example, the logographic writing system makes it impossible to compute the word's phonology on the basis of non-lexical letter-to-sound conversions. Another characteristic of the Chinese writing system is that there are no spaces between the words. This is likely to have consequences for eye movement control in reading. Finally, a Chinese character represents a syllable, which most of the time is a morpheme (i.e., the smallest meaningful element), and many Chinese words in fact are disyllabic compound words.

Research on the Chinese language requires reliable information about word characteristics, so that the stimulus materials can be manipulated and controlled properly. By far the most important word feature is word frequency. In this text, we first describe the frequency measures that are available for Chinese. Then, we describe the contribution a new frequency measure based on film subtitles is making in other languages and we present a similar database for Mandarin Chinese.

Available sources of Chinese word frequencies

A first way to find information about Chinese word frequencies is to look them up in published frequency-based dictionaries. The source most frequently used thus far has been the Dictionary of Modern Chinese Frequency (1986), which is based on a corpus of 1.8 million characters (or 1.3 million words after segmentation) and provides frequency information for 31,159 words. Although this dictionary has been very useful, it is becoming increasingly outdated, as it is based on publications from the 1940s to the 1970s. A further limitation is the rather small size of the underlying corpus.
