Reputation: 71
I have two corpora. If one has a larger vocabulary than the other, does that mean its language is more complex?
Apart from the complexity of the language, what else can affect the size of the vocabulary in a corpus?
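To make "vocabulary size" concrete: it is usually the number of distinct word types, as opposed to the number of running word tokens. The sketch below is a minimal illustration assuming a naive lowercase regex tokenizer; the function names are hypothetical. It also shows one well-known non-linguistic factor: a longer sample of the same text tends to contain more types, so raw vocabulary counts from corpora of different lengths are not directly comparable.

```python
import re


def vocabulary_stats(text: str) -> tuple[int, int]:
    """Return (token count, vocabulary size) using a naive lowercase word tokenizer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(tokens), len(set(tokens))


def vocab_size_of_prefix(text: str, n_tokens: int) -> int:
    """Vocabulary size of the first n_tokens tokens (hypothetical helper for equal-size comparison)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens[:n_tokens]))


def type_token_ratio(text: str) -> float:
    """Distinct types divided by tokens; still sensitive to text length."""
    n_tokens, n_types = vocabulary_stats(text)
    return n_types / n_tokens if n_tokens else 0.0


sample = "the cat sat on the mat and the dog sat on the rug"
print(vocabulary_stats(sample))         # (13, 8)
print(vocab_size_of_prefix(sample, 6))  # 5: the shorter sample has fewer types
```

Comparing corpora at the same token count (for example, via `vocab_size_of_prefix` or random samples of equal length) removes at least the effect of corpus length.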
Upvotes: 1
Views: 801
Reputation: 123
Apart from what Oliver has mentioned, from my professional experience the size of the vocabulary in a corpus often depends on the following:
As to your first question, about language complexity: a language's complexity is always relative to the task at hand. If we are developing an English-Japanese translator, Japanese is VERY complex; for a Chinese speaker learning Japanese, it is only MODERATELY complex. If we are comparing inflectional morphology, Russian and German are more complex than English. In short, there are many ways of looking at language complexity, depending on the participants' perspectives.
Upvotes: 1
Reputation: 2270
No. Language consists of a lot more than just vocabulary. If the grammatical structures are convoluted, then even a smaller vocabulary can lead to very complex sentences.
In order to answer the second part properly, you would first need to define exactly what you mean by 'complexity'. Unlike, e.g., sentence length, complexity is not a measure that can easily be quantified.
Most readability measures combine word length and sentence length, on the assumption that longer words and longer sentences are harder to understand. However, shorter words tend to have more senses, and are arguably harder to understand when their meaning is not clear from the context.
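One concrete measure of this kind is the Automated Readability Index (ARI), which combines characters per word with words per sentence: ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The sketch below is a rough illustration only, assuming naive regex-based word and sentence splitting; the function name is hypothetical. Note that, like the measures described above, it cannot see that a short word may be ambiguous.

```python
import re


def automated_readability_index(text: str) -> float:
    """Compute ARI with naive splitting (assumption: sentences end in '.', '!' or '?')."""
    # Naive sentence split; real sentence segmentation is considerably harder.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters/digits, optionally with an apostrophe suffix (e.g. "don't").
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    # ARI counts letters and digits only, so apostrophes are excluded.
    characters = sum(ch.isalnum() for w in words for ch in w)
    return (
        4.71 * characters / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43
    )


print(automated_readability_index("The cat sat. The dog ran."))
print(automated_readability_index("Incomprehensibility characterizes bureaucracy."))
```

The second sentence scores much higher purely because its words are longer, which is exactly the assumption the answer questions.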
Update after clarification: The size of the vocabulary depends on various factors, such as:
Upvotes: 1