Everything points towards success in unravelling the mysteries inherent in every human language, which for nearly 100 years have intrigued mathematicians and linguists working on the statistics of literary texts. A new analysis of word-occurrence frequencies in the most famous works of literature, undertaken at the Institute of Nuclear Physics of the Polish Academy of Sciences in Kraków, has shown that our languages are structurally more complex and more comprehensive than they ever seemed before.

It has been said that 80% of a person's success comes from only 20% of their effort. This famous ratio holds across a surprising number of domains. In every language, for example, whether spoken or written, roughly 80% of all statements are made up of merely 20% of the most common words. One possible reason, among other factors, is that when we talk to each other we want to convey as much content as possible with the least effort. This dependence, known as Zipf's law, was one of the earliest power laws to be discovered: the frequency of a word is roughly inversely proportional to its rank in the list of words ordered from most to least frequent. It has turned out not to be as trivial as it might seem at first glance. Scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences (IFJ PAN) in Kraków have established that certain puzzling features of Zipf's law, for decades a source of intrigue for those involved in the statistical analysis of literary texts, are a consequence of neglecting one of the basic components of language.
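Zipf's law is commonly written as p(r) ∝ 1/r^α with α close to 1, where p(r) is the relative frequency of the word of rank r. The minimal Python sketch below shows one way to build such a rank-frequency table from a list of tokens and to measure what share of a text the most frequent 20% of distinct words account for. The function names `rank_frequency` and `top_share` are illustrative and not taken from the study, and a one-sentence toy input is far too short to reproduce the 80/20 ratio; real measurements rely on large corpora.

```python
from collections import Counter
import re

def rank_frequency(tokens):
    """Return (rank, token, relative frequency) triples, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(rank, tok, n / total)
            for rank, (tok, n) in enumerate(counts.most_common(), start=1)]

def top_share(tokens, fraction=0.2):
    """Fraction of all tokens accounted for by the most frequent `fraction`
    of distinct tokens (at least one distinct token is always included)."""
    counts = [n for _, n in Counter(tokens).most_common()]
    k = max(1, round(fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)

# Toy input only; testing Zipf's law properly needs a book-length corpus.
text = "The cat sat on the mat and the dog sat on the log"
words = re.findall(r"[a-z']+", text.lower())
for rank, word, p in rank_frequency(words)[:3]:
    print(rank, word, round(p, 3))
print("top 20% share:", round(top_share(words), 3))
```

On a long text, plotting log p(r) against log r gives an approximately straight line with a slope of about −α, which is the straight line referred to in the figure caption.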

Recent research conducted at the Institute of Nuclear Physics of the Polish Academy of Sciences in Kraków reveals that in narrative texts punctuation plays as important a role as words. (Source: IFJ PAN)

Probability of occurrence of words (vertical axis) versus their rank (horizontal axis) for corpora representing different European languages. The puzzling downward departure from the straight line at ranks close to 1, observed for ordinary words (brighter colors), disappears (corresponding darker colors) when punctuation marks are also taken into account. (Source: IFJ PAN)
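How the Kraków team tokenized its corpora is not described here. One hypothetical way to take punctuation into account, sketched in Python below, is to treat each punctuation mark as a token in its own right and rank it together with the words. The word pattern `WORD` and the set of marks in `PUNCT` are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Assumed token patterns (not taken from the study): a word is a run of
# Unicode letters, optionally with internal apostrophes; each listed
# punctuation mark counts as a separate token.
WORD = r"[^\W\d_]+(?:'[^\W\d_]+)*"
PUNCT = r"[.,;:!?()-]"

def tokenize(text, with_punctuation=False):
    """Split text into lower-cased tokens, optionally keeping punctuation marks."""
    pattern = f"{WORD}|{PUNCT}" if with_punctuation else WORD
    return [tok.lower() for tok in re.findall(pattern, text)]

def ranked(tokens):
    """Distinct tokens ordered from most to least frequent (ties keep first-seen order)."""
    return [tok for tok, _ in Counter(tokens).most_common()]

sample = "Well, the cat sat. Then, the dog sat, and the bird sang."
print(ranked(tokenize(sample)))
print(ranked(tokenize(sample, with_punctuation=True)))
```

In this toy sentence the comma already ties with "the" as the most frequent token. In real texts, counting such marks adds very frequent tokens at the smallest ranks, which is where the caption above reports the curve changing shape.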

