In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data. It is named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoît Mandelbrot, who subsequently generalized it.
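The probability mass function is given by:

$$f(k; N, q, s) = \frac{1/(k+q)^s}{H_{N,q,s}}, \qquad k = 1, 2, \ldots, N,$$

where $k$ is the rank, $q \ge 0$ and $s > 0$ are parameters, and $H_{N,q,s}$ is given by:

$$H_{N,q,s} = \sum_{i=1}^{N} \frac{1}{(i+q)^s},$$
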
which may be thought of as a generalization of a harmonic number. In the limit as N approaches infinity (with s > 1, so that the sum converges), this becomes the Hurwitz zeta function ζ(s, q+1), using the convention ζ(s, a) = Σ_{n≥0} 1/(n+a)^s. For finite N and q = 0 the Zipf–Mandelbrot law becomes Zipf's law. For infinite N and q = 0 it becomes a zeta distribution.
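The definition can be checked numerically. The following is a minimal Python sketch, assuming the usual form of the law in which the probability of rank k is proportional to 1/(k+q)^s and the normalizing constant is the sum of those weights over ranks 1 to N. The function names `generalized_harmonic` and `zipf_mandelbrot_pmf` are illustrative and do not come from any library.

```python
from math import fsum


def generalized_harmonic(N: int, q: float, s: float) -> float:
    """Normalizing constant: sum over i = 1..N of 1 / (i + q)^s."""
    if N < 1 or q < 0 or s <= 0:
        raise ValueError("expected N >= 1, q >= 0 and s > 0")
    # fsum reduces rounding error when adding many small terms.
    return fsum((i + q) ** -s for i in range(1, N + 1))


def zipf_mandelbrot_pmf(k: int, N: int, q: float, s: float) -> float:
    """Probability of rank k; ranks outside 1..N have probability zero."""
    if not 1 <= k <= N:
        return 0.0
    return (k + q) ** -s / generalized_harmonic(N, q, s)


# With q = 0 the weights are 1/k^s, i.e. Zipf's law on N ranks.
zipf_probs = [zipf_mandelbrot_pmf(k, 3, 0.0, 1.0) for k in range(1, 4)]

# With q > 0 the head of the distribution is flattened.
shifted_probs = [zipf_mandelbrot_pmf(k, 3, 2.0, 1.0) for k in range(1, 4)]
```

For N = 3, q = 0 and s = 1 the normalizing constant is 1 + 1/2 + 1/3 = 11/6, so the three probabilities are 6/11, 3/11 and 2/11. The sketch recomputes the constant on every call for clarity; code that evaluates many ranks would compute it once.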
Applications
The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law.
If one plots the frequency rank of words in a large corpus of text against the number of occurrences (or the relative frequency) on logarithmic axes, the points fall approximately on a straight line, corresponding to a power law with exponent close to one (but see Gelbukh and Sidorov 2001).
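The rank-frequency relationship can be illustrated with a short Python sketch. It assumes the text has already been split into tokens, and it estimates the exponent by an ordinary least-squares fit of log frequency against log rank. This is only a rough estimator chosen for brevity, not a method taken from the sources cited here. The names `rank_frequency` and `loglog_slope` are hypothetical.

```python
from collections import Counter
from math import log


def rank_frequency(tokens):
    """Return (rank, count) pairs, with rank 1 for the most frequent token."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(count) against log(rank).

    Needs at least two distinct ranks. A slope near -1 corresponds to a
    power law with exponent close to one.
    """
    xs = [log(rank) for rank, _ in pairs]
    ys = [log(count) for _, count in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data (an assumption for illustration): counts proportional to 1/rank.
toy_tokens = ["the"] * 60 + ["of"] * 30 + ["and"] * 20 + ["to"] * 15 + ["in"] * 12
toy_slope = loglog_slope(rank_frequency(toy_tokens))
```

On the toy data the counts are exactly 60 divided by the rank, so the fitted slope is -1 up to floating-point rounding. Real corpora deviate from a straight line at the highest and lowest ranks, which is one motivation for the additional parameter q in the Zipf–Mandelbrot law.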
References and links