user744121

Reputation: 475

What is the command for a Zipf (frequency against rank) plot in R?

I have network traffic data: the data volume (number of bytes) and the number of flows over a one-week period for each origin-destination IP pair. I want to plot the distribution, i.e. frequency against rank. I believe R already provides a function for this. What is it, and how do I use it for my scenario?

Upvotes: 4

Views: 4104

Answers (5)

kqr

Reputation: 15028

Some people use the term Zipf plot to mean the log-log plot of the survival function (the complement of the cumulative distribution function, i.e. P(X >= x)). I usually plot it this way:

# Sort descending: the i-th largest value gets the empirical P(X >= x) = i/n
plot(
  log10(sort(data, decreasing = TRUE)),
  log10(seq_along(data) / length(data))
)
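As a rough sanity check that this really is the empirical survival function, here is a minimal sketch on made-up heavy-tailed data. The sample `x`, the tail index 1.5, and the names `xs`, `surv` and `F` are all assumptions for illustration and not part of the question. It compares the i/n values with what `ecdf` gives.

```r
set.seed(1)

# Made-up heavy-tailed sample: Pareto-like values via inverse transform
# (assumed tail index 1.5; all values are > 1, so the logs are defined).
x <- runif(500)^(-1 / 1.5)

# Sort descending; the i-th largest value has empirical P(X >= x) = i/n.
xs <- sort(x, decreasing = TRUE)
surv <- seq_along(xs) / length(xs)

# ecdf() returns P(X <= v). With no ties, P(X >= v) = 1 - P(X <= v) + 1/n.
F <- ecdf(x)
stopifnot(isTRUE(all.equal(surv, 1 - F(xs) + 1 / length(xs))))

# Same plot as above, but letting plot() handle the log axes.
plot(xs, surv, log = "xy",
     xlab = "Value", ylab = "Empirical P(X >= value)")
```

Using `log = "xy"` instead of calling `log10` by hand keeps the axis labels in the original units, which is mostly a matter of taste.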

Upvotes: 0

Russ

Reputation: 3771

This should properly be a comment to hadley's answer, but the original question is looking for:

plot(log10(seq_along(tbl)), log10(sort(as.vector(tbl), decreasing = TRUE)))

Upvotes: 2

user744121

Reputation: 475

I found out that a Zipf plot is just a log-log plot of each entity's frequency (say, 'flows') against its rank, with the frequencies sorted in descending order.
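For the scenario in the question, that could look roughly like the sketch below. The `flows` vector is made-up data standing in for the number of flows per origin-destination IP pair, and the variable names are hypothetical.

```r
# Made-up flow counts, one entry per origin-destination IP pair.
flows <- c(1200, 3, 45, 45, 800, 7, 1, 150, 2, 60)

# Sort frequencies in descending order; the position in the sorted
# vector is the rank (rank 1 = the busiest IP pair).
freq <- sort(flows, decreasing = TRUE)
rank <- seq_along(freq)

# Frequency against rank on log-log axes.
plot(rank, freq, log = "xy",
     xlab = "Rank", ylab = "Number of flows",
     main = "Zipf plot of flows per IP pair")
```

The same few lines should work for data volume by swapping in the byte counts for `flows`.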

Upvotes: -2

hadley

Reputation: 103898

It hardly seems like you need a special function:

x <- rpois(1000, 10)
tbl <- table(x)
plot(seq_along(tbl), unclass(tbl))

Or are you looking for hist?

hist(x)

Upvotes: 2

chl

Reputation: 29347

Check out the zipfR package, and its dedicated website including the following tutorial: The zipfR package for lexical statistics: A tutorial introduction.

Upvotes: 3
