Reputation: 475
I have network traffic data: data volume (number of bytes) and number of flows over a one-week period for each origin and destination IP pair. I want to plot the distribution, i.e. frequency against rank. I believe R already provides a function for this. What is it, and how do I use it for my scenario?
Upvotes: 4
Views: 4104
Reputation: 15028
Some people use the term Zipf plot to mean the log-log plot of the survival function (the complement of the cumulative distribution function, i.e. 1 - CDF). I usually plot it this way:
plot(
log10(rev(sort(data))),
log10(seq_along(data)/length(data))
)
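The same idea as a self-contained sketch. The simulated `bytes` vector is only a stand-in for real per-pair byte counts (an assumption, not data from the question), and ties are ignored when computing the empirical survival value. Using `log = "xy"` instead of `log10()` keeps the axis labels in the original units.

```r
# Simulated heavy-tailed data standing in for per-pair byte counts (assumption).
set.seed(1)
bytes <- ceiling(rlnorm(500, meanlog = 8, sdlog = 2))

# Sort in descending order: the i-th value has i observations at least as large.
x <- sort(bytes, decreasing = TRUE)

# Empirical survival function P(X >= x), ignoring ties.
surv <- seq_along(x) / length(x)

# Log-log plot; log = "xy" is safe here because all values are positive.
plot(x, surv, log = "xy", xlab = "bytes", ylab = "P(X >= x)")
```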
Upvotes: 0
Reputation: 3771
This should properly be a comment to hadley's answer, but the original question is looking for:
plot(log10(seq_along(tbl)), log10(unclass(tbl)))
Upvotes: 2
Reputation: 475
I found out that a Zipf plot is just the log-log plot of the frequency of an entity (say 'flows') sorted in descending order.
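A minimal sketch of that rank-frequency plot for the traffic scenario. The `traffic` data frame and its column names (`src`, `dst`, `flows`) are hypothetical and filled with simulated values; substitute the real per-pair table.

```r
# Hypothetical input: one row per observed origin/destination record with a flow count.
set.seed(42)
traffic <- data.frame(
  src   = sprintf("10.0.0.%d", sample(1:50, 200, replace = TRUE)),
  dst   = sprintf("10.0.1.%d", sample(1:50, 200, replace = TRUE)),
  flows = rgeom(200, prob = 0.05) + 1
)

# Total flows per origin/destination pair (pairs may appear more than once).
per_pair <- aggregate(flows ~ src + dst, data = traffic, FUN = sum)

# Frequency sorted in descending order, and the matching rank 1, 2, 3, ...
freq <- sort(per_pair$flows, decreasing = TRUE)
rnk  <- seq_along(freq)

# Zipf plot: frequency against rank on log-log axes.
plot(rnk, freq, log = "xy", xlab = "rank", ylab = "flows")
```

The same steps apply to byte volume: aggregate the bytes column instead of `flows`.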
Upvotes: -2
Reputation: 103898
It hardly seems like you need a special function:
x <- rpois(1000, 10)
tbl <- table(x)
plot(seq_along(tbl), unclass(tbl))
Or are you looking for hist?
hist(x)
Upvotes: 2
Reputation: 29347
Check out the zipfR package, and its dedicated website including the following tutorial: The zipfR package for lexical statistics: A tutorial introduction.
Upvotes: 3