Reputation: 6409
I have a dataset of numbers giving the daily difference in some measure.
https://dl.dropbox.com/u/22681355/diff.csv
I would like to create a plot of the distribution of the differences with special emphasis on the rare large changes.
I tried plotting each column using the hist() function, but it doesn't really provide a detailed picture of the data.
For example, plotting the first column of the dataset produces the following plot:
https://dl.dropbox.com/u/22681355/Rplot.pdf
My problem is that this gives very little detail on the infrequent large deviations.
What is the easiest way to make those rare large changes visible?
Also, any suggestions on how to summarize this data in a table? For example, besides showing the min, max and mean values, would you look at quantiles? Any other ideas?
Upvotes: 1
Views: 1037
Reputation: 132999
Violin plots could be useful:
df <- read.csv('https://dl.dropbox.com/u/22681355/diff.csv')
library(vioplot)
with(df,vioplot(a,b,c,d,e,f,g,h,i,j))
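I would use a boxplot on transformed data, e.g. a signed square root:
# sign(x) * sqrt(abs(x)) maps 0 to 0, whereas x / sqrt(abs(x)) gives NaN for zeros
boxplot(sign(df[,-1]) * sqrt(abs(df[,-1])))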
Obviously a histogram would also look better after transformation.
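To illustrate the histogram remark, here is a minimal sketch of a histogram on a signed square-root scale. The linked file may not be available, so the vector x below is simulated stand-in data (an assumption, not the real differences), and signed_sqrt is a hypothetical helper name introduced only for this example.

```r
# Simulated stand-in for one column of daily differences (NOT the real data):
# mostly small changes plus a few rare large ones.
set.seed(1)
x <- c(rnorm(980, sd = 1), rnorm(20, sd = 15))

# Signed square root: compresses large magnitudes, keeps the sign, maps 0 to 0.
signed_sqrt <- function(v) sign(v) * sqrt(abs(v))
x_t <- signed_sqrt(x)

# Histogram on the transformed scale; breaks = 50 is only a suggestion to hist().
h <- hist(x_t, breaks = 50,
          main = "Signed square root of differences",
          xlab = "sign(x) * sqrt(|x|)")
```

On this scale the bulk of small changes is spread out less aggressively than the tails are pulled in, so the rare large values sit closer to the rest and their bars are easier to see. With the real data you would pass a column such as df$a instead of x, assuming that column exists.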
Upvotes: 1
Reputation: 44634
I back @Sven's suggestion for identifying outliers, but you can get more refinement in your histograms by specifying a denser set of breakpoints than hist() chooses by default.
d <- read.csv('https://dl.dropbox.com/u/22681355/diff.csv', header=TRUE, row.names=1)
with(d, hist(a, breaks=seq(min(a), max(a), length.out=100)))
Upvotes: 2
Reputation: 81753
You could use boxplots to visualize the distribution of the data:
sdiff <- read.csv("https://dl.dropbox.com/u/22681355/diff.csv")
boxplot(sdiff[,-1])
Outliers are drawn as circles; by default these are the points lying more than 1.5 times the interquartile range beyond the box.
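For the table part of the question, one possible sketch is a per-column summary with a few quantiles, the mean, and the number of points boxplot() would flag as outliers. The data frame below is simulated (the column names a and b and the values are assumptions standing in for the real file), and summarize_col is a hypothetical helper name; boxplot.stats() is the base R function that boxplot() itself uses to pick the flagged points.

```r
# Simulated stand-in for the real data (column names a, b are assumptions).
set.seed(42)
sdiff <- data.frame(
  a = c(rnorm(495), rnorm(5, sd = 20)),
  b = c(rnorm(495), rnorm(5, sd = 10))
)

# One row per column: extreme and central quantiles, the mean, and the
# number of points that fall outside the boxplot whiskers.
summarize_col <- function(v) {
  q <- quantile(v, probs = c(0, 0.01, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE)
  c(q, mean = mean(v, na.rm = TRUE), n_out = length(boxplot.stats(v)$out))
}

tab <- t(sapply(sdiff, summarize_col))
print(round(tab, 2))
```

The 1% and 99% quantiles are an arbitrary choice here; for emphasis on rare large changes you might prefer more extreme ones such as 0.1% and 99.9%, depending on how many observations you have.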
Upvotes: 2