user1723765

Reputation: 6409

Plotting distribution of differences in R

I have a dataset of numbers giving the daily difference in some measure.

https://dl.dropbox.com/u/22681355/diff.csv

I would like to create a plot of the distribution of the differences with special emphasis on the rare large changes.

I tried plotting each column using the hist() function but it doesn't really provide a detailed picture of the data.

For example plotting the first column of the dataset produces the following plot:

https://dl.dropbox.com/u/22681355/Rplot.pdf

My problem is that this shows very little detail for the infrequent large deviations.

What is the easiest way to do this?

Also, any suggestions on how to summarize this data in a table? For example, besides the min, max and mean values, would you look at quantiles? Any other ideas?

Upvotes: 1

Views: 1037

Answers (3)

Roland

Reputation: 132999

Violin plots could be useful:

# Read the data and draw one violin per column (a to j)
df <- read.csv('https://dl.dropbox.com/u/22681355/diff.csv')
library(vioplot)
with(df, vioplot(a, b, c, d, e, f, g, h, i, j))

[Figure: violin plots]

I would use a boxplot on transformed data, e.g.:

# Signed square root; unlike x/sqrt(abs(x)), this does not turn zeros into NaN
boxplot(sign(df[,-1]) * sqrt(abs(df[,-1])))

[Figure: boxplot (data transformed)]

Obviously a histogram would also look better after transformation.
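
A minimal sketch of that idea, using a signed square root so that negative differences keep their sign. The data are simulated as a stand-in for one column of the file, and the helper name signed_sqrt is made up for illustration:

```r
# Hypothetical data: mostly small daily changes plus a few large jumps
set.seed(1)
x <- c(rnorm(1000, sd = 1), rnorm(10, sd = 30))

# Signed square root: keeps the sign, compresses large magnitudes, maps 0 to 0
signed_sqrt <- function(v) sign(v) * sqrt(abs(v))

tx <- signed_sqrt(x)

# Histogram of the transformed values; the tails take up more of the axis
hist(tx, breaks = 50,
     main = "Signed square root of differences",
     xlab = "sign(x) * sqrt(|x|)")
```

The transformation is monotone, so the ordering of the observations is unchanged; only the spacing between small and large values differs.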

Upvotes: 1

Matthew Plourde

Reputation: 44634

I back @Sven's suggestion for identifying outliers, but you can get more refinement in your histograms by specifying a denser set of breakpoints than what hist chooses by default.

d <- read.csv('https://dl.dropbox.com/u/22681355/diff.csv', header=TRUE, row.names=1)
with(d, hist(a, breaks=seq(min(a), max(a), length.out=100)))
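
To see what the denser breakpoints do, hist() can be called with plot = FALSE and the bin counts inspected directly. This is only a sketch; the vector a below is simulated and stands in for column a of the file:

```r
# Hypothetical stand-in for column a: small changes plus a few large ones
set.seed(7)
a <- c(rnorm(1000), rnorm(8, sd = 25))

# 100 equally spaced breakpoints give 99 equal-width bins
brks <- seq(min(a), max(a), length.out = 100)
h <- hist(a, breaks = brks, plot = FALSE)

length(h$counts)    # number of bins
sum(h$counts > 0)   # bins that actually contain observations

# Draw the histogram from the stored object
plot(h, main = "Histogram with 99 equal-width bins")
```

Increasing length.out narrows the bins further, which may help separate isolated large values from the bulk of the data.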


Upvotes: 2

Sven Hohenstein

Reputation: 81753

You could use boxplots to visualize the distribution of the data:

sdiff <- read.csv("https://dl.dropbox.com/u/22681355/diff.csv")

boxplot(sdiff[,-1])

Outliers (by default, points more than 1.5 times the interquartile range beyond the box) are drawn as circles.
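
To list the values that the boxplot draws as circles, boxplot.stats() returns them in its out component. A minimal sketch on simulated data (the vector x is a hypothetical stand-in for one column of the file); the quantile() call at the end is one possible way to build the tabular summary asked about in the question:

```r
# Hypothetical data with three planted extreme values
set.seed(42)
x <- c(rnorm(500), 25, -30, 40)

# boxplot.stats() applies the same default rule as boxplot() (coef = 1.5)
stats <- boxplot.stats(x)
outliers <- stats$out
sort(outliers)

# Tail-focused quantiles as a compact summary of the same column
q <- quantile(x, probs = c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99))
q
```

For a whole data frame, sapply(sdiff[,-1], quantile, probs = ...) would give one column of quantiles per variable, assuming all remaining columns are numeric.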


Upvotes: 2
