passiflora

Reputation: 125

Splitting a dataset into two datasets in R (for ggplot2 channeled through Shiny)

I saw some similar questions here, but none exactly like mine - or if they were the same, I didn't recognize it, as a rank newbie to programming in R (I've programmed in lots of other languages, but not R!)

I have an input dataset from a csv file that I read in with read.csv. The dataset may or may not have two groups in it. I found I could split the groups as follows:

datalist <- split(mydata, mydata$group)

but then the list I get back does not play nice with ggplot2 (I get an error that it cannot plot a list variable - although the list variable, if I print it to the console, shows the split data subset?). OK, fine. But if I then do

data = as.data.frame(datalist[1])

And feed that to ggplot2, as.data.frame mangles my column names, and so I lose the name of the variable I want to plot. Augh!

What I ideally want is to split my input data, as read by read.csv, into two separate variables (data frames, I take it?) that ggplot2 can recognize as valid data sets. Actually, I want to overlay them as histograms on the same plot.

There HAS to be an easy way to do this, but I'm not gettin' it? Advice or pointers welcome.

Upvotes: 0

Views: 5745

Answers (2)

Brian Diggs

Reputation: 58845

The result of split(mydata, mydata$group) is a list of data.frames. The [ and [[ notations differ: [ subsets the list, whereas [[ extracts an element from it. So datalist[1] is a list of length 1 containing just the first data.frame, while datalist[[1]] is the data.frame in the first position. Since ggplot (and qplot) expects a data.frame, you need the second (double-bracket) version, as @Alex mentioned in the comment. I don't know why you got the error you saw and can't diagnose it without a complete example. Using a different data set (mtcars), I don't see it.

datalist <- split(mtcars, mtcars$am)

ggplot(datalist[[1]], aes(x=wt, y=mpg)) + geom_point()


qplot(wt, data=datalist[[1]], colour="cyan")


(I'm guessing you wanted colour=I("cyan"), but that's an unrelated issue.)
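To illustrate the colour point, here is a minimal sketch of the difference between mapping and setting a colour. It assumes ggplot2 is installed and uses ggplot() rather than qplot(), since qplot() is deprecated in recent ggplot2 releases; the names p_mapped and p_set are made up for this example.

```r
library(ggplot2)

# mtcars stands in for the asker's data, as in the answer above.
datalist <- split(mtcars, mtcars$am)

# Mapping: inside aes(), the string "cyan" is treated as data, so ggplot2
# picks a colour from its default scale and adds a legend entry "cyan".
p_mapped <- ggplot(datalist[[1]], aes(x = wt, colour = "cyan")) +
  geom_histogram(bins = 10)

# Setting: a constant given outside aes() is used literally as the
# outline colour of the bars.
p_set <- ggplot(datalist[[1]], aes(x = wt)) +
  geom_histogram(bins = 10, colour = "cyan")
```

Printing p_mapped and p_set side by side in an interactive session should make the difference visible; in qplot(), wrapping the value in I() plays the same role as moving it outside aes().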

The difference in the subsetting/extraction operators can be seen here:

> str(datalist)
List of 2
 $ 0:'data.frame':      19 obs. of  11 variables:
  ..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
  ..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
  ..$ disp: num [1:19] 258 360 225 360 147 ...
  ..$ hp  : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
  ..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
  ..$ wt  : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
  ..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
  ..$ vs  : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
  ..$ am  : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
  ..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
 $ 1:'data.frame':      13 obs. of  11 variables:
  ..$ mpg : num [1:13] 21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
  ..$ cyl : num [1:13] 6 6 4 4 4 4 4 4 4 8 ...
  ..$ disp: num [1:13] 160 160 108 78.7 75.7 ...
  ..$ hp  : num [1:13] 110 110 93 66 52 65 66 91 113 264 ...
  ..$ drat: num [1:13] 3.9 3.9 3.85 4.08 4.93 4.22 4.08 4.43 3.77 4.22 ...
  ..$ wt  : num [1:13] 2.62 2.88 2.32 2.2 1.61 ...
  ..$ qsec: num [1:13] 16.5 17 18.6 19.5 18.5 ...
  ..$ vs  : num [1:13] 0 0 1 1 1 1 1 0 1 0 ...
  ..$ am  : num [1:13] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ gear: num [1:13] 4 4 4 4 4 4 4 5 5 5 ...
  ..$ carb: num [1:13] 4 4 1 1 2 1 1 2 2 4 ...
> str(datalist[1])
List of 1
 $ 0:'data.frame':      19 obs. of  11 variables:
  ..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
  ..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
  ..$ disp: num [1:19] 258 360 225 360 147 ...
  ..$ hp  : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
  ..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
  ..$ wt  : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
  ..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
  ..$ vs  : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
  ..$ am  : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
  ..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
> str(datalist[[1]])
'data.frame':   19 obs. of  11 variables:
 $ mpg : num  21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
 $ cyl : num  6 8 6 8 4 4 6 6 8 8 ...
 $ disp: num  258 360 225 360 147 ...
 $ hp  : num  110 175 105 245 62 95 123 123 180 180 ...
 $ drat: num  3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
 $ wt  : num  3.21 3.44 3.46 3.57 3.19 ...
 $ qsec: num  19.4 17 20.2 15.8 20 ...
 $ vs  : num  1 0 1 0 1 1 1 1 0 0 ...
 $ am  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ gear: num  3 3 3 3 4 4 4 4 3 3 ...
 $ carb: num  1 2 1 4 2 2 4 4 3 3 ...
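
The same difference likely explains the mangled column names in the question. A small sketch, again with mtcars standing in for the real data and with mangled and intact as made-up names:

```r
# mtcars stands in for the asker's data; `am` plays the role of `group`.
datalist <- split(mtcars, mtcars$am)

# Single brackets return a one-element list. Converting that list to a
# data frame builds new column names from the element name and the
# original column names, so the original names are lost.
mangled <- as.data.frame(datalist[1])

# Double brackets return the data frame itself, with its names untouched.
intact <- datalist[[1]]

names(mangled)[1:3]
names(intact)[1:3]

# The number of pieces equals the number of groups present, so checking
# the length also covers input that contains only one group.
length(datalist)
```

Comparing the two names() calls at the console shows which version keeps the column name that the plot needs.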

Upvotes: 1

IRTFM

Reputation: 263411

If you just want the rows for a single group value, then using subset might be easier (at least for interactive use).

  p <- qplot(value,     # assuming there is a column named "value"
             data = subset(mydata, group==mydata$group[1]), 
             colour = "cyan")
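
For the overlaid histograms the question asks about, splitting may not be needed at all. The sketch below is one possible approach, not necessarily what either answerer had in mind: it assumes ggplot2 is installed, uses mtcars in place of the real data, and treats am as the grouping column and wt as the variable to plot. The names plotdata and p are made up for the example.

```r
library(ggplot2)

# mtcars stands in for the asker's data; `am` plays the role of `group`
# and `wt` the role of the variable to plot (both are assumptions).
plotdata <- mtcars
plotdata$group <- factor(plotdata$am)

# One data frame, one layer: mapping fill to the group draws one
# histogram per group. position = "identity" overlays the histograms
# instead of stacking them, and alpha keeps the overlap visible.
p <- ggplot(plotdata, aes(x = wt, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 10)

# print(p)  # draws the plot in an interactive session
```

Because the grouping is handled by the fill mapping, the same code should also work when the input has only one group. In a Shiny app, an expression like this would typically go inside renderPlot().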

Upvotes: 2
