Reputation: 125
I saw some similar questions here, but none exactly like mine - or if they were the same, I didn't recognize it, as a rank newbie to programming in R (I've programmed in lots of other languages, but not R!)
I have an input dataset from a CSV file that I read in with read.csv. The dataset may or may not have two groups in it. I found I could split the groups as follows:
datalist <- split(mydata, mydata$group)
but then the list I get back does not play nice with ggplot2 (I get an error that it cannot plot a list variable - although the list variable, if I print it to the console, shows the split data subset?). OK, fine. But if I then do
data = as.data.frame(datalist[1])
And feed that to ggplot2, as.data.frame mangles my column names, and so I lose the name of the variable I want to plot. Augh!
What I ideally want is to split my input data, as read by read.csv, into two separate variables (data frames, I take it?) that ggplot2 can recognize as valid data sets. Actually, I want to overlay them as histograms on the same plot.
There HAS to be an easy way to do this, but I'm not gettin' it? Advice or pointers welcome.
Upvotes: 0
Views: 5745
Reputation: 58845
The result of `split(mydata, mydata$group)` is a list of `data.frame`s. There is a difference between the `[` and `[[` notation: `[` subsets the list, whereas `[[` extracts from the list. So `datalist[1]` is a list of length 1 consisting of just the first `data.frame`, while `datalist[[1]]` is the `data.frame` that is in the first position. Since `ggplot` (and `qplot`) expects a `data.frame`, you need the second (double bracket) version, as @Alex mentioned in the comment. I don't know why you got the error you saw and can't diagnose it without a complete example. Using a different data set (`mtcars`), I don't see it.
datalist <- split(mtcars, mtcars$am)
ggplot(datalist[[1]], aes(x=wt, y=mpg)) + geom_point()
qplot(wt, data=datalist[[1]], colour="cyan")
(I'm guessing you wanted `colour=I("cyan")`, but that's an unrelated issue.)
The difference in the subsetting/extraction operators can be seen here:
> str(datalist)
List of 2
$ 0:'data.frame': 19 obs. of 11 variables:
..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
..$ disp: num [1:19] 258 360 225 360 147 ...
..$ hp : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
..$ wt : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
..$ vs : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
..$ am : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
$ 1:'data.frame': 13 obs. of 11 variables:
..$ mpg : num [1:13] 21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
..$ cyl : num [1:13] 6 6 4 4 4 4 4 4 4 8 ...
..$ disp: num [1:13] 160 160 108 78.7 75.7 ...
..$ hp : num [1:13] 110 110 93 66 52 65 66 91 113 264 ...
..$ drat: num [1:13] 3.9 3.9 3.85 4.08 4.93 4.22 4.08 4.43 3.77 4.22 ...
..$ wt : num [1:13] 2.62 2.88 2.32 2.2 1.61 ...
..$ qsec: num [1:13] 16.5 17 18.6 19.5 18.5 ...
..$ vs : num [1:13] 0 0 1 1 1 1 1 0 1 0 ...
..$ am : num [1:13] 1 1 1 1 1 1 1 1 1 1 ...
..$ gear: num [1:13] 4 4 4 4 4 4 4 5 5 5 ...
..$ carb: num [1:13] 4 4 1 1 2 1 1 2 2 4 ...
> str(datalist[1])
List of 1
$ 0:'data.frame': 19 obs. of 11 variables:
..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
..$ disp: num [1:19] 258 360 225 360 147 ...
..$ hp : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
..$ wt : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
..$ vs : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
..$ am : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
> str(datalist[[1]])
'data.frame': 19 obs. of 11 variables:
$ mpg : num 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
$ cyl : num 6 8 6 8 4 4 6 6 8 8 ...
$ disp: num 258 360 225 360 147 ...
$ hp : num 110 175 105 245 62 95 123 123 180 180 ...
$ drat: num 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
$ wt : num 3.21 3.44 3.46 3.57 3.19 ...
$ qsec: num 19.4 17 20.2 15.8 20 ...
$ vs : num 1 0 1 0 1 1 1 1 0 0 ...
$ am : num 0 0 0 0 0 0 0 0 0 0 ...
$ gear: num 3 3 3 3 4 4 4 4 3 3 ...
$ carb: num 1 2 1 4 2 2 4 4 3 3 ...
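The same distinction can be checked more compactly with `class()`, and it also suggests where the mangled column names in the question probably come from. This is only a sketch using `mtcars` as a stand-in for the asker's data, and the variable name `mangled` is made up for illustration: calling `as.data.frame()` on the one-element list `datalist[1]` builds a new data frame whose column names are prefixed with the list element's name, whereas `datalist[[1]]` keeps the original names.

```r
# mtcars stands in for the asker's data; am plays the role of the group column.
datalist <- split(mtcars, mtcars$am)

class(datalist[1])    # "list": still a list, just a shorter one
class(datalist[[1]])  # "data.frame": the element itself

# Converting the one-element list builds a new data frame whose column
# names carry the list element's name as a prefix.
mangled <- as.data.frame(datalist[1])
head(names(mangled), 3)

# Extracting with [[ leaves the column names untouched.
identical(names(datalist[[1]]), names(mtcars))

# Elements can also be extracted by name instead of by position.
identical(datalist[["0"]], datalist[[1]])
```

Extracting by name (`datalist[["0"]]`) may be a little more robust than extracting by position when the number or order of groups in the input file is not known in advance.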
Upvotes: 1
Reputation: 263411
If you just want a single index value, then using subset might be easier (at least for interactive use).
p <- qplot(value, # assuming there is a column named "value"
data = subset(mydata, group==mydata$group[1]),
colour = "cyan")
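For the overlaid histograms the question asks about, it may not be necessary to split the data at all. The sketch below is an assumption-laden illustration, not the answerer's code: `mtcars` stands in for the asker's data, with `am` playing the role of `group` and `wt` the role of the variable to plot, and the bin count and transparency are arbitrary choices. It first shows that `subset()` returns an ordinary data frame with the original column names, then maps the group column to `fill` so that ggplot2 draws one histogram per group on the same axes.

```r
library(ggplot2)

# subset() returns a plain data frame with the original column names,
# so it can be passed straight to ggplot().
first_group <- subset(mtcars, am == unique(am)[1])

p_single <- ggplot(first_group, aes(x = wt)) +
  geom_histogram(bins = 10, fill = "cyan")

# Overlay: keep the data in one frame and map the group column to fill.
# position = "identity" draws the histograms on top of each other rather
# than stacking them; alpha makes the overlap visible.
p_overlay <- ggplot(mtcars, aes(x = wt, fill = factor(am))) +
  geom_histogram(bins = 10, position = "identity", alpha = 0.5) +
  labs(fill = "am")

# Print the plots (needed inside scripts and functions).
print(p_single)
print(p_overlay)
```

If the input file happens to contain only one group, the second plot simply shows a single histogram, so the same code covers both cases.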
Upvotes: 2