Reputation: 310
I have a data frame looking like this:
Grade  Class_Dept  Class_Name    Class_Work
9      English     English 1     30
10     History     Modern World  50
11     Science     AP Chem       85
12     Math        Calc BC       45
It extends further than that, but that's the general idea. I would like to split this into multiple smaller data frames by Class_Name. I tried using plyr, but couldn't figure it out. I also tried the split() function, which worked, but did not allow me to index into each sub-dataframe in a for loop. Is there any other way I can do this? Any help would be appreciated.
Also, the split() function would work if I could index into each sub-dataframe. If that doesn't make sense, what I would want to do is get the mean and standard deviation of the Class_Work for each Class_Name and compare them. I could do this manually with the list returned from split(), but it would take a long time, as my dataframe has about 120 different classes. If there's a way to automate this, that would be fantastic.
Upvotes: 1
Views: 264
Reputation: 2764
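You can use the data.table package. The example below takes the mean of Petal.Width and the standard deviation of Sepal.Length per species: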
> library(data.table)
> dt <- iris
> setDT(dt)[, .(mean = mean(Petal.Width), std_dv = sd(Sepal.Length)), by = .(Species)]
      Species  mean    std_dv
1:     setosa 0.246 0.3524897
2: versicolor 1.326 0.5161711
3:  virginica 2.026 0.6358796
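Applied to the columns from the question, a minimal sketch could look like the following. The toy values and the names classes and summary_dt are invented for illustration; only Class_Name and Class_Work come from the question. Both statistics are computed on the same column here.

```r
library(data.table)

# Toy data using the question's column names (values are invented)
classes <- data.table(
  Class_Name = c("AP Chem", "AP Chem", "Calc BC", "Calc BC", "Calc BC"),
  Class_Work = c(85, 75, 45, 55, 65)
)

# One row per class, with mean and sd both taken over Class_Work
summary_dt <- classes[, .(mean = mean(Class_Work), std_dv = sd(Class_Work)),
                      by = Class_Name]
summary_dt
```

Because the result is a single table with one row per class, comparing classes is a matter of sorting or filtering that table rather than handling 120 separate data frames.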
Upvotes: 0
Reputation: 24079
It seems the real goal is to summarize the whole dataset grouped by Class_Name, in which case splitting it into separate data frames is unnecessary. There are several good options for this kind of summary in both base R and the dplyr package.
Below are examples using the split/sapply, tapply, and group_by/summarize techniques.
df<-read.table(header=TRUE, text='Grade Class_Dept Class_Name Class_Work
9 English "English 1" 30
10 History "Modern World" 50
11 Science "AP Chem" 85
12 Math "Calc BC" 45')
#Base R solution
#split into a list of dataframes by Class_Name
dflist<-split(df, df$Class_Name)
#perform math operation on each dataframe
workmean<-sapply(dflist, function(x){ mean(x$Class_Work)})
workstdev<-sapply(dflist, function(x){ sd(x$Class_Work)})
workmean
#      AP Chem      Calc BC    English 1 Modern World
#           85           45           30           50
#tapply option:
tapply(df$Class_Work, df$Class_Name, mean)
#      AP Chem      Calc BC    English 1 Modern World
#           85           45           30           50
#dplyr solution
library(dplyr)
df %>% group_by(Class_Name) %>% summarize(mean=mean(Class_Work), stdev=sd(Class_Work))
# # A tibble: 4 x 3
#   Class_Name    mean stdev
#   <fct>        <dbl> <dbl>
# 1 AP Chem         85   NaN
# 2 Calc BC         45   NaN
# 3 English 1       30   NaN
# 4 Modern World    50   NaN
# stdev is NaN because each class has only one row in this sample data
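Since the question mentions not being able to index into the result of split() inside a for loop, here is a minimal sketch of one way to do it. The list that split() returns is named by the grouping values, so each piece can be pulled out with [[ ]] by position or by name. The toy data and the names scores, pieces and stats are made up for illustration.

```r
# Toy data with repeated classes so that sd() is defined (values are invented)
scores <- data.frame(
  Class_Name = c("AP Chem", "AP Chem", "Calc BC", "Calc BC", "Calc BC"),
  Class_Work = c(85, 75, 45, 55, 65)
)

# split() returns a named list with one data frame per class
pieces <- split(scores, scores$Class_Name)

# Preallocate a results table, one row per class
stats <- data.frame(
  Class_Name = names(pieces),
  mean = NA_real_,
  sd = NA_real_
)

# Use [[ ]] (not [ ]) to get the data frame itself rather than a one-element list
for (i in seq_along(pieces)) {
  sub <- pieces[[i]]
  stats$mean[i] <- mean(sub$Class_Work)
  stats$sd[i] <- sd(sub$Class_Work)
}

stats
```

A single piece can also be fetched by name, for example pieces[["Calc BC"]].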
Upvotes: 0
Reputation: 3412
If you're trying to split and loop, try split and lapply/vapply:
vapply(split(mtcars, mtcars$cyl), function(df) mean(df$mpg), double(1))
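To get both the mean and the standard deviation in one pass, the function can return a vector of two values. This is only a sketch of the same split/vapply idea; the name res is arbitrary.

```r
# Split the mpg column by cyl, then return two named values per group.
# FUN.VALUE is a template: vapply checks that every result is a length-2 numeric.
res <- vapply(
  split(mtcars$mpg, mtcars$cyl),
  function(x) c(mean = mean(x), sd = sd(x)),
  c(mean = 0, sd = 0)
)

# res is a matrix with rows "mean" and "sd" and one column per cyl value;
# transpose it to get one row per group
t(res)
```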
Upvotes: 0
Reputation: 6106
You can use dplyr::group_split():
library(dplyr)
iris %>%
  group_by(Species) %>%
  group_split()
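The list returned by group_split() is unnamed, so one possible way to index into it in a loop is to attach names taken from group_keys(), which lists the group keys in the same order. This is a sketch rather than the only approach; the names grouped and pieces are arbitrary.

```r
library(dplyr)

grouped <- iris %>% group_by(Species)

# Name each piece after its group key so it can be looked up by name
pieces <- setNames(
  group_split(grouped),
  as.character(group_keys(grouped)$Species)
)

# Loop over the pieces by name and print the mean and sd of one column
for (nm in names(pieces)) {
  sub <- pieces[[nm]]
  cat(nm, mean(sub$Petal.Width), sd(sub$Petal.Width), "\n")
}
```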
Upvotes: 3