Kai036

Reputation: 310

Is there a way to split a data frame in R and then index into the new data frames?

I have a data frame looking like this:

Grade   Class_Dept   Class_Name     Class_Work
9       English      English 1      30
10      History      Modern World   50
11      Science      AP Chem        85
12      Math         Calc BC        45

The real data frame extends further than that, but that's the general idea. I would like to split it into multiple smaller data frames by Class_Name. I tried using plyr, but couldn't figure it out. I also tried the split() function, which worked, but I couldn't index into each sub-data frame in a for loop. Is there another way to do this? Any help would be appreciated.

Also, split() would work if I could index into each sub-data frame. To be more specific: I want to get the mean and standard deviation of Class_Work for each Class_Name and compare them. I could do this manually with the list returned from split(), but it would take a long time, as my data frame has about 120 different classes. If there's a way to automate this, that would be fantastic.
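For reference, the list returned by split() can be indexed inside a for loop with [[ ]]. A minimal base-R sketch, assuming a data frame with the Class_Name and Class_Work columns shown above (the four sample rows are copied from the question; by_class and results are illustrative names):

```r
# Sample data from the question (the real data has about 120 classes)
df <- data.frame(
  Grade      = c(9, 10, 11, 12),
  Class_Dept = c("English", "History", "Science", "Math"),
  Class_Name = c("English 1", "Modern World", "AP Chem", "Calc BC"),
  Class_Work = c(30, 50, 85, 45)
)

# split() returns a named list with one data frame per Class_Name
by_class <- split(df, df$Class_Name)

# Preallocate a results table, one row per class
results <- data.frame(
  Class_Name = names(by_class),
  mean       = NA_real_,
  sd         = NA_real_
)

for (i in seq_along(by_class)) {
  sub_df <- by_class[[i]]                   # [[i]] extracts the i-th data frame
  results$mean[i] <- mean(sub_df$Class_Work)
  results$sd[i]   <- sd(sub_df$Class_Work)  # NA when a class has only one row
}

print(results)
```

Indexing by name also works, e.g. by_class[["AP Chem"]].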

Upvotes: 1

Views: 264

Answers (4)

Rushabh Patel

Reputation: 2764

You can use the data.table package:

> library(data.table)
> dt <- as.data.table(iris)
> dt[, .(mean = mean(Petal.Width), std_dv = sd(Petal.Width)), by = .(Species)]

      Species  mean    std_dv
1:     setosa 0.246 0.1053856
2: versicolor 1.326 0.1977527
3:  virginica 2.026 0.2746501
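For comparison, the same kind of grouped summary can be sketched in base R with aggregate(). This is a swapped-in technique rather than part of this answer; it assumes only the built-in iris data set, and res is an illustrative name:

```r
# Group iris by Species and compute two statistics of Petal.Width per group.
# Because FUN returns a named length-2 vector, aggregate() stores the result
# as a two-column matrix (columns "mean" and "std_dv") in the Petal.Width column.
res <- aggregate(
  Petal.Width ~ Species,
  data = iris,
  FUN  = function(x) c(mean = mean(x), std_dv = sd(x))
)
print(res)
```

The data.table version avoids the nested matrix column, which can be easier to work with afterwards.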

Upvotes: 0

Dave2e

Reputation: 24079

It seems like the real goal is to collect summary data on your whole dataset grouped by "Class_Name", so it is unnecessary to split it into separate data frames. There are several good options for this summary, in both base R and the dplyr package.

Below are examples using split/sapply, tapply, and group_by/summarize.

df<-read.table(header=TRUE, text='Grade   Class_Dept   Class_Name   Class_Work
9       English      "English 1"    30
10      History      "Modern World" 50
11      Science      "AP Chem"      85
12      Math         "Calc BC"      45')

#Base R solution
#split into a list of data frames by Class_Name
dflist<-split(df, df$Class_Name)
#compute the mean and standard deviation of Class_Work for each data frame
workmean<-sapply(dflist, function(x){ mean(x$Class_Work)})
workstdev<-sapply(dflist, function(x){ sd(x$Class_Work)})

workmean
#   AP Chem      Calc BC    English 1 Modern World 
#        85           45           30           50 

#tapply option:
tapply(df$Class_Work, df$Class_Name, mean)
#     AP Chem      Calc BC    English 1 Modern World 
#          85           45           30           50 

#dplyr solution
library(dplyr)
df %>% group_by(Class_Name) %>% summarize(mean=mean(Class_Work), stdev=sd(Class_Work))
# # A tibble: 4 x 3
#   Class_Name    mean stdev
#   <fct>        <dbl> <dbl>
# 1 AP Chem         85    NA
# 2 Calc BC         45    NA
# 3 English 1       30    NA
# 4 Modern World    50    NA
# (stdev is NA because each class has only one row in this sample)
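Since the question also asks to compare the classes, two tapply() calls (one for the mean, one for the standard deviation) can be combined into one table and sorted. A sketch that re-creates the sample data from the question; means, stdevs, stats, and ranked are illustrative names, and with one row per class stdev is NA here:

```r
# Sample data from the question
df <- read.table(header = TRUE, text = 'Grade Class_Dept Class_Name Class_Work
9  English "English 1"    30
10 History "Modern World" 50
11 Science "AP Chem"      85
12 Math    "Calc BC"      45')

# tapply() returns one value per Class_Name, in sorted group order
means  <- tapply(df$Class_Work, df$Class_Name, mean)
stdevs <- tapply(df$Class_Work, df$Class_Name, sd)

# Collect both statistics in one table, then rank classes by mean (highest first)
stats  <- data.frame(Class_Name = names(means),
                     mean  = as.vector(means),
                     stdev = as.vector(stdevs))
ranked <- stats[order(stats$mean, decreasing = TRUE), ]
print(ranked)
```

With the full data (about 120 classes) the same code produces one row per class without any manual indexing.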

Upvotes: 0

SmokeyShakers

Reputation: 3412

If you're trying to split and loop, try split and lapply/vapply:

vapply(split(mtcars, mtcars$cyl), function(df) mean(df$mpg), double(1))
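Extending that one-liner, vapply() can return several statistics per group when FUN.VALUE has more than one element. A sketch using the built-in mtcars data; stats is an illustrative name:

```r
# Split mtcars by cylinder count and return two statistics per group.
# FUN.VALUE = c(mean = 0, sd = 0) declares that each call must return a
# length-2 numeric vector; its names become the row names of the result.
stats <- vapply(
  split(mtcars, mtcars$cyl),
  function(df) c(mean = mean(df$mpg), sd = sd(df$mpg)),
  c(mean = 0, sd = 0)
)

# stats is a 2 x 3 matrix (rows: mean, sd; columns: "4", "6", "8");
# t() puts one group per row for easier comparison
t(stats)
```

Unlike sapply(), vapply() stops with an error if any group returns a value of the wrong type or length, which is useful when looping over many groups.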

Upvotes: 0

Yifu Yan

Reputation: 6106

You can use dplyr::group_split(), which returns a list of tibbles, one per group:

library(dplyr)
iris %>%
    group_by(Species) %>%
    group_split()
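For comparison, base R's split() returns a named list, so each piece can be indexed by group name or by position, and per-group results can be stacked back into one data frame with do.call(rbind, ...). A minimal sketch on iris; by_species, rows, and summary_df are illustrative names:

```r
# split() names each list element after its group level
by_species <- split(iris, iris$Species)
names(by_species)   # "setosa" "versicolor" "virginica"

setosa_df <- by_species[["setosa"]]   # index by name
second_df <- by_species[[2]]          # index by position (versicolor)

# Summarize each piece, then stack the one-row results
rows <- lapply(names(by_species), function(sp) {
  pw <- by_species[[sp]]$Petal.Width
  data.frame(Species = sp, mean = mean(pw), sd = sd(pw))
})
summary_df <- do.call(rbind, rows)
print(summary_df)
```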

Upvotes: 3
