Padmaja Ganesh

Reputation: 101

Split a data frame into sub data frames depending on values in a column

I have a data frame with 300 columns, and I want to split it depending on the values in the mileage column (MPG).

                      Model              MPG     Origin
1              chevrolet chevelle malibu 18.0     US
2                      buick skylark 320 15.0     US
3                     plymouth satellite 18.0     US
4                          amc rebel sst 16.0     US
5                            ford torino 17.0     US
6                       ford galaxie 500 15.0     US
7                       chevrolet impala 14.0     US
8                      plymouth fury iii 14.0     US
9                       pontiac catalina 14.0     US
10                    amc ambassador dpl 15.0     US
11                   dodge challenger se 15.0     US

I want to split the data frame into three data frames: MPG less than 14, MPG from 14 to 17, and MPG greater than 17.

y is my parent data set; I want to split it into low, medium and high data sets using the values specified above.

I was trying to use a for loop to append the values less than 13.6 and then insert the matrix into a separate data frame named low.

for(i in 1:nrow(y)){
  if(y[i,2] <13.6){
    low_arrayMPG.append(y[i,2])
    low_arrayModel.append(y[i,1])
    low_arrayOrigin.append(y[i,3])

  }

}

Could anyone tell me whether this approach is right, or whether there is a function in R for this exact purpose that would make it easier to split the data frame into the desired sub data frames?

Upvotes: 0

Views: 994

Answers (3)

moodymudskipper

Reputation: 47350

Maybe you'll also like these:

split(df1,(df1$MPG>=14)+(df1$MPG>17))
# $`1`
# Model MPG Origin
# 2          buick skylark 320  15     US
# 4              amc rebel sst  16     US
# 5                ford torino  17     US
# 6           ford galaxie 500  15     US
# 7           chevrolet impala  14     US
# 8          plymouth fury iii  14     US
# 9           pontiac catalina  14     US
# 10        amc ambassador dpl  15     US
# 11       dodge challenger se  15     US
# 
# $`2`
# Model MPG Origin
# 1 chevrolet chevelle malibu  18     US
# 3        plymouth satellite  18     US
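To see why the grouping key in the split() call works, here is a minimal sketch on a toy vector (the values are made up to hit each boundary and are not from the question's data). Adding two logical vectors coerces TRUE/FALSE to 1/0, so the sum counts how many thresholds each value has passed:

```r
# Toy MPG values chosen to cover each boundary case (illustrative only)
mpg <- c(13, 14, 17, 18)

# Each comparison yields TRUE/FALSE; adding them coerces to 1/0,
# so the key is 0 below 14, 1 from 14 to 17 inclusive, and 2 above 17
key <- (mpg >= 14) + (mpg > 17)
print(key)
```

With the question's sample data no row has MPG below 14, which is why the output above only contains groups 1 and 2.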


library(dplyr)
library(tidyr)
df1 %>% group_by(spl = (MPG>=14) + (MPG>17)) %>% nest
# # A tibble: 2 x 2
#       spl             data
#     <int>           <list>
#   1     2 <tibble [2 x 3]>
#   2     1 <tibble [9 x 3]>

data

df1 <- read.table(text="                      Model              MPG     Origin
           1              'chevrolet chevelle malibu' 18.0     US
                  2              '        buick skylark 320' 15.0     US
                  3              '       plymouth satellite' 18.0     US
                  4              '            amc rebel sst' 16.0     US
                  5              '              ford torino' 17.0     US
                  6              '         ford galaxie 500' 15.0     US
                  7              '         chevrolet impala' 14.0     US
                  8              '        plymouth fury iii' 14.0     US
                  9              '         pontiac catalina' 14.0     US
                  10             '       amc ambassador dpl' 15.0     US
                  11             '      dodge challenger se' 15.0     US",header=T,stringsAsFactors=F)

Upvotes: 0

akrun

Reputation: 887851

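We could use findInterval to create a grouping variable and then split the dataset into a list of data.frames. With the breaks c(14, 17) and rightmost.closed = TRUE, a value of exactly 17 falls in the middle group: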

lst <- split(df1, findInterval(df1$MPG, c(14, 17), rightmost.closed = TRUE))
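As a rough illustration of what findInterval returns here, the sketch below uses a small made-up data frame (toy_df and its Model names are hypothetical) and attaches the labels low, medium and high to the group codes. The labelling step via factor() is an addition for readability, not part of the answer above:

```r
# Hypothetical toy data covering each boundary case
toy_df <- data.frame(
  Model = c("a", "b", "c", "d", "e"),
  MPG   = c(13, 14, 15, 17, 18),
  stringsAsFactors = FALSE
)

# 0 = below 14, 1 = 14 to 17 inclusive, 2 = above 17
grp <- findInterval(toy_df$MPG, c(14, 17), rightmost.closed = TRUE)

# Optional: turn the codes into readable names before splitting
labels <- factor(grp, levels = 0:2, labels = c("low", "medium", "high"))
lst <- split(toy_df, labels)

print(names(lst))
print(lst$medium)
```

Because the factor carries all three levels, split() should still return a "low" element (an empty data frame) when no row falls below 14, as is the case for the sample data in the question.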

Upvotes: 3

Tim Biegeleisen

Reputation: 522636

I think you can just subset your data frame (df) as follows:

df_low    <- df[df$MPG < 14, ]
df_medium <- df[df$MPG >= 14 & df$MPG <= 17, ]
df_high   <- df[df$MPG > 17, ]
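A quick sanity check for this kind of manual subsetting is to confirm that the three pieces together account for every row. The sketch below rebuilds three rows from the question's sample (using the MPG column name from the question) and assumes there are no missing MPG values:

```r
# Three rows taken from the question's sample data
df <- data.frame(
  Model  = c("chevrolet chevelle malibu", "buick skylark 320", "chevrolet impala"),
  MPG    = c(18, 15, 14),
  Origin = "US",
  stringsAsFactors = FALSE
)

df_low    <- df[df$MPG < 14, ]
df_medium <- df[df$MPG >= 14 & df$MPG <= 17, ]
df_high   <- df[df$MPG > 17, ]

# The conditions are mutually exclusive and cover all non-missing values,
# so the row counts should add up to the original
total <- nrow(df_low) + nrow(df_medium) + nrow(df_high)
print(total == nrow(df))
```

If MPG can contain NA, logical indexing like this returns all-NA rows for those entries, so wrapping each condition in which() or using subset() may be safer.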

Upvotes: 3
