Separating data frame based on column values

Question

I am having trouble writing R code that splits a data frame by the values in one of its character columns, without manually writing a subset command for each value. Below is a reproducible example in R:

    a=c("Model_A","R1",358723.0,171704.0,1.0,36.818500,4.0222700,1.38895000)
    b=c("Model_A","R2",358723.0,171704.0,2.6,36.447300,4.0116100,1.37479000)
    c=c("Model_A","R3",358723.0,171704.0,5.0,35.615400,3.8092600,1.34301000)
    d=c("Model_B","R1",358723.0,171704.0,1.0,39.818300,2.4475600,1.50384000)
    e=c("Model_B","R2",358723.0,171704.0,2.6,39.391600,2.4209900,1.48754000)
    f=c("Model_B","R3",358723.0,171704.0,5.0,38.442700,2.3618400,1.45126000)
    g=c("Model_C","R1",358723.0,171704.0,1.0,31.246400,2.2388000,1.30652000)
    h=c("Model_C","R2",358723.0,171704.0,2.6,30.911600,2.2144800,1.29234000)
    i=c("Model_C","R3",358723.0,171704.0,5.0,30.166700,2.1603000,1.26077000)
    df=data.frame(a,b,c,d,e,f,g,h,i)
    df=t(df)
    df=data.frame(df)
    col_list=list("Model","Receptor.name","X(m.)","Y(m.)","Z(m.)",
    "nox","PM10","PM2.5")
    colnames(df)=col_list

Essentially, I am trying to separate the data frame (df) by model name ("Model_A", "Model_B", and "Model_C") and store each group in its own new data frame. I have been trying the following command:

    df_test=split(df,with(df,interaction(Model,Model)), drop = TRUE)

This command separates the data frame but stores the pieces in a list, and I don't know how to extract the list elements individually and store them as data frames. Is there a simpler solution (avoiding the subset command if possible, as I need the script to be dynamic), or does anyone know how to use the command above to turn the list elements into individual data frames? Also, is it possible to name each data frame after its model?

I apologize if these are a lot of questions but any help would be hugely appreciated! Thank you!

LAP · Accepted Answer

list2env(split(df, df$Model), envir = .GlobalEnv) will give you three dataframes in your global environment, named after the models, containing the relevant rows.

    > Model_A
        Model Receptor.name  X(m.)  Y(m.) Z(m.)     nox    PM10   PM2.5
    a Model_A            R1 358723 171704     1 36.8185 4.02227 1.38895
    b Model_A            R2 358723 171704   2.6 36.4473 4.01161 1.37479
    c Model_A            R3 358723 171704     5 35.6154 3.80926 1.34301

That said, I would just keep the three data frames together in a list, using only dflist <- split(df, df$Model).
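To pull a single data frame back out of such a list, ordinary list indexing by name is enough. Here is a minimal sketch, using a cut-down stand-in for the question's df: the nox values are copied from the question, but the three-column layout and the object names model_a and model_b are assumptions made for brevity.

```r
# Cut-down stand-in for the question's df (only three of its columns).
df <- data.frame(
  Model = rep(c("Model_A", "Model_B", "Model_C"), each = 3),
  Receptor.name = rep(c("R1", "R2", "R3"), times = 3),
  nox = c(36.8185, 36.4473, 35.6154,
          39.8183, 39.3916, 38.4427,
          31.2464, 30.9116, 30.1667)
)

# One list element per distinct value of df$Model.
dflist <- split(df, df$Model)

# The element names are the model names.
names(dflist)

# Extract one data frame by name, with [[ ]] or with $.
model_a <- dflist[["Model_A"]]
model_b <- dflist$Model_B

# Looping over the names keeps the script dynamic:
# no model name is hard-coded here.
for (nm in names(dflist)) {
  cat(nm, "has", nrow(dflist[[nm]]), "rows\n")
}
```

Because the names come from the data, the same code keeps working if a model is added or removed.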

Why a list? Lists let you use lapply, a looping function that applies an operation to every list element. A quick example: say you want a frequency table for both PM variables in each of the three datasets.

For separate objects in your global environment, this would be

    table(Model_A$PM10)
    table(Model_A$PM2.5)
    ...
    table(Model_C$PM2.5)

With a list, it would be

    lapply(dflist, function(x) table(x["PM10"]))
    lapply(dflist, function(x) table(x["PM2.5"]))

Right now this only seems to save a few lines of code, but more importantly, the output of lapply is itself a list, which you can store in an object and use in further operations. That way your global environment holds only a few objects, each a list of similar items such as data frames, tables, summaries, or even plots.
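As a sketch of that workflow, the result of one lapply call can be stored and fed into the next step. The example assumes a cut-down numeric version of the data, and the names summaries and summary_table are hypothetical. Note that in the question's own df every column is stored as text, because each row was built with c() mixing strings and numbers, so there a column would first need converting, e.g. with as.numeric(as.character(x$nox)).

```r
# Cut-down numeric stand-in for the question's df (values from the question).
df <- data.frame(
  Model = rep(c("Model_A", "Model_B", "Model_C"), each = 3),
  nox = c(36.8185, 36.4473, 35.6154,
          39.8183, 39.3916, 38.4427,
          31.2464, 30.9116, 30.1667),
  PM10 = c(4.02227, 4.01161, 3.80926,
           2.44756, 2.42099, 2.36184,
           2.23880, 2.21448, 2.16030)
)

dflist <- split(df, df$Model)

# Step 1: compute one summary per model; the result is again a named list.
summaries <- lapply(dflist, function(x) colMeans(x[c("nox", "PM10")]))

# Step 2: reuse the stored list, here by binding it into a single matrix
# with one row per model and one column per pollutant.
summary_table <- do.call(rbind, summaries)
summary_table
```

The intermediate list stays available, so other operations (plots, further tables) can be run on it the same way.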
