Reputation: 75
I am having trouble writing R code that splits a data frame based on the values in one of its character columns, without manually specifying a subset command for each value. Below is a script to reproduce the data in R:
a=c("Model_A","R1",358723.0,171704.0,1.0,36.818500,4.0222700,1.38895000)
b=c("Model_A","R2",358723.0,171704.0,2.6,36.447300,4.0116100,1.37479000)
c=c("Model_A","R3",358723.0,171704.0,5.0,35.615400,3.8092600,1.34301000)
d=c("Model_B","R1",358723.0,171704.0,1.0,39.818300,2.4475600,1.50384000)
e=c("Model_B","R2",358723.0,171704.0,2.6,39.391600,2.4209900,1.48754000)
f=c("Model_B","R3",358723.0,171704.0,5.0,38.442700,2.3618400,1.45126000)
g=c("Model_C","R1",358723.0,171704.0,1.0,31.246400,2.2388000,1.30652000)
h=c("Model_C","R2",358723.0,171704.0,2.6,30.911600,2.2144800,1.29234000)
i=c("Model_C","R3",358723.0,171704.0,5.0,30.166700,2.1603000,1.26077000)
df=data.frame(a,b,c,d,e,f,g,h,i)
df=t(df)
df=data.frame(df)
col_list=list("Model","Receptor.name","X(m.)","Y(m.)","Z(m.)",
"nox","PM10","PM2.5")
colnames(df)=col_list
Essentially, what I am trying to do is separate the data frame (df) by the model names ("Model_A", "Model_B", and "Model_C") and store the pieces in new, separate data frames. I have been trying the following command:
df_test=split(df,with(df,interaction(Model,Model)), drop = TRUE)
This command separates the data frame but stores the pieces in a list, and I don't know how to extract the list elements individually and store them as data frames. Is there a simpler solution (avoiding the subset command if possible, as I need the script to be dynamic and relative), or does anyone know how to use the command above to turn the list elements into individual data frames? Also, is it possible to name each data frame after its model?
I apologize if these are a lot of questions but any help would be hugely appreciated! Thank you!
Upvotes: 0
Views: 136
Reputation: 6695
list2env(split(df, df$Model), envir = .GlobalEnv)
will give you three dataframes in your global environment, named after the models, containing the relevant rows.
> Model_A
Model Receptor.name X(m.) Y(m.) Z(m.) nox PM10 PM2.5
a Model_A R1 358723 171704 1 36.8185 4.02227 1.38895
b Model_A R2 358723 171704 2.6 36.4473 4.01161 1.37479
c Model_A R3 358723 171704 5 35.6154 3.80926 1.34301
Although I would just keep the list of three dataframes by only using dflist <- split(df, df$Model).
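Since the question also asks how to pull individual elements out of the list, here is a minimal sketch of accessing them by name. The small df below is a stand-in with only a few of the original columns (an assumption made to keep the example short); the access pattern is the same for the full data.

```r
# Stand-in for the question's df (assumption: reduced to three columns)
df <- data.frame(
  Model = rep(c("Model_A", "Model_B", "Model_C"), each = 3),
  Receptor.name = rep(c("R1", "R2", "R3"), times = 3),
  nox = c(36.8185, 36.4473, 35.6154,
          39.8183, 39.3916, 38.4427,
          31.2464, 30.9116, 30.1667)
)

# One data frame per model, kept together in a named list
dflist <- split(df, df$Model)

# The list elements are named after the values in the Model column
names(dflist)

# Extract a single data frame by name, with [[ ]] or $
model_a <- dflist[["Model_A"]]
model_b <- dflist$Model_B

# Loop over all models without hard-coding any of their names
for (m in names(dflist)) {
  cat(m, "has", nrow(dflist[[m]]), "rows\n")
}
```

Because the names come from the data, the same code should keep working if models are added or removed.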
Why a list? Lists allow you to use lapply, a looping function that applies an operation to every list element. A quick example: let's say you want a frequency table for both PM variables in your data, for all three datasets.
For single elements in your global environment this would be
table(Model_A$PM10)
table(Model_A$PM2.5)
...
table(Model_C$PM2.5)
With a list, it would be
lapply(dflist, function(x) table(x["PM10"]))
lapply(dflist, function(x) table(x["PM2.5"]))
Right now, it seems to only save a few lines of code, but better yet, the output of lapply is again a list, which you can store in an object and use for further operations. Because of this, you can keep a global environment with only a few objects in it, each being a list that contains similar objects, such as dataframes, tables, summaries or even plots.
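As a rough illustration of storing the lapply output and reusing it, the sketch below keeps the nox values per model in a list and then derives further results from that list. The stand-in df and the names nox_values, nox_means and nox_ranges are assumptions for this example. The nox column is stored as character here because the question's c(...) calls mix strings and numbers, which coerces everything to character, so it is converted with as.numeric first.

```r
# Stand-in for the question's df (assumption: two columns, nox kept as character)
df <- data.frame(
  Model = rep(c("Model_A", "Model_B", "Model_C"), each = 3),
  nox = c("36.8185", "36.4473", "35.6154",
          "39.8183", "39.3916", "38.4427",
          "31.2464", "30.9116", "30.1667"),
  stringsAsFactors = FALSE
)

dflist <- split(df, df$Model)

# lapply returns a list again: one numeric vector of nox values per model
nox_values <- lapply(dflist, function(x) as.numeric(x$nox))

# The stored list can be fed into further operations
nox_means  <- sapply(nox_values, mean)   # named numeric vector, one mean per model
nox_ranges <- lapply(nox_values, range)  # list of min/max pairs, one per model

nox_means
nox_ranges[["Model_B"]]
```

sapply is used for the means only because a named vector is convenient to print; lapply would work just as well if a list is preferred.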
Upvotes: 1