Reputation: 33
I am trying to split a dataset into multiple subsets based on distinct field values. For example:
SaleID Country
Sale1 US
Sale2 Mexico
Sale3 US
Sale4 Canada
Sale5 Canada
Sale6 Mexico
Sale7 Canada
Desired output (3 subsets)
Data 1
SaleID Country
Sale1 US
Sale3 US
Data 2
SaleID Country
Sale2 Mexico
Sale6 Mexico
Data 3
Sale4 Canada
Sale5 Canada
Sale7 Canada
Any ideas? My actual dataset has close to 100 distinct countries. I have explored using the sqldf package in R but no luck yet. Any help would be appreciated
Upvotes: 2
Views: 66
Reputation: 270020
It would be best to create a list L
of those data sets. Then you can refer to L$US
or L[["US"]]
, say, and refer to all the names using names(L)
.
L <- split(DF, DF$Country)
It would be possible to copy them to individual data frames in the global environment using the following line but it is not really recommended:
list2env(L, .GlobalEnv)
Upvotes: 0
Reputation: 887691
We could use split
from base R
to a list
of data.frame
s. The order of the datasets seems to be order in which the unique
values appear in the 'Country'. In that case, we can convert the 'Country' to a factor
column with levels
specified
df$Country <- factor(df$Country, levels = unique(df$Country))
lst <- lapply(split(df, df$Country), `row.names<-`, NULL)
names(lst) <- paste0("Data", seq_along(lst))
It is better to use the list
for processing the data instead of having multiple objects. But, if we really need it, then list2env
list2env(lst, envir = .GlobalEnv)
Data1
# SaleID Country
#1 Sale1 US
#2 Sale3 US
Data2
# SaleID Country
#1 Sale2 Mexico
#2 Sale6 Mexico
Data3
# SaleID Country
#1 Sale4 Canada
#2 Sale5 Canada
#3 Sale7 Canada
Upvotes: 0