User1414
User1414

Reputation: 33

Creating Subset Datasets in R based on distinct field values

I am trying to split a dataset into multiple subsets based on distinct field values. For example:

SaleID Country
Sale1 US
Sale2 Mexico
Sale3 US
Sale4 Canada
Sale5 Canada
Sale6 Mexico
Sale7 Canada

Desired output (3 subsets)

Data 1
SaleID Country
Sale1 US
Sale3 US

Data 2
SaleID Country
Sale2 Mexico
Sale6 Mexico


Data 3
Sale4 Canada
Sale5 Canada
Sale7 Canada

Any ideas? My actual dataset has close to 100 distinct countries. I have explored using the sqldf package in R but no luck yet. Any help would be appreciated

Upvotes: 2

Views: 66

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 270020

It would be best to create a list L of those data sets. Then you can refer to L$US or L[["US"]], say, and refer to all the names using names(L).

L <- split(DF, DF$Country)

It would be possible to copy them to individual data frames in the global environment using the following line but it is not really recommended:

list2env(L, .GlobalEnv)

Upvotes: 0

akrun
akrun

Reputation: 887691

We could use split from base R to a list of data.frames. The order of the datasets seems to be order in which the unique values appear in the 'Country'. In that case, we can convert the 'Country' to a factor column with levels specified

df$Country <- factor(df$Country, levels = unique(df$Country))
lst <- lapply(split(df, df$Country), `row.names<-`, NULL)
names(lst) <- paste0("Data", seq_along(lst))

It is better to use the list for processing the data instead of having multiple objects. But, if we really need it, then list2env

list2env(lst, envir = .GlobalEnv)
Data1
#  SaleID Country
#1  Sale1      US
#2  Sale3      US

Data2
#  SaleID Country
#1  Sale2  Mexico
#2  Sale6  Mexico

Data3
#  SaleID Country
#1  Sale4  Canada
#2  Sale5  Canada
#3  Sale7  Canada

Upvotes: 0

Related Questions