Moosa

Reputation: 11

An efficient way to subset a data frame into multiple data frames

I have a fairly large data frame and I'm trying to divide this data frame into multiple smaller ones. Suppose I have this data frame called df:

   Patient   Status    cancer
1        1  treated  melanoma
2        2 deceased  melanoma
3        3 deceased carcinoma
4        4  treated  lymphoma
5        5 deceased  melanoma
6        6  treated carcinoma
7        7 deceased  lymphoma
8        8 deceased carcinoma
9        9  treated  melanoma
10      10  treated  melanoma

I want to subset the data frame based on the "cancer" column and store each subset in its own object, as follows:

  Patient   Status    cancer
1       3 deceased carcinoma
2       6  treated carcinoma
3       8 deceased carcinoma

  Patient   Status   cancer
1       1  treated melanoma
2       2 deceased melanoma
3       5 deceased melanoma
4       9  treated melanoma
5      10  treated melanoma

  Patient   Status   cancer
1       4  treated lymphoma
2       7 deceased lymphoma

I've managed to write this code using dplyr's filter function, and it does the job, but because my initial data frame is pretty large, the loop chokes my computer:

library(dplyr)

# One filter() call per cancer type, each result assigned to a global object
factors = levels(df[, "cancer"])
for (i in factors) {
  assign(i, filter(df, cancer == i), envir = .GlobalEnv)
}

I would appreciate it if someone could kindly suggest a more optimized alternative.

Best regards.

Upvotes: 1

Views: 97

Answers (1)

Carsten

Reputation: 11

If operations on your data frames are slow in general, consider switching to the data.table package. You may be surprised by the increase in performance.
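As a rough sketch of what that could look like for the split in the question (not benchmarked here): base R's split() does the grouping in one pass and returns a named list, and data.table provides a split() method with a by argument. The toy df below is rebuilt from the question's example, and the names by_cancer and by_cancer_dt are just illustrative.

```r
# Toy data copied from the example in the question.
df <- data.frame(
  Patient = 1:10,
  Status  = c("treated", "deceased", "deceased", "treated", "deceased",
              "treated", "deceased", "deceased", "treated", "treated"),
  cancer  = c("melanoma", "melanoma", "carcinoma", "lymphoma", "melanoma",
              "carcinoma", "lymphoma", "carcinoma", "melanoma", "melanoma")
)

# Base R: a single call that returns a named list of data frames,
# one element per distinct value of the cancer column.
by_cancer <- split(df, df$cancer)
by_cancer$carcinoma

# data.table variant, run only if the package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  by_cancer_dt <- split(dt, by = "cancer")
}
```

Keeping the pieces in a list is usually easier to work with than separate global objects. If separate objects are really needed, list2env(by_cancer, envir = .GlobalEnv) should create one per list element.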

Upvotes: 1
