I have a fairly large data frame that I'm trying to split into multiple smaller ones. Suppose I have this data frame called df:
```
   Patient   Status    cancer
1        1  treated  melanoma
2        2 deceased  melanoma
3        3 deceased carcinoma
4        4  treated  lymphoma
5        5 deceased  melanoma
6        6  treated carcinoma
7        7 deceased  lymphoma
8        8 deceased carcinoma
9        9  treated  melanoma
10      10  treated  melanoma
```
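To make the example reproducible, the data frame printed above can be rebuilt with the following sketch. The values are copied from the printout; storing `cancer` as a factor (`stringsAsFactors = TRUE`) is an assumption, since the question does not say how the column was created.

```r
# Rebuild the example data frame shown above.
# Assumption: `cancer` is stored as a factor (stringsAsFactors = TRUE).
df <- data.frame(
  Patient = 1:10,
  Status  = c("treated", "deceased", "deceased", "treated", "deceased",
              "treated", "deceased", "deceased", "treated", "treated"),
  cancer  = c("melanoma", "melanoma", "carcinoma", "lymphoma", "melanoma",
              "carcinoma", "lymphoma", "carcinoma", "melanoma", "melanoma"),
  stringsAsFactors = TRUE
)

# Inspect the column types
str(df)
```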
I want to split the data frame by the "cancer" column and store each subset in its own object, as follows:
```
  Patient   Status    cancer
1       3 deceased carcinoma
2       6  treated carcinoma
3       8 deceased carcinoma
```

```
  Patient   Status   cancer
1       1  treated melanoma
2       2 deceased melanoma
3       5 deceased melanoma
4       9  treated melanoma
5      10  treated melanoma
```

```
  Patient   Status   cancer
1       4  treated lymphoma
2       7 deceased lymphoma
```
I've managed to write this code using dplyr's filter() function, and it does the job, but because my initial data frame is quite large, the loop chokes my computer:
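```r
library(dplyr)

# unique() also works when "cancer" is a character column (levels() would return NULL)
factors <- unique(as.character(df$cancer))
for (i in factors) {
  # create one object per cancer type (e.g. `melanoma`) in the global environment
  assign(i, filter(df, cancer == i), envir = .GlobalEnv)
}
```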
I would appreciate it if someone could suggest a more efficient alternative.
Best regards.
If operations on your data frames are slow in general, consider switching to the data.table package. It is designed for fast grouping and subsetting of large tables, so the performance gain can be substantial.
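A minimal sketch of that suggestion, assuming the data.table package is installed and `df` is the example data from the question. `split()` on a data.table with the `by` argument returns a named list with one table per cancer type in a single call, rather than one `filter()` scan per level. Base R's `split(df, df$cancer)` gives the same kind of list for a plain data frame. Keeping the subsets in a list is usually easier to work with than many separate objects; if separate objects are really needed, `list2env()` can create them from the list.

```r
library(data.table)

# Example data from the question
df <- data.frame(
  Patient = 1:10,
  Status  = c("treated", "deceased", "deceased", "treated", "deceased",
              "treated", "deceased", "deceased", "treated", "treated"),
  cancer  = c("melanoma", "melanoma", "carcinoma", "lymphoma", "melanoma",
              "carcinoma", "lymphoma", "carcinoma", "melanoma", "melanoma")
)

# data.table: convert once, then split into a named list of tables in one call
dt <- as.data.table(df)
parts <- split(dt, by = "cancer")

# Base R equivalent for a plain data frame (list names are the cancer types)
parts_base <- split(df, df$cancer)

# Access a subset by name instead of creating one global object per level
parts$melanoma

# Optional: recreate the separate objects (carcinoma, lymphoma, melanoma)
list2env(parts_base, envir = .GlobalEnv)
```

Note that base `split()` keeps the original row names (e.g. 3, 6, 8 for carcinoma), whereas `filter()` renumbers them from 1.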