Reputation: 181
Is there a way to split a dataset into subsets for every combination of values in its columns? I just realized that split() splits a dataset into mini-datasets, one for each element of the selected columns. But suppose I have a dataset "championships" with a column "question" with elements a, b, c and a column "year" with elements 2018, 2019 (among other columns), and I want to create mini-datasets for all observations in "championships" that have "question" = 1 and "year" = 2018, with whatever values the other columns hold. How would I do this?
EDIT: Additionally, the columns I am working with have many more elements than in these examples, so how would I create a new object for each combination?
My expected result is basically what I imagine would happen if I applied the filter() function to each element of "question" and then to each element of "year", and then created an object for every one of those outputs.
The dataset:
structure(list(id = structure(c(25, 25, 25, 25, 25, 25, 25, 25,
25, 25), format.stata = "%8.0g"), year = structure(c(2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018), format.stata = "%8.0g"),
round = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"),
question = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), format.stata = "%8.0g"),
correct = structure(c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0), format.stata = "%8.0g")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 84
Reputation: 388982
You can split the dataset for each question and year, assign names of your choice, and use list2env to create the individual datasets in the global environment.
data <- split(df, list(df$question, df$year))
names(data) <- sub('(\\d+)\\.(\\d+)', 'question\\1_year\\2', names(data))
names(data)
# [1] "question1_year2018" "question2_year2018" "question3_year2018"
# [4] "question4_year2018" "question5_year2018" "question6_year2018"
# [7] "question7_year2018" "question8_year2018" "question9_year2018"
#[10] "question10_year2018"
list2env(data, .GlobalEnv)
question1_year2018
# A tibble: 1 x 5
# id year round question correct
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 25 2018 1 1 0
question2_year2018
# A tibble: 1 x 5
# id year round question correct
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 25 2018 1 2 0
Creating multiple datasets in the global environment is not good practice, though. Keep them in a list instead; they are easier to manage that way.
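As a rough sketch of the list-based approach, the subsets can stay in the named list and be looked up or processed from there. The data frame below is made up for illustration (same columns as the sample in the question, extended to two years), and `share_correct` is just a hypothetical name. Passing `drop = TRUE` to `split()` should leave out question/year combinations that have no rows.

```r
# Made-up data with the same columns as the question's sample,
# extended to two years so each question appears in more than one group.
df <- data.frame(
  id       = 25,
  year     = rep(c(2018, 2019), each = 3),
  round    = 1,
  question = rep(1:3, times = 2),
  correct  = c(0, 1, 0, 1, 1, 0)
)

# One data frame per question/year combination;
# drop = TRUE omits combinations that have no rows.
data <- split(df, list(df$question, df$year), drop = TRUE)
names(data) <- sub("(\\d+)\\.(\\d+)", "question\\1_year\\2", names(data))

# Look up a single subset by name instead of creating a global object.
data[["question1_year2018"]]

# Apply the same computation to every subset in one call.
share_correct <- sapply(data, function(d) mean(d$correct))
```

Because everything lives in one list, adding more questions or years does not add more objects to the workspace, and `lapply()` or `sapply()` can loop over all subsets at once.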
Upvotes: 1