Aaron

Reputation: 181

Double splitting datasets in R

Is there a way to split a dataset by every combination of the values in two of its columns? I just realized that split() breaks a dataset into mini-datasets, one for each value of the selected column. Suppose I have a dataset "championships" with a column "question" that has the values

a, b, c

and "year" with elements

2018, 2019

(among other columns). I want to create a mini-dataset for every such combination, for example all observations in "championships" that have "question" = a and "year" = 2018, keeping whatever values the other columns hold. How would I do this?

EDIT: Additionally, the columns I am working with have many more values than these examples, so how would I create a new object for each combination?

The result I expect is basically what I would get if I applied filter() to each value of "question" combined with each value of "year" and then created a separate object for every one of those outputs.

The dataset:

structure(list(id = structure(c(25, 25, 25, 25, 25, 25, 25, 25, 
25, 25), format.stata = "%8.0g"), year = structure(c(2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018), format.stata = "%8.0g"), 
    round = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"), 
    question = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), format.stata = "%8.0g"), 
    correct = structure(c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0), format.stata = "%8.0g")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 84

Answers (1)

Ronak Shah

Reputation: 388982

You can split the dataset by each combination of question and year, name the pieces however you like, and use list2env() to create the individual datasets in the global environment. Here df is the dataset from the question.

data <- split(df, list(df$question, df$year))
names(data) <- sub('(\\d+)\\.(\\d+)', 'question\\1_year\\2', names(data))
names(data)
# [1] "question1_year2018"  "question2_year2018"  "question3_year2018" 
# [4] "question4_year2018"  "question5_year2018"  "question6_year2018" 
# [7] "question7_year2018"  "question8_year2018"  "question9_year2018" 
#[10] "question10_year2018"

list2env(data, .GlobalEnv)

question1_year2018
# A tibble: 1 x 5
#     id  year round question correct
#  <dbl> <dbl> <dbl>    <dbl>   <dbl>
#1    25  2018     1        1       0

question2_year2018
# A tibble: 1 x 5
#     id  year round question correct
#  <dbl> <dbl> <dbl>    <dbl>   <dbl>
#1    25  2018     1        2       0

That said, creating many separate datasets in the global environment is not good practice. Keep them in a list instead; they are easier to manage that way.
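As a rough sketch of the list-based approach, the snippet below keeps every piece in one named list and works with them from there. The small df is a made-up stand-in for the question's data (the 2019 rows and the correct values are invented for illustration), and the names pieces and share_correct are hypothetical. It uses only base R.

```r
# Made-up stand-in for the question's dataset (values are illustrative only)
df <- data.frame(
  id       = 25,
  year     = c(2018, 2018, 2019, 2019),
  round    = 1,
  question = c(1, 2, 1, 2),
  correct  = c(0, 1, 1, 0)
)

# One data frame per question/year combination;
# drop = TRUE discards combinations that have no rows
pieces <- split(df, list(df$question, df$year), drop = TRUE)
names(pieces) <- sub("(\\d+)\\.(\\d+)", "question\\1_year\\2", names(pieces))

# Pull out a single piece by name instead of relying on a global object
pieces[["question1_year2018"]]

# Run the same computation on every piece in one call
share_correct <- sapply(pieces, function(d) mean(d$correct))
share_correct
```

Because everything lives in one list, adding more questions or years does not add more objects to the workspace, and lapply() or sapply() reaches every piece at once.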

Upvotes: 1
