Reputation: 11
My question is how can I create a tibble or data frame which should look like the following:
So the JEL codes should be grouped by letters, by letters and first number and then per letter and two numbers. Util now I was not able to extract the letters out of JEL_codes, because it is a list and when I do extract it this list is longer than the list of the volumes and I cant create a tibble anymore.
At the moment I have tibble that looks like this where I got observations of JEL_codes and the volume:
Maybe as a first step I would seperate the observations so that one JEL code matches with a volume , so no thwo JEL codes are in a row. Anything that brings me a little bit further helps me, thank you a lot!
Upvotes: 1
Views: 97
Reputation: 125238
Maybe this is what you are looking for:
BTW: To help us to help you could you please make your issue reproducible by sharing a sample of your data, the code you tried and the packages you used? See how to make a minimal reproducible example To post your data type dput(NAME_OF_DATASET)
into the console and copy & paste the output starting with structure(....
into your post.
library(tidyr)
library(tibble)
library(dplyr)
d <- tibble(
JEL_codes = list(c("Q54", "Q58"), c("Q15", "Q25"), "H23"),
volume = c(2, 3, 6)
)
d %>%
unnest(JEL_codes) %>%
mutate(JEL_code1 = gsub("^([A-Z]).+", "\\1", JEL_codes),
JEL_code2 = gsub("^([A-Z]\\d).+", "\\1", JEL_codes)) %>%
select(JEL_code1, JEL_code2, JEL_code3 = JEL_codes, volume) %>%
pivot_longer(-volume, names_to = "foo", values_to = "jel_code") %>%
select(-foo) %>%
count(jel_code, volume, sort = TRUE) %>%
pivot_wider(names_from = volume, values_from = n, values_fill = 0) %>%
arrange(jel_code)
#> # A tibble: 11 x 4
#> jel_code `2` `3` `6`
#> <chr> <int> <int> <int>
#> 1 H 0 0 1
#> 2 H2 0 0 1
#> 3 H23 0 0 1
#> 4 Q 2 2 0
#> 5 Q1 0 1 0
#> 6 Q15 0 1 0
#> 7 Q2 0 1 0
#> 8 Q25 0 1 0
#> 9 Q5 2 0 0
#> 10 Q54 1 0 0
#> 11 Q58 1 0 0
Upvotes: 1