Splitting string values containing columns and simultaneously duplicating other column values in a dataframe

Question

I have a data frame containing 3 columns. The first one has string values some of which contain commas, the other two: ID and area information.

df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21"),
                 column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
                 column3 = c("area51", "area52", "area53", "area54"))
head(df)
column1 column2 column3
1 ab 34, 35, 36   ID_27  area51
2         cb 23   ID_28  area52
3     df 45, 46   ID_29  area53
4         gh 21   ID_30  area54

What I would like to do is modify the values in the first column, so that commas separating values disappear, apply the two letter prefix to each number value and rearrage each value separated by a comma in a new row. At the same time, duplicating the values in the other columns, like the example below:

new_df <- data_frame(column1 = c("ab34", "ab35", "ab36", "cb23", "df45", "df46", "gh21"),
                     column2 = c("ID_27", "ID_27", "ID_27", "ID_28", "ID_29", "ID_29", "ID_30"),
                     column3 = c("area51", "area51", "area51", "area52", "area53", "area53", "area54"))
head(new_df)
# A tibble: 6 x 3
column1 column2 column3
        
1 ab34    ID_27   area51 
2 ab35    ID_27   area51 
3 ab36    ID_27   area51 
4 cb23    ID_28   area52 
5 df45    ID_29   area53 
6 df46    ID_29   area53

Does anybody know what R code could achieve this? Using either tidyverse or older approaches? Btw, no need to go from a dataframe to a tibble, this is just for the example. My aim is to transform a dataframe.

PaulS · Accepted Answer

A possible solution:

library(tidyverse)

df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21"),
                 column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
                 column3 = c("area51", "area52", "area53", "area54"))

df %>% 
  mutate(column1 = str_replace_all(column1,", ", str_extract(column1,"^\S+")) %>%
                   str_remove(.," ")) %>% 
  separate_rows(column1, sep = "(?<=\d)(?=\D)")

#> # A tibble: 7 × 3
#>   column1 column2 column3
#>           
#> 1 ab34    ID_27   area51 
#> 2 ab35    ID_27   area51 
#> 3 ab36    ID_27   area51 
#> 4 cb23    ID_28   area52 
#> 5 df45    ID_29   area53 
#> 6 df46    ID_29   area53 
#> 7 gh21    ID_30   area54

Splitting string values containing columns and simultaneously duplicating other column values in a dataframe

Answers (2)

Related Questions