William_Boot
William_Boot

Reputation: 93

Splitting string values containing columns and simultaneously duplicating other column values in a dataframe

I have a data frame containing 3 columns. The first one has string values some of which contain commas, the other two: ID and area information.

df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21"),
                 column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
                 column3 = c("area51", "area52", "area53", "area54"))
head(df)
column1 column2 column3
1 ab 34, 35, 36   ID_27  area51
2         cb 23   ID_28  area52
3     df 45, 46   ID_29  area53
4         gh 21   ID_30  area54

What I would like to do is modify the values in the first column, so that commas separating values disappear, apply the two letter prefix to each number value and rearrage each value separated by a comma in a new row. At the same time, duplicating the values in the other columns, like the example below:

new_df <- data_frame(column1 = c("ab34", "ab35", "ab36", "cb23", "df45", "df46", "gh21"),
                     column2 = c("ID_27", "ID_27", "ID_27", "ID_28", "ID_29", "ID_29", "ID_30"),
                     column3 = c("area51", "area51", "area51", "area52", "area53", "area53", "area54"))
head(new_df)
# A tibble: 6 x 3
column1 column2 column3
<chr>   <chr>   <chr>  
1 ab34    ID_27   area51 
2 ab35    ID_27   area51 
3 ab36    ID_27   area51 
4 cb23    ID_28   area52 
5 df45    ID_29   area53 
6 df46    ID_29   area53 

Does anybody know what R code could achieve this? Using either tidyverse or older approaches? Btw, no need to go from a dataframe to a tibble, this is just for the example. My aim is to transform a dataframe.

Upvotes: 1

Views: 43

Answers (2)

Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

Here's a (hopefully) easier-to understand step-by-step approach:

library(dplyr)
library(stringr)
library(tidyr)
df %>% 
  mutate(
    # extract the prefix:
    prefix = str_extract(column1,"^\\w+"),
    # extract the digits in a list:
    digits = str_extract_all(column1,"\\d+")) %>% 
  # cast the list values in `digits` into long format:
  unnest_longer(digits) %>%
  # paste `prefix`, `digits` together:
  mutate(column1 = paste0(prefix, digits)) %>%
  # remove obsolete columns:
  select(-c(prefix, digits))
# A tibble: 7 × 3
  column1 column2 column3
  <chr>   <chr>   <chr>  
1 ab34    ID_27   area51 
2 ab35    ID_27   area51 
3 ab36    ID_27   area51 
4 cb23    ID_28   area52 
5 df45    ID_29   area53 
6 df46    ID_29   area53 
7 gh21    ID_30   area54 

Upvotes: 0

PaulS
PaulS

Reputation: 25323

A possible solution:

library(tidyverse)

df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21"),
                 column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
                 column3 = c("area51", "area52", "area53", "area54"))

df %>% 
  mutate(column1 = str_replace_all(column1,", ", str_extract(column1,"^\\S+")) %>%
                   str_remove(.," ")) %>% 
  separate_rows(column1, sep = "(?<=\\d)(?=\\D)")

#> # A tibble: 7 × 3
#>   column1 column2 column3
#>   <chr>   <chr>   <chr>  
#> 1 ab34    ID_27   area51 
#> 2 ab35    ID_27   area51 
#> 3 ab36    ID_27   area51 
#> 4 cb23    ID_28   area52 
#> 5 df45    ID_29   area53 
#> 6 df46    ID_29   area53 
#> 7 gh21    ID_30   area54

Upvotes: 1

Related Questions