Reputation: 71

How to split a column by multiple delimiters into two separate columns

Here's a sample of my data:

k <- structure(list(Required.field = c("yes", "yes", "yes"),
                    Choices = c("2, Féminin | 1, Masculin", "1, Oui | 0, Non | 99, Je ne sais pas", "1, Oui | 0, Non")),
               row.names = c(5L, 10L, 15L), class = "data.frame") 
> k
   Required.field                              Choices
5             yes             2, Féminin | 1, Masculin
10            yes 1, Oui | 0, Non | 99, Je ne sais pas
15            yes                      1, Oui | 0, Non

What I'd like to have is something like this:

> result

   Required.field            Number       Value
5             yes            c(2,1)       c(Féminin, Masculin)
10            yes            c(1,0,99)    c(Oui, Non, Je ne sais pas)
15            yes            c(1,0)       c(Oui, Non)

Here's the code I wrote, which doesn't do the job correctly:

k$test = strsplit(k$choice,c(" | "), fixed = T)


bbl = k %>% 
  mutate(number = str_extract_all(test, "[0-9]+")) %>% #get only digits
  mutate(value  = str_extract(test, "[aA-zZ].*")) #get only letters

Why exactly is it not working?

Upvotes: 1

Answers (2)

akrun

Reputation: 887158

We may use str_extract_all() from stringr, with one pattern for the numbers and one for the labels:

library(dplyr)
library(stringr)
k %>% 
  mutate(Number = str_extract_all(Choices, "\\d+"),
         Value  = str_extract_all(Choices, "[^0-9,| ]+"))

-output

 Required.field                              Choices   Number                       Value
5             yes             2, Féminin | 1, Masculin     2, 1           Féminin, Masculin
10            yes 1, Oui | 0, Non | 99, Je ne sais pas 1, 0, 99 Oui, Non, Je, ne, sais, pas
15            yes                      1, Oui | 0, Non     1, 0                    Oui, Non
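
Note that in the output above the pattern [^0-9,| ]+ also breaks on spaces, so "Je ne sais pas" comes back as four separate words. If multi-word labels should stay together, one option is to split on the " | " separator first and then strip the leading number from each piece. A minimal base R sketch, assuming every piece has the form "number, label" (the names choices, pieces, number and value are only illustrative):

```r
# Sample strings in the same shape as the Choices column from the question.
choices <- c("2, Féminin | 1, Masculin",
             "1, Oui | 0, Non | 99, Je ne sais pas",
             "1, Oui | 0, Non")

# Split every string on the literal " | " separator (fixed = TRUE, so "|" is not a regex).
pieces <- strsplit(choices, " | ", fixed = TRUE)

# Number: drop everything from the first comma onwards, then convert to integer.
number <- lapply(pieces, function(p) as.integer(sub(",.*$", "", p)))

# Value: drop the leading digits, the comma and any following spaces.
value <- lapply(pieces, function(p) sub("^[0-9]+, *", "", p))
```

Both results are lists with one vector per input row, so they can be assigned to the data frame as list columns, for example k$Number <- number and k$Value <- value.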

Upvotes: 0

Maël

Reputation: 52004

Here's a solution with tidyr and dplyr functions:

library(tidyr)
library(dplyr)

k %>% 
  mutate(id = 1:n()) %>%                      # remember the original row
  separate_rows(Choices, sep = " \\| ") %>%   # one row per "number, label" pair
  separate(Choices, into = c("Number", "Value"), sep = ", ", convert = TRUE) %>% 
  group_by(id) %>% 
  summarise(Required.field = unique(Required.field),
            across(c(Number, Value), list))   # collapse back into list columns

output

  id Required.field   Number                    Value
1  1            yes     2, 1        Féminin, Masculin
2  2            yes 1, 0, 99 Oui, Non, Je ne sais pas
3  3            yes     1, 0                 Oui, Non
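
As a side note on why the attempt in the question fails: the column is called Choices, but the code reads k$choice. Column matching with $ is case-sensitive, so that expression is NULL and strsplit() stops before anything is split. A small sketch of this, using a cut-down copy of the data (missing_col, split_error and pieces are illustrative names only):

```r
# Cut-down copy of the question's data, with only the Choices column.
k <- data.frame(Choices = c("1, Oui | 0, Non | 99, Je ne sais pas",
                            "1, Oui | 0, Non"),
                stringsAsFactors = FALSE)

# No column is named `choice` (lower case, no "s"), so this is NULL.
missing_col <- k$choice

# strsplit() only accepts character input, so NULL triggers an error;
# the message is captured here instead of stopping the script.
split_error <- tryCatch(strsplit(missing_col, " | ", fixed = TRUE),
                        error = function(e) conditionMessage(e))

# With the correct column name the split works and returns a list.
pieces <- strsplit(k$Choices, " | ", fixed = TRUE)
```

Even with the column name fixed, the later steps in the question would still need changes: str_extract() returns at most one match per element, and the class [aA-zZ] contains the range A-z, which also covers the ASCII punctuation characters that sit between Z and a.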

Upvotes: 3
