scarlett rouge

Reputation: 339

R dataframe transformation: split character observations into multiple rows, rearrange strings

I have a dataframe where one column is filled with character strings structured as follows: surname, given name XX, surname, given name XX, etc. Each name combination ends with "XX", and consecutive combinations are separated by a comma.

I am looking to

  1. Put each combination of surname, given name into a separate row;
  2. Transform each name combination into given name surname.

This would look as follows:

example <- data.frame(id = c(1,2,3), 
                      names = c("Russell-Moyle, Lloyd XX, Lucas, Caroline XX, Hobhouse, Wera XX", "Benn, Hilary XX, Sobel, Alex XX, West, Catherine XX, Doughty, Stephen XX", "Oswald, Kirsten XX, Thompson, Owen XX, Dorans, Allan XX")
                      )

example

#current output:
#1  1           Russell-Moyle, Lloyd XX, Lucas, Caroline XX, Hobhouse, Wera XX
#2  2 Benn, Hilary XX, Sobel, Alex XX, West, Catherine XX, Doughty, Stephen XX
#3  3                  Oswald, Kirsten XX, Thompson, Owen XX, Dorans, Allan XX

#ideal output:
   id   names
   1    Lloyd Russell-Moyle
   1    Caroline Lucas
   1    Wera Hobhouse
   2    Hilary Benn
   2    Alex Sobel
   2    Catherine West
   2    Stephen Doughty
   3    Kirsten Oswald 
   3    Owen Thompson   
   3    Allan Dorans

Could anyone help me out? Thanks!!

Upvotes: 2

Views: 79

Answers (1)

Ben Norris

Reputation: 5747

You can do this with some functions from the tidyr package.

library(tidyr)
library(dplyr)

example %>% 
  separate_rows(names, sep = "( *)XX(,*)( *)") %>%             # create one row per name
  separate(names, into = c("last", "first"), sep = ", ") %>%   # split each name into last and first
  unite(names, first, last, sep = " ")                         # rejoin as "first last"

# A tibble: 10 x 2
      id names              
   <dbl> <chr>              
 1     1 Lloyd Russell-Moyle
 2     1 Caroline Lucas     
 3     1 Wera Hobhouse      
 4     2 Hilary Benn        
 5     2 Alex Sobel         
 6     2 Catherine West     
 7     2 Stephen Doughty    
 8     3 Kirsten Oswald     
 9     3 Owen Thompson      
10     3 Allan Dorans      

Here is a breakdown of the regular expression in the sep = argument of separate_rows():

( *)  # match a sequence starting with 0 or more spaces
XX    # followed by XX
(,*)  # followed by 0 or more commas
( *)  # followed by 0 or more spaces
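
If you want to check what the pattern does without loading tidyr, here is a minimal base R sketch of the same idea. It is not the tidyr pipeline above: it swaps in strsplit() and sub(), drops the capture groups (they are not needed for splitting), and assumes every name has exactly one ", " between surname and given name. The names pieces and result are just illustrative.

```r
example <- data.frame(
  id = c(1, 2, 3),
  names = c("Russell-Moyle, Lloyd XX, Lucas, Caroline XX, Hobhouse, Wera XX",
            "Benn, Hilary XX, Sobel, Alex XX, West, Catherine XX, Doughty, Stephen XX",
            "Oswald, Kirsten XX, Thompson, Owen XX, Dorans, Allan XX"),
  stringsAsFactors = FALSE
)

# Split each string wherever "XX" occurs, together with any spaces
# before it and any commas and spaces after it.
pieces <- strsplit(example$names, " *XX,* *")

# Build one row per name: repeat each id once for every name it owns,
# then swap "surname, given name" into "given name surname".
result <- data.frame(
  id = rep(example$id, lengths(pieces)),
  names = sub("^(.*), (.*)$", "\\2 \\1", unlist(pieces)),
  stringsAsFactors = FALSE
)

result
```

Note that the pattern splits on any "XX", so a name that itself contains "XX" would be cut in two.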

Upvotes: 1
