joy_1379
joy_1379

Reputation: 499

Parsing of rows in r

I have a data frame which I want to parse based on relatives column

df <- data.frame(Name = c("abc","def","ghi","klm"),
             Relatives = c("Mother & Father","Mother and brother","Husband,wife,Sister in law and Daughter","Self"))



Name                               Relatives
abc                         Mother & Father
def                      Mother and brother
ghi             Husband,wife,Sister in law and Daughter
klm                                    Self

Final output

Name               Relatives
abc                 Mother
abc                 Father
def                 Mother
def                 brother
ghi                 Husband
ghi                 wife
ghi                 Sister in law
ghi                 daughter
klm                 self

Any help is useful, Thanks!

Upvotes: 0

Views: 44

Answers (4)

abusiro
abusiro

Reputation: 23

Try something like this

df %>% 
separate(Relatives, sep = "( & |,| and )", into = letters[1:3]) %>% 
pivot_longer(2:4, values_to = "Relatives",values_drop_na = TRUE) %>% 
select(Name, Relatives)

and you'll get this:

Name  Relatives
  <chr> <chr>    
1 abc   Mother   
2 abc   Father   
3 def   Mother   
4 def   and      
5 def   brother  
6 ghi   Husband  
7 ghi   wife     
8 ghi   and      
9 klm   self    

Upvotes: 0

Desmond
Desmond

Reputation: 1137

Using tidyverse and tidytext:

library(tidyverse)
library(tidytext)

df <- data.frame(
  Name = c("abc", "def", "ghi", "klm"),
  Relatives = c(
    "Mother & Father",
    "Mother and brother",
    "Husband,wife,Sister in law and Daughter",
    "Self"
  )
)

df %>%
  mutate(across(where(is.factor), as.character)) %>%
  unnest_tokens(
    text,
    Relatives,
    token = "regex",
    pattern = "\\&| and |,",
    to_lower = FALSE
  ) %>%
  mutate(text = str_trim(text))

Output:

  Name          text
1  abc        Mother
2  abc        Father
3  def        Mother
4  def       brother
5  ghi       Husband
6  ghi          wife
7  ghi Sister in law
8  ghi      Daughter
9  klm          Self

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

Using tidyr::separate_rows :

library(dplyr)
library(tidyr)

df %>%
  separate_rows(Relatives, sep = '&|\\band\\b|,') %>%
  mutate(Relatives = trimws(Relatives))

#  Name  Relatives    
#  <chr> <chr>        
#1 abc   Mother       
#2 abc   Father       
#3 def   Mother       
#4 def   brother      
#5 ghi   Husband      
#6 ghi   wife         
#7 ghi   Sister in law
#8 ghi   Daughter     
#9 klm   Self         

Upvotes: 2

s_baldur
s_baldur

Reputation: 33488

Quick and dirty in base R:

tmp <- strsplit(df$Relatives, '\\&|,| and ')
data.frame(Name = rep(df$Name, lengths(tmp)), Relatives = trimws(unlist(tmp)))
#   Name     Relatives
# 1  abc        Mother
# 2  abc        Father
# 3  def        Mother
# 4  def       brother
# 5  ghi       Husband
# 6  ghi          wife
# 7  ghi Sister in law
# 8  ghi      Daughter
# 9  klm          Self

Same logic with data.table:

library(data.table)
setDT(df)
df[, strsplit(Relatives, '\\&|,| and '), by = Name
   ][, Relatives := trimws(V1)][, V1 := NULL][]

Upvotes: 3

Related Questions