dataframe partial merge in R

Question

I have a data frame that looks like this:

name    workplace   year   note1  note2  job
Ben     Alpha       2011   xxxx   xx     director
Ben     Beta        2011   xx     xxx    director
Ben     Beta        2011   xxx    xxxx   vice president
Wendy   Sigma       2011   xxxx   x      director
Wendy   Sigma       2011   xx     xx     vice president
Wendy   Sigma       2011   x      xxx    CEO
Alice   Beta        2011   xxx    x      staff
Alice   Beta        2012   xx     xx     deputy director

I want to identify the rows that are duplicated on the columns "name", "workplace" and "year" (ignoring "note1" and "note2") and merge each set of matching rows into a single row, spreading the "job" values across new columns. The values in "note1" and "note2" don't need to be merged; they should be taken from the first of the matching rows. The output should look like this:

name    workplace   year   note1  note2  job.1            job.2           job.3
Ben     Alpha       2011   xxxx   xx     director         NA              NA
Ben     Beta        2011   xx     xxx    director         vice president  NA
Wendy   Sigma       2011   xxxx   x      director         vice president  CEO
Alice   Beta        2011   xxx    x      staff            NA              NA
Alice   Beta        2012   xx     xx     deputy director  NA              NA

AnilGoyal · Accepted Answer

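Here is another approach that does not use pivoting. For the notes fields you can use first(), last(), or any other aggregate function you like; the code below uses last(), so replace it with first() to keep the notes from the first matching row, as the question asks. The extra and fill arguments of separate() suppress its warnings about rows that have too many or too few pieces.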

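library(dplyr)
library(tidyr)
library(stringr)  # for str_count()

df %>% group_by(name, workplace, year) %>%
  # collapse each group's jobs into one comma-separated string
  summarise(note1 = last(note1),
            note2 = last(note2),
            job = toString(job), .groups = 'drop') %>%
  # one column per job: the largest number of commas in any row, plus one
  separate(job, into = paste0('Job', seq_len(max(1 + str_count(.$job, ',')))), 
           sep = ', ',
           extra = "drop", 
           fill = 'right')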

# A tibble: 5 x 8
  name  workplace  year note1 note2 Job1            Job2           Job3 
1 Alice Beta       2011 xxx   x     staff           NA             NA   
2 Alice Beta       2012 xx    xx    deputy director NA             NA   
3 Ben   Alpha      2011 xxxx  xx    director        NA             NA   
4 Ben   Beta       2011 xxx   xxxx  director        vice president NA   
5 Wendy Sigma      2011 x     xxx   director        vice president CEO
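
For a reproducible check, here is a minimal sketch that rebuilds the question's sample data and applies the same summarise-then-separate idea with first() in place of last(), so that the notes come from the first matching row. The data frame construction, the intermediate names collapsed and n_jobs, and the job.1-style column names are assumptions made for illustration and are not part of the original answer. A job title that itself contains ", " would be split incorrectly by this approach.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the table in the question (assumed column types)
df <- data.frame(
  name      = c("Ben", "Ben", "Ben", "Wendy", "Wendy", "Wendy", "Alice", "Alice"),
  workplace = c("Alpha", "Beta", "Beta", "Sigma", "Sigma", "Sigma", "Beta", "Beta"),
  year      = c(2011, 2011, 2011, 2011, 2011, 2011, 2011, 2012),
  note1     = c("xxxx", "xx", "xxx", "xxxx", "xx", "x", "xxx", "xx"),
  note2     = c("xx", "xxx", "xxxx", "x", "xx", "xxx", "x", "xx"),
  job       = c("director", "director", "vice president", "director",
                "vice president", "CEO", "staff", "deputy director"),
  stringsAsFactors = FALSE
)

# Step 1: one row per (name, workplace, year); keep the first notes and
# collapse the jobs into a single "a, b, c" string
collapsed <- df %>%
  group_by(name, workplace, year) %>%
  summarise(note1 = first(note1),
            note2 = first(note2),
            job   = toString(job),
            .groups = "drop")

# Step 2: the widest group decides how many job columns are needed
n_jobs <- max(lengths(strsplit(collapsed$job, ", ", fixed = TRUE)))

# Step 3: split the collapsed string into job.1, job.2, ...;
# fill = "right" pads shorter groups with NA without a warning
result <- separate(collapsed, job,
                   into = paste0("job.", seq_len(n_jobs)),
                   sep  = ", ",
                   fill = "right")

print(result)
```

Because group_by() orders the groups by the grouping columns, the rows come out sorted by name, workplace and year rather than in the order they appear in the input.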
