donnek
donnek

Reputation: 221

R: Combine multiple records over multiple columns

I have data with multiple records for each participant (the number of records varies). I'm trying to combine these records into one for each participant by combining each column's records forthat participant.

So, if I have data like:

dummy<-tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)

I'm looking for output like:

1, "one+one+one+one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one+one", "two+two", "three+six", "four+five",
3, "one+one+one", "two+seven+two", "three+six+six", "four+five+eight",

I'd prefer to use tidyverse, and I feel that group_by and unite would come in here somewhere, but I haven't a clue as to how to loop through the varying number of records for each participant, and apply that over all the columns (there are 28 in the real data).

Ideally, I'd also then want to discard repeated data, so that I get:

1, "one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one", "two", "three+six", "four+five",
3, "one", "two+seven+two", "three+six+six", "four+five+eight",

Any suggestions on how to accomplish this?

Upvotes: 1

Views: 76

Answers (3)

akrun
akrun

Reputation: 887901

Using str_c

library(dplyr)
library(stringr)
dummy %>%
    group_by(id) %>%
    summarise(across(A:D,  ~str_c(unique(.), collapse = "+")))

-output

# A tibble: 3 x 5
     id A     B             C         D              
  <dbl> <chr> <chr>         <chr>     <chr>          
1     1 one   two+six+seven three     four+five      
2     2 one   two           three+six four+five      
3     3 one   two+seven     three+six four+five+eight

Upvotes: 1

AnilGoyal
AnilGoyal

Reputation: 26238

For first output you may also do

library(tidyverse)

dummy<-tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)

dummy %>% group_by(id) %>%
  summarise(across(everything(), ~paste(., collapse = '+')))
#> # A tibble: 3 x 5
#>      id A               B                C                     D                
#>   <dbl> <chr>           <chr>            <chr>                 <chr>            
#> 1     1 one+one+one+one two+two+six+sev~ three+three+three+th~ four+five+four+f~
#> 2     2 one+one         two+two          three+six             four+five        
#> 3     3 one+one+one     two+seven+two    three+six+six         four+five+eight

Created on 2021-06-28 by the reprex package (v2.0.0)

Upvotes: 0

Zaw
Zaw

Reputation: 1474

group_by() and summarise() would do. unique() removes the repeated data.

dummy %>% 
  group_by(id) %>% 
  summarise(across(A:D, ~ paste(unique(.), collapse = "+")))

# # A tibble: 3 x 5
#      id A     B             C         D
#   <dbl> <chr> <chr>         <chr>     <chr>
# 1     1 one   two+six+seven three     four+five      
# 2     2 one   two           three+six four+five      
# 3     3 one   two+seven     three+six four+five+eight

Upvotes: 0

Related Questions