Reputation: 221
I have data with multiple records for each participant (the number of records varies). I'm trying to combine these records into one for each participant by combining each column's records forthat participant.
So, if I have data like:
dummy<-tribble(
~id, ~A, ~B, ~C, ~D,
1, "one", "two", "three", "four",
1, "one", "two", "three", "five",
1, "one", "six", "three", "four",
1, "one", "seven", "three", "five",
2, "one", "two", "three", "four",
2, "one", "two", "six", "five",
3, "one", "two", "three", "four",
3, "one", "seven", "six", "five",
3, "one", "two", "six", "eight"
)
I'm looking for output like:
1, "one+one+one+one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one+one", "two+two", "three+six", "four+five",
3, "one+one+one", "two+seven+two", "three+six+six", "four+five+eight",
I'd prefer to use tidyverse
, and I feel that group_by
and unite
would come in here somewhere, but I haven't a clue as to how to loop through the varying number of records for each participant, and apply that over all the columns (there are 28 in the real data).
Ideally, I'd also then want to discard repeated data, so that I get:
1, "one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one", "two", "three+six", "four+five",
3, "one", "two+seven+two", "three+six+six", "four+five+eight",
Any suggestions on how to accomplish this?
Upvotes: 1
Views: 76
Reputation: 887901
Using str_c
library(dplyr)
library(stringr)
dummy %>%
group_by(id) %>%
summarise(across(A:D, ~str_c(unique(.), collapse = "+")))
-output
# A tibble: 3 x 5
id A B C D
<dbl> <chr> <chr> <chr> <chr>
1 1 one two+six+seven three four+five
2 2 one two three+six four+five
3 3 one two+seven three+six four+five+eight
Upvotes: 1
Reputation: 26238
For first output you may also do
library(tidyverse)
dummy<-tribble(
~id, ~A, ~B, ~C, ~D,
1, "one", "two", "three", "four",
1, "one", "two", "three", "five",
1, "one", "six", "three", "four",
1, "one", "seven", "three", "five",
2, "one", "two", "three", "four",
2, "one", "two", "six", "five",
3, "one", "two", "three", "four",
3, "one", "seven", "six", "five",
3, "one", "two", "six", "eight"
)
dummy %>% group_by(id) %>%
summarise(across(everything(), ~paste(., collapse = '+')))
#> # A tibble: 3 x 5
#> id A B C D
#> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 one+one+one+one two+two+six+sev~ three+three+three+th~ four+five+four+f~
#> 2 2 one+one two+two three+six four+five
#> 3 3 one+one+one two+seven+two three+six+six four+five+eight
Created on 2021-06-28 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 1474
group_by()
and summarise()
would do. unique()
removes the repeated data.
dummy %>%
group_by(id) %>%
summarise(across(A:D, ~ paste(unique(.), collapse = "+")))
# # A tibble: 3 x 5
# id A B C D
# <dbl> <chr> <chr> <chr> <chr>
# 1 1 one two+six+seven three four+five
# 2 2 one two three+six four+five
# 3 3 one two+seven three+six four+five+eight
Upvotes: 0