YBKL

Reputation: 25

Extract multiple values from a duplicated ID

I have a problem with a string variable. In fact, I have a list of patients (with their IDs as strings) and the surgical procedures they received (as strings as well).

The data look like this:

a=c("A","A","A","B","B","B","C","C","C")  #a is patient id
b=c(1:9)
c=c("asfdf","Sdfdsf","sdf","DF","Sdf","sdfds","sdff","cxv","vbnvb") # c is surgical procedure name
df=cbind(a,b,c)
df
     a   b   c       
 [1,] "A" "1" "asfdf" 
 [2,] "A" "2" "Sdfdsf"
 [3,] "A" "3" "sdf"   
 [4,] "B" "4" "DF"    
 [5,] "B" "5" "Sdf"   
 [6,] "B" "6" "sdfds" 
 [7,] "C" "7" "sdff"  
 [8,] "C" "8" "cxv"   
 [9,] "C" "9" "vbnvb" 

What I want is a data frame that contains each patient ID together with all of the procedures that patient received, something like this:

"A" "asfdf|Sdfdsf|sdf" "B" "DF|Sdf|sdfds" "C" "sdff|cxv|vbnvb"

Alternatively, the procedures could be separated into 3 columns (I will then collapse them using paste).

Upvotes: 1

Views: 26

Answers (1)

akrun

Reputation: 887291

We could use aggregate with paste, passing collapse = "|" so that each patient's procedures are joined into a single string:

aggregate(c ~ a, df, FUN = paste, collapse="|")
#  a                c
#1 A asfdf|Sdfdsf|sdf
#2 B     DF|Sdf|sdfds
#3 C   sdff|cxv|vbnvb

We can also use data.table, which is likely to be faster on large data:

library(data.table)
setDT(df)[, .(c = paste(c, collapse="|")), .(a)]

Data (the code above expects a data frame; cbind in the question produces a character matrix instead):

df <- data.frame(a, b, c)
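
The question also mentions getting the procedures as three separate columns, which the code above does not cover. A minimal base R sketch is shown below. It assumes every patient has the same number of rows (three here), and the names collapsed and wide are only illustrative:

```r
# Rebuild the question's data as a data frame (not a matrix).
a <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")  # patient ID
b <- 1:9
c <- c("asfdf", "Sdfdsf", "sdf", "DF", "Sdf", "sdfds", "sdff", "cxv", "vbnvb")  # procedure name
df <- data.frame(a, b, c, stringsAsFactors = FALSE)

# One collapsed string per patient, returned as a named character array.
collapsed <- tapply(df$c, df$a, paste, collapse = "|")

# One row per patient and one column per procedure.
# Assumption: every patient has the same number of procedures.
wide <- do.call(rbind, split(df$c, df$a))

print(collapsed)
print(wide)
```

If patients can have different numbers of procedures, rbind will recycle the shorter vectors, so the collapsed form is the safer of the two.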

Upvotes: 1
