Reputation: 23
I have a data frame in R with 3 columns: IDs (which can, and probably will, repeat), codes and descriptions. I need code that takes that data frame and returns a data frame with the same number of rows, in which each row has one ID, one code, and all descriptions associated with that ID in the original data frame (either in separate columns or pasted into a single column, both are fine).
So, for instance, I have the following data frame df:
IDstest <- c(1:5,5:1,3,4,1)
codestest <- c("X1","Z1","C1","X1","X2","J9","A","Y1","Z2","C5","A","P2","Z")
descriptiontest <- c("Desc 1","Desc 2","Test","Just typing randomly","Desc 4","Desc 5","Desc 1","Random","Desc ZZZ","Desc 1","YYY","XYZ","Desc 4")
df <- data.frame(IDstest, codestest, descriptiontest)
df
    IDstest  codestest  descriptiontest
1   1        X1         Desc 1
2   2        Z1         Desc 2
3   3        C1         Test
4   4        X1         Just typing randomly
5   5        X2         Desc 4
6   5        J9         Desc 5
7   4        A          Desc 1
8   3        Y1         Random
9   2        Z2         Desc ZZZ
10  1        C5         Desc 1
11  3        A          YYY
12  4        P2         XYZ
13  1        Z          Desc 4
And I wish to receive something similar to:
    IDstest  codestest  descriptiontest
1   1        X1         Desc 1; Desc 1; Desc 4
2   2        Z1         Desc 2; Desc ZZZ
3   3        C1         Test; Random; YYY
4   4        X1         Just typing randomly; Desc 1; XYZ
5   5        X2         Desc 4; Desc 5
6   5        J9         Desc 5; Desc 4
7   4        A          Desc 1; Just typing randomly; XYZ
8   3        Y1         Random; Test; YYY
9   2        Z2         Desc ZZZ; Desc 2
10  1        C5         Desc 1; Desc 1; Desc 4
11  3        A          YYY; Test; Random
12  4        P2         XYZ; Just typing randomly; Desc 1
13  1        Z          Desc 4; Desc 1; Desc 1
As mentioned, the matching text from other rows doesn't have to be in the 'descriptiontest' column, adding columns is fine.
Can you help me?
Upvotes: 2
Views: 59
Reputation: 652
This is a quick and dirty way to do it. I'm sure someone else will come along with an lapply single line method. :)
IDstest <- c(1:5,5:1,3,4,1)
codestest <- c("X1","Z1","C1","X1","X2","J9","A","Y1","Z2","C5","A","P2","Z")
descriptiontest <- c("Desc 1","Desc 2","Test","Just typing randomly","Desc 4","Desc 5","Desc 1","Random","Desc ZZZ","Desc 1","YYY","XYZ","Desc 4")
df <- data.frame(IDstest, codestest, descriptiontest)
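uniqueIDs <- unique(df$IDstest)
mergedescription <- character(length(uniqueIDs))
# Index by position rather than by ID value, so IDs need not be 1, 2, 3, ...
for (k in seq_along(uniqueIDs)) {
  # Collect every description that shares this ID and paste them together
  mergedescription[k] <- paste(df$descriptiontest[df$IDstest == uniqueIDs[k]], collapse = "; ")
}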
mdf <- data.frame(IDstest = uniqueIDs, mergedescription)
final.df <- merge(df, mdf)
This sorts the records by IDstest as a side effect:
    IDstest  codestest  descriptiontest       mergedescription
1   1        X1         Desc 1                Desc 1; Desc 1; Desc 4
2   1        C5         Desc 1                Desc 1; Desc 1; Desc 4
3   1        Z          Desc 4                Desc 1; Desc 1; Desc 4
4   2        Z1         Desc 2                Desc 2; Desc ZZZ
5   2        Z2         Desc ZZZ              Desc 2; Desc ZZZ
6   3        C1         Test                  Test; Random; YYY
7   3        Y1         Random                Test; Random; YYY
8   3        A          YYY                   Test; Random; YYY
9   4        X1         Just typing randomly  Just typing randomly; Desc 1; XYZ
10  4        A          Desc 1                Just typing randomly; Desc 1; XYZ
11  4        P2         XYZ                   Just typing randomly; Desc 1; XYZ
12  5        J9         Desc 5                Desc 4; Desc 5
13  5        X2         Desc 4                Desc 4; Desc 5
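If the re-sorting is unwanted, the loop and merge can probably be skipped altogether. Here is a minimal sketch, assuming base R's ave() is acceptable: it applies a function within each ID group and puts the result back in the original row positions, so the row order of df is left alone. The column name mergedescription is simply reused from above, and as.character() is only there in case the descriptions were read in as factors.

```r
IDstest <- c(1:5, 5:1, 3, 4, 1)
codestest <- c("X1","Z1","C1","X1","X2","J9","A","Y1","Z2","C5","A","P2","Z")
descriptiontest <- c("Desc 1","Desc 2","Test","Just typing randomly","Desc 4","Desc 5","Desc 1","Random","Desc ZZZ","Desc 1","YYY","XYZ","Desc 4")
df <- data.frame(IDstest, codestest, descriptiontest)

# For each ID group, paste all of its descriptions into one string;
# ave() recycles that single string across every row of the group.
df$mergedescription <- ave(
  as.character(df$descriptiontest),
  df$IDstest,
  FUN = function(x) paste(x, collapse = "; ")
)
```

Unlike the desired output in the question, this does not put each row's own description first; every row of an ID gets the descriptions in their original order.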
Upvotes: 1