Wol44
Wol44

Reputation: 21

R - joining/merging rows in one dataset

I would like to know how to use R to merge rows in one set of data.

Currently my data looks like this:

Text 1     Text 2        Text 3       Text 4
Bob        Aba           Abb          Abc
Robert     Aba           Abb          Abc
Fred       Abd           Abe          Abf
Martin     Abg           Abh          Abi

If text two and text 3 are both the same for two rows (as in rows 1 & 2) I would like to make it into one row with more columns for the other data.

Text 1      Text 1a   Text 2         Text 3       Text 4     Text 4a
Bob         Robert     Aba           Abb          Abc        Abd
Fred        NA         Abd           Abe          Abf        NA
Martin      NA         Abg           Abh          Abi        NA

I did something similar with joining two separate sets of data and merging them using join

join=join(Data1, Data2, by = c('Text2'), type = "full", match = "all")

but I can't work out how to do it for duplicates within one set of data.

I think it might be possible to use aggregate but I have not used it before, my attempt was:

MyDataAgg=aggregate(MyData, by=list(MyData$Text1), c)

but when I try I am getting an output that looks like this on summary:

 1    -none-   numeric                              
 1    -none-   numeric                              
 2    -none-   numeric  

or this on structure:

 $ Initials               :List of 12505
  ..$ 1    : int 62
  ..$ 2    : int 310
  ..$ 3    : int 504

I would also like to be able to combine rows using matching elements of two variables.

Upvotes: 2

Views: 662

Answers (1)

agstudy
agstudy

Reputation: 121588

I don't think you can reshape or aggregate because :

  1. You have duplicated rows that corresponds to the same key
  2. You don't have the same number of value for each keys : you should fill it with missing values

Here a a manual attempt using by to process by key, and rbind.fill to aggregate all the list together. Each by step , is creating a one-row data.frame having (Text2,Text3) as key.

do.call(plyr::rbind.fill,by(dat,list(dat$Text2,dat$Text3),
   function(d){
     ## change all other columns to a one row data.frame
     dd <- as.data.frame(as.list(rapply(d[,-c(2,3)],as.character)))
     ## the tricky part : add 1 to a name like Text1 to become Text11 , 
     ## this is import to join data.frames formed by by
     names(dd) <- gsub('(Text[0-9]$)','\\11',names(dd))
     ## add key to to the row
     cbind(unique(d[,2:3]),dd)
   }))

 Text2 Text3 Text11 Text12 Text41 Text42
1   Aba   Abb    Bob Robert    Abc    Abd
2   Abd   Abe   Fred   <NA>    Abf   <NA>
3   Abg   Abh Martin   <NA>    Abi   <NA>

Upvotes: 1

Related Questions