Aaron
Aaron

Reputation: 317

R: collapse rows and then convert row into a new column

So here is my challenge. I am trying to get rid of rows of data that are best organized as a column. The original data set looks like

1|1|a
2|3|b
2|5|c
1|4|d
1|2|e
10|10|f

And the end result desired is

1 |1,2,4 |a| e d
2 |3,5   |b| c
10|10    |f| NA

The table's shaping is based from minimum value Col 2 within groupings of Col 1, where new column 3 is defined from the minimum values within the group and new column 4 is collapsed from not the minimum of. Some of the approaches tried include:

newTable[min(newTable[,(1%o%2)]),] ## returns the minimum of both COL 1 and 2 only

ddply(newTable,"V1", summarize, newCol = paste(V7,collapse = " ")) ## collapses all values by Col 1 and creates a new column nicely.

Variations to combine these lines of code into a single line have not worked, in part to my limited knowledge. These modifications are not included here.

Upvotes: 1

Views: 1940

Answers (2)

Arun
Arun

Reputation: 118779

Here's an approach using data.table, with data from @akrun's post:

It might be useful to store the columns as list instead of pasting them together.

require(data.table) ## 1.9.2+
setDT(dat)[order(V1, V2), list(V2=list(V2), V3=V3[1L], V4=list(V3[-1L])), by=V1]
#    V1    V2 V3  V4
# 1:  1 1,2,4  a e,d
# 2:  2   3,5  b   c
# 3: 10    10  f    

setDT(dat) converts the data.frame to data.table, by reference (without copying it). Then, we sort it by columns V1,V2 and group by V1 column on the sorted data, and for each group, we create the columns V2, V3 and V4 as shown.

V2 and V4 will be of type list here. If you'd rather have a character column where all entries are pasted together, just replace list(.) with paste(., sep=...).

HTH

Upvotes: 3

akrun
akrun

Reputation: 886938

Try:

 library(dplyr)
 library(tidyr)

 dat %>% 
     group_by(V1) %>% 
     summarise_each(funs(paste(sort(.), collapse=","))) %>%
     extract(V3, c("V3", "V4"), "(.),?(.*)")

gives the output

  #  V1    V2 V3  V4
  #1  1 1,2,4  a d,e
  #2  2   3,5  b   c
  #3 10    10  f    

Or using aggregate and str_split_fixed

 res1 <- aggregate(.~ V1, data=dat, FUN=function(x) paste(sort(x), collapse=","))
 library(stringr)
 res1[, paste0("V", 3:4)] <- as.data.frame(str_split_fixed(res1$V3, ",", 2), 
                                              stringsAsFactors=FALSE)

If you need NA for missing values

  res1[res1==''] <- NA
  res1
  # V1    V2 V3   V4
 #1  1 1,2,4  a  d,e
 #2  2   3,5  b    c
 #3 10    10  f <NA>

data

dat <- structure(list(V1 = c(1L, 2L, 2L, 1L, 1L, 10L), V2 = c(1L, 3L, 
5L, 4L, 2L, 10L), V3 = c("a", "b", "c", "d", "e", "f")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 4

Related Questions