Reputation: 317
So here is my challenge. I am trying to get rid of rows of data that are best organized as a column. The original data set looks like
1|1|a
2|3|b
2|5|c
1|4|d
1|2|e
10|10|f
And the end result desired is
1 |1,2,4 |a| e d
2 |3,5 |b| c
10|10 |f| NA
The table's shaping is based from minimum value Col 2 within groupings of Col 1, where new column 3 is defined from the minimum values within the group and new column 4 is collapsed from not the minimum of. Some of the approaches tried include:
newTable[min(newTable[,(1%o%2)]),] ## returns the minimum of both COL 1 and 2 only
ddply(newTable,"V1", summarize, newCol = paste(V7,collapse = " ")) ## collapses all values by Col 1 and creates a new column nicely.
Variations to combine these lines of code into a single line have not worked, in part to my limited knowledge. These modifications are not included here.
Upvotes: 1
Views: 1940
Reputation: 118779
Here's an approach using data.table
, with data from @akrun's post:
It might be useful to store the columns as list
instead of pasting them together.
require(data.table) ## 1.9.2+
setDT(dat)[order(V1, V2), list(V2=list(V2), V3=V3[1L], V4=list(V3[-1L])), by=V1]
# V1 V2 V3 V4
# 1: 1 1,2,4 a e,d
# 2: 2 3,5 b c
# 3: 10 10 f
setDT(dat)
converts the data.frame to data.table, by reference (without copying it). Then, we sort it by columns V1,V2
and group by V1
column on the sorted data, and for each group, we create the columns V2
, V3
and V4
as shown.
V2
and V4
will be of type list
here. If you'd rather have a character column where all entries are pasted together, just replace list(.)
with paste(., sep=...)
.
HTH
Upvotes: 3
Reputation: 886938
Try:
library(dplyr)
library(tidyr)
dat %>%
group_by(V1) %>%
summarise_each(funs(paste(sort(.), collapse=","))) %>%
extract(V3, c("V3", "V4"), "(.),?(.*)")
gives the output
# V1 V2 V3 V4
#1 1 1,2,4 a d,e
#2 2 3,5 b c
#3 10 10 f
Or using aggregate
and str_split_fixed
res1 <- aggregate(.~ V1, data=dat, FUN=function(x) paste(sort(x), collapse=","))
library(stringr)
res1[, paste0("V", 3:4)] <- as.data.frame(str_split_fixed(res1$V3, ",", 2),
stringsAsFactors=FALSE)
If you need NA
for missing values
res1[res1==''] <- NA
res1
# V1 V2 V3 V4
#1 1 1,2,4 a d,e
#2 2 3,5 b c
#3 10 10 f <NA>
dat <- structure(list(V1 = c(1L, 2L, 2L, 1L, 1L, 10L), V2 = c(1L, 3L,
5L, 4L, 2L, 10L), V3 = c("a", "b", "c", "d", "e", "f")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 4