Luc
Luc

Reputation: 958

R aggregate on large number of columns without specifying column names

I could not find an answer to my question using the search function here nor on Google.

I have a data frame (500 columns wide, 200.000 rows long) with multiple rows per person. Each cell (except for the first column which has a person ID) contains a 0 or a 1. I am looking for a way to reduce this data frame to 1 row per person, in which I take the maximum for each column by person.

I know that I could use ddply, or data.table... like below...

tt <-data.frame(person=c(1,1,1,2,2,2,3,3,3), col1=c(0,0,1,1,1,0,0,0,0),col2=c(1, 1, 0, 0, 0, 0, 1 ,0 ,1))

library(plyr)
ddply(tt, .(person), summarize, col1=max(col1), col2=max(col2))

  person col1 col2
      1    1    1
      2    1    0
      3    0    1

But I don't want to be specifying each of my column names because 1) I have 500 and 2) on a new data set they might be different.

Upvotes: 2

Views: 937

Answers (3)

akrun
akrun

Reputation: 887681

Or use data.table.

library(data.table)
setDT(tt)[, lapply(.SD, max), person]
#    person col1 col2
#1:      1    1    1
#2:      2    1    0
#3:      3    0    1

Upvotes: 3

Jaehyeon Kim
Jaehyeon Kim

Reputation: 1417

Below is another trial just using l(s)apply().

t(sapply(unique(tt$person), function(x) lapply(tt[tt$person==x,], max)))
     person col1 col2
[1,] 1      1    1   
[2,] 2      1    0   
[3,] 3      0    1  

Upvotes: 0

MrFlick
MrFlick

Reputation: 206456

Use the summarise_each function from dplyr

library(dplyr)
tt %>% group_by(person) %>% summarise_each(funs(max))

#   person col1 col2
# 1      1    1    1
# 2      2    1    0
# 3      3    0    1

or just the base aggregate function

aggregate(.~person, tt, max)

#   person col1 col2
# 1      1    1    1
# 2      2    1    0
# 3      3    0    1

Upvotes: 5

Related Questions