marek_sebo
marek_sebo

Reputation: 31

How to set priority when aggregating rows of a data frame?

I need to aggregate rows of a data frame with MANY rows in this manner:

Lets say I have two rows that I want to merge (as they have the same first element):

  x1=c(ID1,1,3,2,1,NA);
  x2=c(ID1,2,2,3,NA,NA);

I want to merge their values so, that the result will in the case of clash (clash=both rows have a non-NA value in the element) prefer the value in a row with higher value in 2nd element (that is x2).

So in the example the result of the aggregation of these particular two rows will be:

x12=c(ID1,2,2,3,1,NA)

Can anyone please help? I tried aggregate(), but it offers just summary statistics like mean, max etc.

Upvotes: 0

Views: 385

Answers (2)

Hack-R
Hack-R

Reputation: 23210

You can use SQL via the package sqldf.

Using sql you can group by ID and select the max of the values you mentioned.

require(data.table)
require(sqldf)
x1 <- c(1,1,2,1,NA)
x2 <- c(1,3,3,NA,NA)
x  <- data.table(rbind(x1,x2))
colnames(x) <- c("ID", "x1", "x2", "x3", "x4")

sqldf("select ID, max(x1) as x1, max(x2) as x2, max(x3) as x3, max(x4) as x4 from x group by ID")

  ID x1 x2 x3 x4
1  1  3  3  1 NA

Upvotes: 0

elevendollar
elevendollar

Reputation: 1204

You can use max as a function for the aggregate function.

Data:

x1=c('ID1',1,2,1,NA)
x2=c('ID1',3,3,NA,NA)

data <- data.frame(rbind(x1, x2), stringsAsFactors = FALSE)

Aggregating:

aggregate(data[,2:5], by = list(name = data$X1), FUN = max, na.rm = TRUE)

Result:

  name X2 X3 X4   X5
1  ID1  3  3  1 <NA>

Upvotes: 1

Related Questions