Monica Heddneck
Monica Heddneck

Reputation: 91

combining values in rows based on matching conditions in R

I have a simple question about aggregating values in R.

Suppose I have a dataframe:

DF <- data.frame(col1=c("Type 1", "Type 1B", "Type 2"), col2=c(1, 2, 3))  

which looks like this:

     col1 col2
1  Type 1    1
2 Type 1B    2
3  Type 2    3

I notice that I have Type 1 and Type 1B in the data, so I would like to combine Type 1B into Type 1.

So I decide to use dplyr:

filter(DF, col1=='Type 1' | col1=='Type 1B') %>%
  summarise(n = sum(col2))

But now I need to keep going with it:

DF2 <- data.frame('Type 1', filter(DF, col1=='Type 1' | col1=='Type 1B') %>%
  summarise(n = sum(col2)))

I guess I want to cbind this new DF2 back to the original DF, but that means I have to set the column names to be consistent:

names(DF2) <- c('col1', 'col2')

OK, now I can rbind:

rbind(DF2, DF[3,])

The result? It worked....

   col1 col2
1 Type 1    3
3 Type 2    3

...but ugh! That was awful! There has to be a better way to simply combine values.

Upvotes: 9

Views: 3633

Answers (4)

Rich Scriven
Rich Scriven

Reputation: 99331

Using sub() with aggregate(), removing anything other than a digit from the end of col1,

do.call("data.frame", 
    aggregate(col2 ~ cbind(col1 = sub("\\D+$", "", col1)), DF, sum)
)
#     col1 col2
# 1 Type 1    3
# 2 Type 2    3

The do.call() wrapper is there so that the first column after aggregate() is properly changed from a matrix to a vector. This way there aren't any surprises later on down the road.

Upvotes: 4

Colonel Beauvel
Colonel Beauvel

Reputation: 31161

You can try:

library(data.table)

setDT(transform(DF, col1=gsub("(.*)[A-Z]+$","\\1",DF$col1)))[,list(col2=sum(col2)),col1]

#      col1 col2
# 1: Type 1    3
# 2: Type 2    3

Or even more directly:

setDT(DF)[, .(col2 = sum(col2)), by = .(col1 = sub("[[:alpha:]]+$", "", col1))]

Upvotes: 2

bgoldst
bgoldst

Reputation: 35314

In my opinion, aggregate() is the perfect function for this purpose, but you shouldn't have to do any text processing (e.g. gsub()). I would do this in a two-step process:

  1. Overwrite col1 with the new desired grouping.
  2. Compute the aggregation using the new col1 to specify the grouping.

DF$col1 <- ifelse(DF$col1 %in% c('Type 1','Type 1B'),'Type 1',levels(DF$col1));
DF;
##     col1 col2
## 1 Type 1    1
## 2 Type 1    2
## 3 Type 2    3
DF <- aggregate(col2~col1, DF, FUN=sum );
DF;
##     col1 col2
## 1 Type 1    3
## 2 Type 2    3

Upvotes: 2

talat
talat

Reputation: 70256

Here's a possible dplyr approach:

library(dplyr)
DF %>%
  group_by(col1 = sub("(.*\\d+).*$", "\\1", col1)) %>%
  summarise(col2 = sum(col2))
#Source: local data frame [2 x 2]
#
#    col1 col2
#1 Type 1    3
#2 Type 2    3

Upvotes: 5

Related Questions