Assu
Assu

Reputation: 71

how do I search for columns with same name, add the column values and replace these columns with same name by their sum? Using R

I have a data frame where some consecutive columns have the same name. I need to search for these, add their values in for each row, drop one column and replace the other with their sum. without previously knowing which patterns are duplicated, possibly having to compare one column name with the following to see if there's a match.

Can someone help?

Thanks in advance.

Upvotes: 5

Views: 5487

Answers (4)

IRTFM
IRTFM

Reputation: 263362

> dfrm <- data.frame(a = 1:10, b= 1:10, cc= 1:10, dd=1:10, ee=1:10)
> names(dfrm) <- c("a", "a", "b", "b", "b")
> sapply(unique(names(dfrm)[duplicated(names(dfrm))]), 
      function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )
       a  b
 [1,]  2  3
 [2,]  4  6
 [3,]  6  9
 [4,]  8 12
 [5,] 10 15
 [6,] 12 18
 [7,] 14 21
 [8,] 16 24
 [9,] 18 27
[10,] 20 30

EDIT 2: Using rowSums allows simplification of the first sapply argumentto just unique(names(dfrm)) at the expense of needing to remember to include drop=FALSE in "[":

sapply(unique(names(dfrm)), 
       function(x) rowSums( dfrm[ , grep(x, names(dfrm)), drop=FALSE]) )

To deal with NA's:

sapply(unique(names(dfrm)), 
      function(x) apply(dfrm[grep(x, names(dfrm))], 1, 
              function(y) if ( all(is.na(y)) ) {NA} else { sum(y, na.rm=TRUE) }
       )               )

(Edit note: addressed Tommy counter-example by putting unique around the names(.)[.] construction. The erroneous code was:

sapply(names(dfrm)[unique(duplicated(names(dfrm)))], 
     function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )

Upvotes: 7

Ramnath
Ramnath

Reputation: 55695

Here is my one liner

# transpose data frame, sum by group = rowname, transpose back.
t(rowsum(t(dfrm), group = rownames(t(dfrm))))

Upvotes: 4

Richie Cotton
Richie Cotton

Reputation: 121077

Some sample data.

dfr <- data.frame(
  foo = rnorm(20),
  bar = 1:20,
  bar = runif(20),
  check.names = FALSE
)

Method: Loop over unique column names; if there is only one of that name, then selecting all columns with that nme will return a vector, but if there are duplicates it will also be a data frame. Use rowSums to sum over rows. (Duh. EDIT: Not quite as 'duh' as previously thought!) lapply returns a list, which we need to reform into a data frame, and finally we fix the names. EDIT: sapply avoids the need for the last step.

unique_col_names <- unique(colnames(dfr))
new_dfr <- sapply(unique_col_names, function(name)
{
  subs <- dfr[, colnames(dfr) == name]
  if(is.data.frame(subs))
    rowSums(subs)
  else
    subs
})

Upvotes: 2

Ista
Ista

Reputation: 10437

One way is to identify duplcates using (surprise) the duplicated function, and then loop through them to calculate the sums. Here is an example:

dat.dup <- data.frame(x=1:10, x=1:10, x=1:10, y=1:10, y=1:10, z=1:10, check.names=FALSE)
dups <- unique(names(dat.dup)[duplicated(names(dat.dup))])
for (i in dups) {
dat.dup[[i]] <- rowSums(dat.dup[names(dat.dup) == i])
}
dat <- dat.dup[!duplicated(names(dat.dup))]

Upvotes: 2

Related Questions