s.shi
s.shi

Reputation: 87

How to additively merge columns in a dataframe with similar column names?

I have a large dataframe with several columns that need to be additively merged based on the first part of a string (before .S*)...

an example data frame of this can be generated with this code

DF1 = structure(list(taxonomy = c("cat", "dog","horse","mouse","frog", "lion"),
                 A = c(0L, 5L, 3L, 0L, 0L, 0L), D = c(2L, 1L, 0L, 0L, 2L, 0L), C = c(0L, 0L, 0L, 4L, 4L, 2L)), 
            .Names = c("taxonomy", "A.S595", "B.S596", "B.S487"), 
            row.names = c(NA, -6L), class = "data.frame")

This file looks like this:

  taxonomy A.S595 B.S596 B.S487
1      cat 0      2      0
2      dog 5      1      0
3    horse 3      0      0
4    mouse 0      0      4
5     frog 0      2      4
6     lion 0      0      2

and I would like the output to look like this

  taxonomy A      B 
1      cat 0      2      
2      dog 5      1      
3    horse 3      0      
4    mouse 0      4      
5     frog 0      6      
6     lion 0      2  

Upvotes: 0

Views: 79

Answers (2)

utubun
utubun

Reputation: 4505

Another version using tidyverse:

DF1 %>%
  select(matches("^B\\.S.*")) %>%
  rowSums %>%
  bind_cols(
    select(DF1, -matches("^B\\.S.*")),
    B = .
    ) %>%
  rename_at(vars(matches("\\.S[0-9]+")), funs(gsub("\\.S[0-9]+", "", .)))

  taxonomy A B
1      cat 0 2
2      dog 5 1
3    horse 3 0
4    mouse 0 4
5     frog 0 6
6     lion 0 2

Upvotes: 0

akrun
akrun

Reputation: 887501

One option would be to split the dataset based on the names of the integer columns, loop through the list, get the rowSums and cbind with the first column

cbind(DF1[1], sapply(split.default(DF1[-1], substr(names(DF1)[-1], 1, 1)), rowSums))
#  taxonomy A B
#1      cat 0 2
#2      dog 5 1
#3    horse 3 0
#4    mouse 0 4
#5     frog 0 6
#6     lion 0 2

Or using tidyverse

library(tidyverse)
rownames_to_column(DF1) %>% 
   gather(key, val, -taxonomy, -rowname) %>%
   separate(key, into = c('key1', 'key2')) %>% 
   group_by(rowname, key1) %>% 
   summarise(val = sum(val)) %>% 
   spread(key1, val)  %>% 
   ungroup %>% 
   select(-rowname) %>% 
   bind_cols(DF1[1], .)

Upvotes: 2

Related Questions