Reputation: 87
I have a large dataframe with several columns that need to be additively merged based on the first part of a string (before .S*)...
an example data frame of this can be generated with this code
DF1 = structure(list(taxonomy = c("cat", "dog","horse","mouse","frog", "lion"),
A = c(0L, 5L, 3L, 0L, 0L, 0L), D = c(2L, 1L, 0L, 0L, 2L, 0L), C = c(0L, 0L, 0L, 4L, 4L, 2L)),
.Names = c("taxonomy", "A.S595", "B.S596", "B.S487"),
row.names = c(NA, -6L), class = "data.frame")
This file looks like this:
taxonomy A.S595 B.S596 B.S487
1 cat 0 2 0
2 dog 5 1 0
3 horse 3 0 0
4 mouse 0 0 4
5 frog 0 2 4
6 lion 0 0 2
and I would like the output to look like this
taxonomy A B
1 cat 0 2
2 dog 5 1
3 horse 3 0
4 mouse 0 4
5 frog 0 6
6 lion 0 2
Upvotes: 0
Views: 79
Reputation: 4505
Another version using tidyverse
:
DF1 %>%
select(matches("^B\\.S.*")) %>%
rowSums %>%
bind_cols(
select(DF1, -matches("^B\\.S.*")),
B = .
) %>%
rename_at(vars(matches("\\.S[0-9]+")), funs(gsub("\\.S[0-9]+", "", .)))
taxonomy A B
1 cat 0 2
2 dog 5 1
3 horse 3 0
4 mouse 0 4
5 frog 0 6
6 lion 0 2
Upvotes: 0
Reputation: 887501
One option would be to split
the dataset based on the names of the integer columns, loop through the list
, get the rowSums
and cbind
with the first column
cbind(DF1[1], sapply(split.default(DF1[-1], substr(names(DF1)[-1], 1, 1)), rowSums))
# taxonomy A B
#1 cat 0 2
#2 dog 5 1
#3 horse 3 0
#4 mouse 0 4
#5 frog 0 6
#6 lion 0 2
Or using tidyverse
library(tidyverse)
rownames_to_column(DF1) %>%
gather(key, val, -taxonomy, -rowname) %>%
separate(key, into = c('key1', 'key2')) %>%
group_by(rowname, key1) %>%
summarise(val = sum(val)) %>%
spread(key1, val) %>%
ungroup %>%
select(-rowname) %>%
bind_cols(DF1[1], .)
Upvotes: 2