Reputation: 153
I have a data frame in which every column appears twice: once under its own name and once with a numeric suffix (x and x.1, y and y.1). I need to combine each pair to get a proper data set. I can use dplyr's select to extract the matching columns and combine them afterwards, but I would like to do it with a for loop.
#Example data frame
x <- c(1, NA, 3)
y <- c(1, NA, 4)
x.1 <- c(NA, 3, NA)
y.1 <- c(NA, 5, NA)
data <- data.frame(x, y, x.1, y.1)
## With `dplyr` I can do this
library(dplyr)
t1 <- data %>%
  select(contains("x")) %>%
  mutate(x = rowSums(., na.rm = TRUE)) %>%
  select(x)
t2 <- data %>%
  select(contains("y")) %>%
  mutate(y = rowSums(., na.rm = TRUE)) %>%
  select(y)
data <- cbind(t1, t2)
This is cumbersome, as I have more than 25 such column pairs.
How can I achieve the same result with a for loop that matches column names and applies rowSums? An even simpler approach using dplyr would also help.
Upvotes: 1
Views: 49
Reputation: 887951
We can use split.default to split the data frame into a list based on a substring of the column names, and then apply rowSums to each element:
library(dplyr)
library(stringr)
library(purrr)
data %>%
split.default(str_remove(names(.), "\\.\\d+")) %>%
map_dfr(rowSums, na.rm = TRUE)
# A tibble: 3 x 2
# x y
# <dbl> <dbl>
#1 1 1
#2 3 5
#3 3 4
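One caveat with rowSums(..., na.rm = TRUE): a row in which every duplicate is NA sums to 0 rather than NA. The example data never hits this case, but a larger data set might. Below is a minimal base-R sketch that restores NA for such rows. It uses a made-up one-group data frame, and the names demo, sums and all_missing are chosen here purely for illustration.

```r
# Hypothetical data: row 3 is missing in both copies of x
demo <- data.frame(x = c(1, NA, NA), x.1 = c(NA, 3, NA))

# With na.rm = TRUE, the all-NA row sums to 0
sums <- rowSums(demo, na.rm = TRUE)

# Flag rows that have no non-missing value and set them back to NA
all_missing <- rowSums(!is.na(demo)) == 0
sums[all_missing] <- NA

unname(sums)
```

Whether 0 or NA is the right value for a fully missing row depends on what the duplicated columns represent, so treat this as optional.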
If we want to use a for loop:
un1 <- unique(sub("\\..*", "", names(data)))
out <- setNames(rep(list(NA), length(un1)), un1)
for(un in un1) {
out[[un]] <- rowSums(data[grep(un, names(data))], na.rm = TRUE)
}
as.data.frame(out)
# data
data <- structure(list(x = c(1, NA, 3), y = c(1, NA, 4),
                       x.1 = c(NA, 3, NA), y.1 = c(NA, 5, NA)),
                  class = "data.frame", row.names = c(NA, -3L))
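With 25 or more column pairs, substring matching with grep can pick up unrelated columns (a base name such as "x" also matches a column called "max", for example). A possible variant of the loop, sketched here in base R under the assumption that duplicates always carry a ".<digits>" suffix, groups columns by their exact stripped name instead. The names stripped and combined are introduced only for this sketch.

```r
data <- data.frame(x = c(1, NA, 3), y = c(1, NA, 4),
                   x.1 = c(NA, 3, NA), y.1 = c(NA, 5, NA))

# Base name of every column: drop a trailing ".<digits>" suffix if present
stripped <- sub("\\.\\d+$", "", names(data))

out <- list()
for (nm in unique(stripped)) {
  # Select only columns whose base name equals nm exactly
  out[[nm]] <- rowSums(data[stripped == nm], na.rm = TRUE)
}
combined <- as.data.frame(out)
combined
```

Comparing the stripped names with == also avoids building a regular expression from the column names, so names containing regex metacharacters are handled as plain text.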
Upvotes: 4
Reputation: 13135
Using purrr::map_dfc, with transmute instead of mutate:
library(dplyr)
purrr::map_dfc(c('x','y'), ~data %>% select(contains(.x)) %>%
transmute(!!.x := rowSums(., na.rm = TRUE)))
x y
1 1 1
2 3 5
3 3 4
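For more than 25 column pairs, typing the prefixes by hand is impractical. One way to extend the same pattern, shown as a sketch that assumes duplicates are marked by a ".<digits>" suffix, is to derive the prefix vector from the column names. The variable names prefixes and res are illustrative, and the rest of the pipeline is the code above unchanged.

```r
library(dplyr)
library(purrr)

data <- data.frame(x = c(1, NA, 3), y = c(1, NA, 4),
                   x.1 = c(NA, 3, NA), y.1 = c(NA, 5, NA))

# Derive the unique base names instead of hard-coding c('x', 'y')
prefixes <- unique(sub("\\.\\d+$", "", names(data)))

# Same pipeline as above, applied once per derived prefix
res <- map_dfc(prefixes, ~ data %>%
  select(contains(.x)) %>%
  transmute(!!.x := rowSums(., na.rm = TRUE)))
res
```

Note that contains() matches substrings, so this assumes no base name is contained in another column's name.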
Upvotes: 3