Writing a for loop in R to combine columns that have matching names (with little variance)

Question

I have a data frame in which each column name is duplicated once. I need to combine the duplicated columns to get a proper data set. I can use dplyr's select to extract the matching columns and combine them afterwards, but I would like to achieve this with a for loop.

#Example data frame

x <- c(1, NA, 3)
y <- c(1, NA, 4)
x.1 <- c(NA, 3, NA)
y.1 <- c(NA, 5, NA)

data <- data.frame(x, y, x.1, y.1)

## With `dplyr` I can do the following

library(dplyr)

t1 <- data %>%
  select(contains("x")) %>%
  mutate(x = rowSums(., na.rm = TRUE)) %>%
  select(x)
t2 <- data %>%
  select(contains("y")) %>%
  mutate(y = rowSums(., na.rm = TRUE)) %>%
  select(y)

data <- cbind(t1, t2)

This is cumbersome because I have more than 25 similar columns.

How can I achieve the same result with a for loop that matches column names and performs rowSums? A simpler approach using dplyr would also help.

akrun · Accepted Answer

We can use split.default to split the data frame into a list of column groups, keyed on the column names with the numeric suffix removed, and then apply rowSums to each group:

library(dplyr)
library(stringr)
library(purrr)
data %>%
    split.default(str_remove(names(.), "\\.\\d+")) %>%
    map_dfr(rowSums, na.rm = TRUE)
# A tibble: 3 x 2
#      x     y
#  <dbl> <dbl>
#1     1     1
#2     3     5
#3     3     4
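The same grouping idea can be sketched in base R alone, without stringr or purrr. This is a minimal sketch, assuming the duplicated columns differ only by a trailing `.1`, `.2`, ... suffix; the names `base_names`, `groups`, and `combined` are illustrative and not part of the original answer.

```r
# Example data from the question
data <- data.frame(
  x = c(1, NA, 3), y = c(1, NA, 4),
  x.1 = c(NA, 3, NA), y.1 = c(NA, 5, NA)
)

# Strip a trailing ".<digits>" suffix to get each column's base name
base_names <- sub("\\.\\d+$", "", names(data))

# split.default() treats the data frame as a list of columns,
# so this yields one sub-data-frame per base name
groups <- split.default(data, base_names)

# Sum each group row-wise, ignoring NA values
combined <- as.data.frame(lapply(groups, rowSums, na.rm = TRUE))
combined
```

Note that with `na.rm = TRUE`, a row in which every value of a group is NA sums to 0 rather than NA.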

If we want to use a for loop

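# Base names: everything before the first "."
un1 <- unique(sub("\\..*", "", names(data)))
# Pre-allocate a named list with one slot per base name
out <- setNames(rep(list(NA), length(un1)), un1)
for(un in un1) {
    out[[un]] <- rowSums(data[grep(un, names(data))], na.rm = TRUE)
}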
as.data.frame(out)
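One caveat with `grep(un, names(data))` is that it matches substrings, so a base name such as `x` would also pick up a column called `max`. With 25 or more column pairs, a variant that compares base names exactly may be safer. The following is a sketch under the assumption that suffixes always have the form `.<digits>`; `base_names`, `cols`, and `result` are hypothetical names used only for illustration.

```r
# Example data from the question
data <- data.frame(
  x = c(1, NA, 3), y = c(1, NA, 4),
  x.1 = c(NA, 3, NA), y.1 = c(NA, 5, NA)
)

# Base name of every column, with a trailing ".<digits>" removed
base_names <- sub("\\.\\d+$", "", names(data))

out <- list()
for (nm in unique(base_names)) {
  # Select only the columns whose base name equals nm exactly
  cols <- data[base_names == nm]
  out[[nm]] <- rowSums(cols, na.rm = TRUE)
}

result <- as.data.frame(out)
result
```

Because the comparison uses `==` rather than a regular expression, column names containing regex metacharacters need no escaping.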

data

data <- structure(list(x = c(1, NA, 3), y = c(1, NA, 4), x.1 = c(NA, 
3, NA), y.1 = c(NA, 5, NA)), class = "data.frame", row.names = c(NA, 
-3L))
