bml
bml

Reputation: 125

Group dataframe by columns containing specific value R

I have a dataframe which looks like the following:

Type    June 23   June 24   July 23    July 24    August 23    August 24
 A         8        2          20         20           30         25
 B         8        2          20         20           30         25

Now I want to have the sum over the columns per month (so disregarding the week number). So, the result should look like this

Type    June   July   August
 A       10     40      55        
 B       10     40      55

I try do this this using dplyr, but I do not know how to match only the first part of the column names. Does anyone have an idea how to do this?

Upvotes: 1

Views: 193

Answers (2)

Ric S
Ric S

Reputation: 9277

Another possible solution implying the tidyverse universe could be

library(stringr)
library(tidyr)
library(dplyr)

dat %>% 
  pivot_longer(cols = -Type) %>% 
  mutate(Month = str_extract(name, "\\D+")) %>% 
  group_by(Type, Month) %>% 
  summarise(Sum = sum(value)) %>% 
  pivot_wider(id_cols = Type, names_from = Month, values_from = Sum)

Output

# A tibble: 2 x 4
# Groups:   Type [2]
#   Type  August  July  June
#   <chr>  <int> <int> <int>
# 1 A         55    40    10
# 2 B         55    40    10

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389275

In base R, you can use split.default to split data based on similar part of column names.

In your example, if we remove numbers from column names we get the month name, pass it to split.default and sum each part using rowSums.

cbind(df[1],sapply(split.default(df[-1], sub('\\d+', '',names(df[-1]))), rowSums))

#  Type August July June
#1    A     55   40   10
#2    B     55   40   10

data

df <- structure(list(Type = c("A", "B"), June23 = c(8L, 8L), June24 = c(2L, 
2L), July23 = c(20L, 20L), July24 = c(20L, 20L), August23 = c(30L, 
30L), August24 = c(25L, 25L)), class = "data.frame", row.names = c(NA, -2L))

Upvotes: 2

Related Questions