user1007587
user1007587

Reputation: 13

Aggregating rows across multiple values

I have a large dataframe with approximately this pattern:

Person Rate Street a b c d e f
A 2 XYZ 1 NULL 3 4 5 NULL
A 2 XYZ NULL 2 NULL NULL NULL NULL
A 3 XYZ NULL NULL NULL NULL NULL 6
B 2 DEF NULL NULL NULL NULL 5 NULL
B 2 DEF NULL 2 3 NULL NULL 6
C 1 DEF 1 2 3 4 5 6

A, b, c, d, e, f represents about 600 columns.

I am trying to combine the columns so that each person becomes one line, rows a-f combine into a single line using sum, and any conflicting rate or street information becomes a new row. So the data should look something like this:

Person Rate Rate 2 Street a b c d e f
A 2 3 XYZ 1 2 3 4 5 6
B 2 DEF NULL 2 3 NULL 5 6
C 1 DEF 1 2 3 4 5 6

I keep trying to make this work with aggregate and summarize but I'm not sure that's the right approach.

Thank you very much for your help!

Upvotes: 0

Views: 86

Answers (2)

user2974951
user2974951

Reputation: 10375

First we pivot all the unique rates per person and street.

library(reshape2)
tmp1=dcast(unique(df[,c("Person","Rate","Street")]),Person+Street~Rate,value.var="Rate")
colnames(tmp1)[-c(1:2)]=paste("Rate",colnames(tmp1)[-c(1:2)])

Then we aggregate and sum by person and rate, columns 4 to 9, from "a" to "f", change accordingly.

tmp2=aggregate(df[,4:9],list(Person=df$Person,Street=df$Street),function(x){
  ifelse(all(is.na(x)),NA,sum(x,na.rm=T))
})

And finally merge the two.

merge(tmp1,tmp2,by=c("Person","Street"))
  Person Street Rate 1 Rate 2 Rate 3  a b c  d e f
1      A    XYZ     NA      2      3  1 2 3  4 5 6
2      B    DEF     NA      2     NA NA 2 3 NA 5 6
3      C    DEF      1     NA     NA  1 2 3  4 5 6

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389135

Perhaps, you can do this in two-step process -

library(dplyr)
library(tidyr)

#sum columns a-f
table1 <- df %>%
  group_by(Person) %>%
  summarise(across(a:f, sum, na.rm = TRUE))


#Remove duplicated values and get the data in separate columns
#for Rate and Street columns.
table2 <- df %>%
  group_by(Person) %>%
  mutate(across(c(Rate, Street), ~replace(., duplicated(.), NA))) %>%
  select(Person, Rate, Street) %>%
  filter(if_any(c(Rate, Street), ~!is.na(.))) %>%
  mutate(col = row_number()) %>%
  ungroup %>%
  pivot_wider(names_from = col, values_from = c(Rate, Street)) %>%
  select(where(~any(!is.na(.))))

#Join the two data to get final result
inner_join(table1, table2, by = 'Person')

# Person     a     b     c     d     e     f Rate_1 Rate_2 Street_1
#  <chr>  <int> <int> <int> <int> <int> <int>  <int>  <int> <chr>   
#1 A          1     2     3     4     5     6      2      3 XYZ     
#2 B          0     2     3     0     5     6      2     NA DEF     
#3 C          1     2     3     4     5     6      1     NA DEF     

data

It is helpful and easier to help when you share data in a reproducible format which can be copied directly. I have used the below data for the answer.

df <- structure(list(Person = c("A", "A", "A", "B", "B", "C"), Rate = c(2L, 
2L, 3L, 2L, 2L, 1L), Street = c("XYZ", "XYZ", "XYZ", "DEF", "DEF", 
"DEF"), a = c(1L, NA, NA, NA, NA, 1L), b = c(NA, 2L, NA, NA, 
2L, 2L), c = c(3L, NA, NA, NA, 3L, 3L), d = c(4L, NA, NA, NA, 
NA, 4L), e = c(5L, NA, NA, 5L, NA, 5L), f = c(NA, NA, 6L, NA, 
6L, 6L)), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 0

Related Questions