Reputation: 13
I have a large dataframe with approximately this pattern:
Person | Rate | Street | a | b | c | d | e | f |
---|---|---|---|---|---|---|---|---|
A | 2 | XYZ | 1 | NULL | 3 | 4 | 5 | NULL |
A | 2 | XYZ | NULL | 2 | NULL | NULL | NULL | NULL |
A | 3 | XYZ | NULL | NULL | NULL | NULL | NULL | 6 |
B | 2 | DEF | NULL | NULL | NULL | NULL | 5 | NULL |
B | 2 | DEF | NULL | 2 | 3 | NULL | NULL | 6 |
C | 1 | DEF | 1 | 2 | 3 | 4 | 5 | 6 |
A, b, c, d, e, f represents about 600 columns.
I am trying to combine the columns so that each person becomes one line, rows a-f combine into a single line using sum, and any conflicting rate or street information becomes a new row. So the data should look something like this:
Person | Rate | Rate 2 | Street | a | b | c | d | e | f |
---|---|---|---|---|---|---|---|---|---|
A | 2 | 3 | XYZ | 1 | 2 | 3 | 4 | 5 | 6 |
B | 2 | DEF | NULL | 2 | 3 | NULL | 5 | 6 | |
C | 1 | DEF | 1 | 2 | 3 | 4 | 5 | 6 |
I keep trying to make this work with aggregate and summarize but I'm not sure that's the right approach.
Thank you very much for your help!
Upvotes: 0
Views: 86
Reputation: 10375
First we pivot all the unique rates per person and street.
library(reshape2)
tmp1=dcast(unique(df[,c("Person","Rate","Street")]),Person+Street~Rate,value.var="Rate")
colnames(tmp1)[-c(1:2)]=paste("Rate",colnames(tmp1)[-c(1:2)])
Then we aggregate and sum by person and rate, columns 4 to 9, from "a" to "f", change accordingly.
tmp2=aggregate(df[,4:9],list(Person=df$Person,Street=df$Street),function(x){
ifelse(all(is.na(x)),NA,sum(x,na.rm=T))
})
And finally merge the two.
merge(tmp1,tmp2,by=c("Person","Street"))
Person Street Rate 1 Rate 2 Rate 3 a b c d e f
1 A XYZ NA 2 3 1 2 3 4 5 6
2 B DEF NA 2 NA NA 2 3 NA 5 6
3 C DEF 1 NA NA 1 2 3 4 5 6
Upvotes: 1
Reputation: 389135
Perhaps, you can do this in two-step process -
library(dplyr)
library(tidyr)
#sum columns a-f
table1 <- df %>%
group_by(Person) %>%
summarise(across(a:f, sum, na.rm = TRUE))
#Remove duplicated values and get the data in separate columns
#for Rate and Street columns.
table2 <- df %>%
group_by(Person) %>%
mutate(across(c(Rate, Street), ~replace(., duplicated(.), NA))) %>%
select(Person, Rate, Street) %>%
filter(if_any(c(Rate, Street), ~!is.na(.))) %>%
mutate(col = row_number()) %>%
ungroup %>%
pivot_wider(names_from = col, values_from = c(Rate, Street)) %>%
select(where(~any(!is.na(.))))
#Join the two data to get final result
inner_join(table1, table2, by = 'Person')
# Person a b c d e f Rate_1 Rate_2 Street_1
# <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
#1 A 1 2 3 4 5 6 2 3 XYZ
#2 B 0 2 3 0 5 6 2 NA DEF
#3 C 1 2 3 4 5 6 1 NA DEF
data
It is helpful and easier to help when you share data in a reproducible format which can be copied directly. I have used the below data for the answer.
df <- structure(list(Person = c("A", "A", "A", "B", "B", "C"), Rate = c(2L,
2L, 3L, 2L, 2L, 1L), Street = c("XYZ", "XYZ", "XYZ", "DEF", "DEF",
"DEF"), a = c(1L, NA, NA, NA, NA, 1L), b = c(NA, 2L, NA, NA,
2L, 2L), c = c(3L, NA, NA, NA, 3L, 3L), d = c(4L, NA, NA, NA,
NA, 4L), e = c(5L, NA, NA, 5L, NA, 5L), f = c(NA, NA, 6L, NA,
6L, 6L)), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 0