Reputation: 125
I have a data frame that is arranged as follows:
df <- structure(list(NAME1= c("AAA","CCC","BBB","BBB"),
NAME2 = c("BBB", "AAA","DDD","AAA"),
AMT = c(40,20,10,50)),.Names=c("NAME1","NAME2","AMT"),
row.names = c("1", "2", "3", "4"), class =("data.frame"))
I would like to create an ID variable as a combination of character variables NAME1 and NAME2 regardless of the order (i.e. AAA BBB is the same as BBB AAA) and sum up AMT.
This is what I would like to end up with:
df <- structure(list(NAME1 = c("AAA","CCC", "BBB"),
NAME2 = c("BBB", "AAA","DDD"),
AMT = c(90,20,10),
ID = c(1,2,3)),
.Names = c("NAME1","NAME2","AMT","ID"),
row.names = c("1", "2", "3"), class =("data.frame"))
Your inputs would be much appreciated.
Upvotes: 1
Views: 366
Reputation: 13581
Using dplyr
verbs:
df %>%
rowwise() %>%
mutate(PAIR=paste0(sort(c(NAME1,NAME2)),collapse=" ")) %>%
group_by(PAIR) %>%
summarise(AMT=sum(AMT)) %>%
mutate(ID=row_number()) %>%
separate(PAIR, " ", into=c("NAME1","NAME2"))
NAME1 NAME2 AMT ID
1 AAA BBB 90 1
2 AAA CCC 20 2
3 BBB DDD 10 3
Upvotes: 0
Reputation: 14360
You could create two new grouping variables that order the values across rows so that AAA, BBB
and BBB, AAA
are treated the same (because they get put in the same order). Afterwards, grouping operations are simple enough. I chose to use data.table
:
library(data.table)
df[,c("NAME1_o","NAME2_o")] <- t(apply(cbind(df$NAME1, df$NAME2), 1, function(x) x[order(x)]))
setDT(df)[, .(AMT = sum(AMT), ID = .GRP), by = .(NAME1_o, NAME2_o)]
# NAME1_o NAME2_o AMT ID
#1: AAA BBB 90 1
#2: AAA CCC 20 2
#3: BBB DDD 10 3
Upvotes: 2