Reputation: 125

ID variable based on combination of variables

I have a data frame that is arranged as follows:

df <- structure(list(NAME1=  c("AAA","CCC","BBB","BBB"), 
                 NAME2    =  c("BBB", "AAA","DDD","AAA"),
                 AMT      = c(40,20,10,50)),.Names=c("NAME1","NAME2","AMT"), 
                 row.names = c("1", "2", "3", "4"), class =("data.frame"))

I would like to create an ID variable as a combination of character variables NAME1 and NAME2 regardless of the order (i.e. AAA BBB is the same as BBB AAA) and sum up AMT.

This is what I would like to end up with:

df <- structure(list(NAME1 =  c("AAA","CCC", "BBB"), 
                 NAME2     =  c("BBB", "AAA","DDD"),
                 AMT       =  c(90,20,10),
                 ID        =  c(1,2,3)),
                 .Names    =  c("NAME1","NAME2","AMT","ID"), 
                 row.names =  c("1", "2", "3"), class =("data.frame"))

Your inputs would be much appreciated.

Upvotes: 1

Answers (2)

CPak

Reputation: 13581

Using dplyr verbs:

df %>% 
   rowwise() %>% 
   mutate(PAIR=paste0(sort(c(NAME1,NAME2)),collapse=" ")) %>% 
   group_by(PAIR) %>% 
   summarise(AMT=sum(AMT)) %>%
   mutate(ID=row_number()) %>%
   separate(PAIR, " ", into=c("NAME1","NAME2"))

  NAME1 NAME2   AMT    ID
1   AAA   BBB    90     1
2   AAA   CCC    20     2
3   BBB   DDD    10     3

Upvotes: 0

Mike H.

Reputation: 14360

You could create two new grouping variables that order the values across rows so that AAA, BBB and BBB, AAA are treated the same (because they get put in the same order). Afterwards, grouping operations are simple enough. I chose to use data.table:

library(data.table)

df[,c("NAME1_o","NAME2_o")] <- t(apply(cbind(df$NAME1, df$NAME2), 1, function(x) x[order(x)]))
setDT(df)[, .(AMT = sum(AMT), ID = .GRP), by = .(NAME1_o, NAME2_o)]

#   NAME1_o NAME2_o AMT ID
#1:     AAA     BBB  90  1
#2:     AAA     CCC  20  2
#3:     BBB     DDD  10  3

Upvotes: 2

ID variable based on combination of variables

Answers (2)

Related Questions