samuelt
samuelt

Reputation: 215

assign unique ID from two columns where values can be in inverse order on different rows

I have a df where I'd like to create a unique ID for a combination of unique ids that are in two columns but in different rows and show in inverse order. How could one accomplish this? I appreciate any help.

df:

>     CALL_NO    DUP_OF_CALL_NO
>     171573729  171573731
>     171573729  171573731
>     171630085  171630084
>     171630085  171630084
>     171573731  171573729
>     171573731  171573729
>     171630084  171630085  
>     171630084  171630085  
>     171573731  171573729
>     171573731  171573729

desired output:

>     ID  CALL_NO    DUP_OF_CALL_NO
>     1   171573729  171573731
>     1   171573729  171573731
>     2   171630085  171630084
>     2   171630085  171630084
>     1   171573731  171573729
>     1   171573731  171573729
>     2   171630084  171630085  
>     2   171630084  171630085  
>     1   171573731  171573729
>     1   171573731  171573729

Upvotes: 0

Views: 151

Answers (3)

samuelt
samuelt

Reputation: 215

#first standardize order of CALL_NO and DUP_OF_CALL_NO to create groupings
df = df %>% 
            mutate( call1 = pmin(CALL_NO, DUP_OF_CALL_NO),
                     call2 = pmax(CALL_NO, DUP_OF_CALL_NO)) 


###Create unique ID across related CALL_NO and DUP_OF_CALL_NO
#convert to data.table    
df<- data.table(df)

#apply ID to associate group of calls 
df[, ID := .GRP, by=.(call1, call2)]

#convert back to data.frame
class(as.data.frame(df))

Upvotes: 0

DanY
DanY

Reputation: 6073

With data.table:

Create example dataset

set.seed(1234)
df <- data.frame(
    call1 = sample(1:10, 100, T),
    call2 = sample(1:10, 100, T),
    stringsAsFactors = F
)

# load data.table package and convert df to a data.table
library(data.table)
setDT(df)

Add group indicator:

# order df so group numbers are increasing by (call1, call2)
df <- df[order(call1, call2)]

# add group id
df[, id := .GRP, by=.(call1, call2)]

Upvotes: 0

Onyambu
Onyambu

Reputation: 79288

df$ID=as.numeric(factor(apply(df,1,function(x)toString(sort(x)))))
> df
     CALL_NO DUP_OF_CALL_NO ID
1  171573729      171573731  1
2  171573729      171573731  1
3  171630085      171630084  2
4  171630085      171630084  2
5  171573731      171573729  1
6  171573731      171573729  1
7  171630084      171630085  2
8  171630084      171630085  2
9  171573731      171573729  1
10 171573731      171573729  1

or you can also use do.call:

as.numeric(factor(do.call(function(x,y)paste(pmin(x,y),pmax(x,y)),unname(df))))
[1] 1 1 2 2 1 1 2 2 1 1

Upvotes: 1

Related Questions