alex

Reputation: 183

How to create a new variable based on the values in two columns

I want to add a new column to a data frame based on the values in two existing columns.

I have the following data:

Animal.1 <- c("A", "B", "C", "B", "A" )
Animal.2 <- c("B", "A", "A", "C", "C")
df <- data.frame(Animal.1, Animal.2)

If the following conditions are met:

Animal.1 = A and Animal.2 = B OR Animal.1 = B and Animal.2 = A

I would like the new column called pair.code to equal 1.

I would like a different number for every pair of animal IDs, with the same number used whenever the same two IDs appear together, regardless of which one is in Animal.1 and which is in Animal.2.

The final data should look like this:

Animal.1 <- c("A", "B", "C", "B", "A" )
Animal.2 <- c("B", "A", "A", "C", "C")
pair.code <- c("1", "1", "2", "3", "2")
df <- data.frame(Animal.1, Animal.2, pair.code)

Upvotes: 1

Views: 206

Answers (2)

Allan Cameron

Reputation: 173793

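A solution using factor: sort the two IDs within each row, paste them into a single key, and use the factor's integer codes as the pair code: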

df$pair.code <- as.numeric(factor(apply(df, 1, function(x) paste0(sort(x), collapse=""))))

df
#>   Animal.1 Animal.2 pair.code
#> 1        A        B         1
#> 2        B        A         1
#> 3        C        A         2
#> 4        B        C         3
#> 5        A        C         2
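
If the IDs can be longer than one character, or if df already holds other columns, a slightly more defensive variant of the same idea may help. This is only a sketch: the id_cols vector and the "_" separator are assumptions for illustration, not part of the answer above, and the separator should be a string that never occurs inside an ID.

```r
# Example data from the question; stringsAsFactors = FALSE keeps the IDs
# as character vectors on R versions before 4.0 as well.
df <- data.frame(
  Animal.1 = c("A", "B", "C", "B", "A"),
  Animal.2 = c("B", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: only these columns take part in the key,
# so re-running the code after pair.code exists does not change the result.
id_cols <- c("Animal.1", "Animal.2")

# Sort the two IDs within each row and join them with a separator so that
# pairs such as ("AB", "C") and ("A", "BC") cannot produce the same key.
key <- apply(df[id_cols], 1, function(x) paste(sort(x), collapse = "_"))

# Factor levels are sorted, so codes follow the sorted order of the keys.
df$pair.code <- as.integer(factor(key))
```

With the example data this gives the same codes as above, because the sorted keys are "A_B", "A_C" and "B_C".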

Upvotes: 2

akrun

Reputation: 886938

We can first sort the elements within each row and then create 'pair.code' with match, which numbers the pairs in order of first appearance:

m1 <- t(apply(df, 1, sort))
v1 <- paste(m1[,1], m1[,2])
df$pair.code <- match(v1, unique(v1))
df$pair.code
#[1] 1 1 2 3 2
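
For larger data, the row-wise apply can be avoided. A possible vectorized sketch uses pmin and pmax, which compare character vectors element by element. It assumes the ID columns are character rather than factor, and the names lo, hi and key are made up for this example.

```r
# Example data from the question, kept as character columns.
df <- data.frame(
  Animal.1 = c("A", "B", "C", "B", "A"),
  Animal.2 = c("B", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Element-wise smaller and larger ID of each row (string comparison).
lo <- pmin(df$Animal.1, df$Animal.2)
hi <- pmax(df$Animal.1, df$Animal.2)

# One key per unordered pair, then number the keys by first appearance.
key <- paste(lo, hi)
df$pair.code <- match(key, unique(key))
```

As with match above, the codes follow the order in which each pair first appears in the data, not the alphabetical order of the keys.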

Upvotes: 2
