Erika

Reputation: 47

Matching values in two columns then based on that returning new value in R

I have these columns in a data frame that look like:

combination  color_1  color_2

1_1          red       red
1_2          red       blue
1_3          red       green
1_4          red       yellow
2_1          blue      red
2_2          blue      blue
2_3          blue      green
2_4          blue      yellow
... 

By comparing the color_1 and color_2 values, I would like to create new columns that record the result of the match. The rule is this: in a new column (e.g. "Red-Only"), a row where both color_1 and color_2 are "red" should get a "1", and every other row should get a "2". I would then repeat this for rows where both values are "blue", writing "1" in the next column (e.g. "Blue-Only") and "2" everywhere else. The same goes for Yellow-only matches, Green-only matches, etc., so at the end I would have 4 extra columns, one per color.

Thanks for the help in advance!

Upvotes: 2

Views: 1393

Answers (3)

Rui Barradas

Reputation: 76402

Here is a way that doesn't depend on knowing the names of the colors.

fun <- function(color, DF, col1, col2){
  2L - (color == DF[[col1]] & color == DF[[col2]])
}

cols1 <- unique(df1$color_1)
cbind(df1, sapply(cols1, fun, df1, 'color_1', 'color_2'))
#  combination color_1 color_2 red blue
#1         1_1     red     red   1    2
#2         1_2     red    blue   2    2
#3         1_3     red   green   2    2
#4         1_4     red  yellow   2    2
#5         2_1    blue     red   2    2
#6         2_2    blue    blue   2    1
#7         2_3    blue   green   2    2
#8         2_4    blue  yellow   2    2
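
Note that the output above only has columns for red and blue, because cols1 is built from color_1 alone. If a column is wanted for every color that appears in either column (an assumption about the intended result, not something the answer states), a minimal sketch is to take the union of both columns before calling the same function. The names all_cols and res are introduced here for illustration only.

```r
# Same data as in the question, rebuilt here so the snippet runs on its own
df1 <- data.frame(
  combination = c("1_1", "1_2", "1_3", "1_4", "2_1", "2_2", "2_3", "2_4"),
  color_1 = rep(c("red", "blue"), each = 4),
  color_2 = rep(c("red", "blue", "green", "yellow"), times = 2),
  stringsAsFactors = FALSE
)

# Function from the answer: 1 where both columns equal `color`, 2 otherwise
fun <- function(color, DF, col1, col2) {
  2L - (color == DF[[col1]] & color == DF[[col2]])
}

# Assumption: every color seen in either column should get its own indicator
all_cols <- union(df1$color_1, df1$color_2)

# sapply() names the matrix columns after the colors, so cbind() keeps them
res <- cbind(df1, sapply(all_cols, fun, df1, "color_1", "color_2"))
res
```

With the sample data, the green and yellow columns contain only 2s, since no row has green or yellow in both columns.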

Data.

df1 <- read.table(text = "
combination  color_1  color_2
1_1          red       red
1_2          red       blue
1_3          red       green
1_4          red       yellow
2_1          blue      red
2_2          blue      blue
2_3          blue      green
2_4          blue      yellow
", header = TRUE, stringsAsFactors = FALSE)

Upvotes: 1

jdobres

Reputation: 11957

Let's start with your existing data:

df <- structure(list(combination = c("1_1", "1_2", "1_3", "1_4", "2_1", 
"2_2", "2_3", "2_4"), color_1 = c("red", "red", "red", "red", 
"blue", "blue", "blue", "blue"), color_2 = c("red", "blue", "green", 
"yellow", "red", "blue", "green", "yellow")), class = "data.frame", row.names = c(NA, 
-8L))

  combination color_1 color_2
1         1_1     red     red
2         1_2     red    blue
3         1_3     red   green
4         1_4     red  yellow
5         2_1    blue     red
6         2_2    blue    blue
7         2_3    blue   green
8         2_4    blue  yellow

One solution would be to loop over your four color categories, checking for matches.

colors <- c('red', 'green', 'yellow', 'blue')

matches <- lapply(colors, function(x) {
  out <- ifelse(with(df, color_1 == color_2 & color_1 == x), 1, 2)
  out
})

And then naming the results of this operation with your intended column names.

names(matches) <- paste(colors, 'only', sep = '_')

And finally, gluing the results together with the original data:

df.new <- cbind(df, as.data.frame(matches))

  combination color_1 color_2 red_only green_only yellow_only blue_only
1         1_1     red     red        1          2           2         2
2         1_2     red    blue        2          2           2         2
3         1_3     red   green        2          2           2         2
4         1_4     red  yellow        2          2           2         2
5         2_1    blue     red        2          2           2         2
6         2_2    blue    blue        2          2           2         1
7         2_3    blue   green        2          2           2         2
8         2_4    blue  yellow        2          2           2         2

Upvotes: 3

MatthewR

Reputation: 2770

You can use ifelse. If you have a lot of colors, looping over them would be a better idea.

cols <- data.frame(
color_1=c("Red","Red","Red","Red","Blue","Blue","Blue","Blue"),
color_2=c("Red","Blue","Green","Yellow","Red","Blue","Green","Yellow")
) 

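# 1 when both columns hold the given color, 2 otherwise
cols$redonly    <- ifelse(cols$color_1 %in% "Red"    & cols$color_2 %in% "Red",    1, 2)
cols$blueonly   <- ifelse(cols$color_1 %in% "Blue"   & cols$color_2 %in% "Blue",   1, 2)
cols$greenonly  <- ifelse(cols$color_1 %in% "Green"  & cols$color_2 %in% "Green",  1, 2)
cols$yellowonly <- ifelse(cols$color_1 %in% "Yellow" & cols$color_2 %in% "Yellow", 1, 2)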
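
The looping idea mentioned above could look roughly like the sketch below. It assumes the colors of interest are known up front and stored in a vector (called wanted here, a name made up for this example), and it builds each column name by lowercasing the color and appending "only".

```r
# Sample data with capitalized color names, as in this answer
cols <- data.frame(
  color_1 = c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue", "Blue"),
  color_2 = c("Red", "Blue", "Green", "Yellow", "Red", "Blue", "Green", "Yellow"),
  stringsAsFactors = FALSE
)

# Assumption: these are the colors that should each get an indicator column
wanted <- c("Red", "Blue", "Green", "Yellow")

for (clr in wanted) {
  # Hypothetical naming scheme: "Red" -> "redonly", "Blue" -> "blueonly", ...
  new_name <- paste0(tolower(clr), "only")
  # 1 when both columns hold this color, 2 otherwise
  cols[[new_name]] <- ifelse(cols$color_1 == clr & cols$color_2 == clr, 1, 2)
}

cols
```

Adding another color then only requires extending wanted, rather than copying another ifelse line.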

Upvotes: 2
