AnilGoyal
AnilGoyal

Reputation: 26238

How to match a column across multiple columns and return matching col_name in a new column

Let's assume I have a dataset df wherein I want to match value of col X across values of multiple columns A to F and want to return the matching column name (else NA) in a new column.

dput

df <- structure(list(A = c(4L, NA, NA, NA), B = c(NA, 5L, NA, NA), 
    C = c(NA, NA, NA, NA), D = c(NA, 4L, 6L, 7L), E = c(5L, NA, 
    NA, NA), F = c(NA, NA, NA, NA), X = 4:7), class = "data.frame", row.names = c(NA, 
-4L))

> df
   A  B  C  D  E  F X
1  4 NA NA NA  5 NA 4
2 NA  5 NA  4 NA NA 5
3 NA NA NA  6 NA NA 6
4 NA NA NA  7 NA NA 7

The output I want

> df_out
   A  B  C  D  E  F X new
1  4 NA NA NA  5 NA 4   A
2 NA  5 NA  4 NA NA 5   B
3 NA NA NA  6 NA NA 6   D
4 NA NA NA  7 NA NA 7   D

I will prefer dplyr/tidyverse syntax that I'll integrate into my existing syntax.

Upvotes: 2

Views: 977

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389135

Compare A:F column with X column, replace NA's with FALSE and use max.col to get the index of TRUE value in each row which can be used to get column name.

library(dplyr)

df %>%
  mutate(new = {
    tmp <- select(., A:F) == X
    names(.)[max.col(replace(tmp, is.na(tmp), FALSE))]
  })

#   A  B  C  D  E  F X new
#1  4 NA NA NA  5 NA 4   A
#2 NA  5 NA  4 NA NA 5   B
#3 NA NA NA  6 NA NA 6   D
#4 NA NA NA  7 NA NA 7   D

In base R this can be written as :

tmp <- df[1:5] == df$X
df$new <- names(df)[max.col(replace(tmp, is.na(tmp), FALSE))]

This solution assumes that you'll have at least one match of X in the row as shown in the example.

Upvotes: 2

tmfmnk
tmfmnk

Reputation: 40051

One option could be:

df %>%
 rowwise() %>%
 mutate(new = names(.)[which(c_across(-X) %in% X)])

      A     B C         D     E F         X new  
  <int> <int> <lgl> <int> <int> <lgl> <int> <chr>
1     4    NA NA       NA     5 NA        4 A    
2    NA     5 NA        4    NA NA        5 B    
3    NA    NA NA        6    NA NA        6 D    
4    NA    NA NA        7    NA NA        7 D

The solution above assumes that the column names correspond to the positions established by which(). However, if it is not the case (e.g. c_across(-c(C, E, X)), then the results won't be correct. A solution for more complex situations could be:

df %>%
 mutate(new = Reduce(coalesce, across(-c(C, E, X), ~ ifelse(. == X, cur_column(), NA_character_))))
                          
   A  B  C  D  E  F X new
1  4 NA NA NA  5 NA 4   A
2 NA  5 NA  4 NA NA 5   B
3 NA NA NA  6 NA NA 6   D
4 NA NA NA  7 NA NA 7   D

Upvotes: 4

Jakub.Novotny
Jakub.Novotny

Reputation: 3047

df %>%
  pivot_longer(cols = -X) %>%
  mutate(
    match = if_else(X == value, name, NA_character_)
  ) %>%
  pivot_wider() %>%
  filter(!is.na(match))

Upvotes: 2

Related Questions