Saraha

Reputation: 146

R - if column value matches vector item, take value from second vector

I have the following table:

library( tidyverse )
data = read.table(text="gene1
           gene2
           gene3", sep="\t", col.names = c("Protein"))

And the following two vectors:

genes = c("gene1", "gene3")
genes_names = c("name1", "name3")

Each item in genes_names corresponds to the item in genes at the same index.

Now I want to add a new column to data called ToLabel that holds the corresponding item from genes_names when the value in data$Protein appears in genes, and "no" otherwise.

data %>% mutate( ToLabel = ifelse( Protein %in% genes, genes_names, "no" ) )

This does not work as expected. My expected outcome:

Protein ToLabel
gene1   name1
gene2   no
gene3   name3

Upvotes: 2

Views: 554

Answers (5)

ThomasIsCoding

Reputation: 101628

A base R option using merge + replace

transform(
  merge(
    transform(data, Protein = trimws(Protein)),
    data.frame(
      genes = c("gene1", "gene3"),
      genes_names = c("name1", "name3")
    ),
    by.x = "Protein",
    by.y = "genes",
    all.x = TRUE
  ),
  genes_names = replace(genes_names, is.na(genes_names), "no")
)

gives

  Protein genes_names
1   gene1       name1
2   gene2          no
3   gene3       name3

Upvotes: 2

Mohamed Desouky

Reputation: 4425

You can use your code with some modifications:

library( tidyverse )

data |>
  rowwise() |>
  mutate(
    Protein = trimws(c_across()),
    ToLabel = ifelse(c_across() %in% genes, genes_names[which(c_across() == genes)], "no")
  ) |>
  ungroup()

  • output
# A tibble: 3 × 2
  Protein ToLabel
  <chr>   <chr>  
1 gene1   name1  
2 gene2   no     
3 gene3   name3  

Upvotes: 1

objectclosure

Reputation: 58

You can use match():

ToLabel <- genes_names[match(trimws(data$Protein), genes)]
ToLabel[is.na(ToLabel)] <- "no"

data$ToLabel <- ToLabel
data
#>            Protein ToLabel
#> 1            gene1   name1
#> 2            gene2      no
#> 3            gene3   name3
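
To illustrate why the positional lookup matters, here is a minimal sketch. It assumes the Protein values are already trimmed and types the vectors in directly instead of reading them with read.table(); the names protein, recycled, idx and looked_up are only for illustration. ifelse() recycles genes_names to the length of the test instead of looking up the gene that matched, while match() returns the position of each value in genes.

```r
# Illustrative vectors (assumed already trimmed of whitespace)
protein <- c("gene1", "gene2", "gene3")
genes <- c("gene1", "gene3")
genes_names <- c("name1", "name3")

# ifelse() recycles `yes` (genes_names) to length 3: "name1" "name3" "name1",
# then picks by row position, not by which gene matched
recycled <- ifelse(protein %in% genes, genes_names, "no")
recycled
#> [1] "name1" "no"    "name1"

# match() gives the index of each protein within genes (NA when absent)
idx <- match(protein, genes)
looked_up <- genes_names[idx]
looked_up[is.na(looked_up)] <- "no"
looked_up
#> [1] "name1" "no"    "name3"
```

The third element shows the difference: with ifelse(), gene3 picks up the recycled third value ("name1") instead of its own label.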

Upvotes: 0

Onyambu

Reputation: 79238

Use recode:

data %>%
  mutate(Protein = str_squish(Protein),
    ToLabel = recode(Protein, !!!set_names(genes_names, genes), .default = 'no'))

  Protein ToLabel
1   gene1   name1
2   gene2      no
3   gene3   name3
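
If you would rather not rely on recode() (it is marked as superseded in dplyr 1.1.0 and later), a named vector can serve as the lookup table. This is only a sketch: it assumes Protein has already been cleaned of whitespace, and data_clean, lookup and result are names made up for the example.

```r
library(dplyr)

genes <- c("gene1", "gene3")
genes_names <- c("name1", "name3")

# Hypothetical cleaned copy of the question's data (whitespace already removed)
data_clean <- data.frame(Protein = c("gene1", "gene2", "gene3"))

# Named vector: names are the genes, values are the labels
lookup <- setNames(genes_names, genes)

result <- data_clean %>%
  mutate(
    # Indexing by name returns NA for genes that are not in the lookup;
    # coalesce() then replaces those NAs with "no"
    ToLabel = coalesce(unname(lookup[Protein]), "no")
  )
result
```

Indexing by name keeps each label tied to its own gene, so the result does not depend on the order of rows in the data.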

Upvotes: 3

akrun

Reputation: 887173

Use a join when you want to replace multiple values by matching:

library(dplyr)
data %>%
   mutate(Protein = trimws(Protein)) %>% 
   left_join(tibble(Protein = genes, ToLabel = genes_names), by = "Protein") %>%
   mutate(ToLabel = coalesce(ToLabel, "no"))

-output

  Protein ToLabel
1   gene1   name1
2   gene2      no
3   gene3   name3

Upvotes: 0
