Reputation: 420

How to replace multiple values among multiple columns in R dataframe?

Say I have the following dataframe (the real one has 10 labelx columns):

id <- c(1,2,3,4,5,6,7,8)
label1 <- c("apple","shoe","banana","hat","dog","radio","tree","pie")
label2 <- c("apple","sneaker","fruit","beanie","pet","ipod","doug fir","pie")
df <- data.frame(id,label1,label2)

I would like to replace every item in the label columns with a word that categorizes it, using these groups:

food <- c("apple","banana","pie","fruit")
clothing <- c("shoe","hat","beanie")
entertainment <- c("radio","ipod","mp3 player","phone")
forest <- c("tree","doug fir","redwood","forest")

I've tried something like the following:

column_list <- c("label1","label2")
new_df <- df

for(i in 1:2) {
  new_df <- new_df %>%
  mutate(parse(text=column_list[i-1]) = replace(parse(text=column_list[i-1]),
                      (parse(text=column_list[i-1]) %in% food),
                      "food"))
}

I don't have to do it this way; an easier approach is fine too. Tidyverse preferred. How do I replace multiple values among multiple columns in an R dataframe?

Upvotes: 4

Answers (3)

dipetkov

Reputation: 3700

The tidyverse has evolved and this can be solved much more elegantly now.

library("tidyverse")

df <- data.frame(
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
)

labels <- list(
  food = c("apple", "banana", "pie", "fruit"),
  clothing = c("shoe", "hat", "beanie"),
  entertainment = c("radio", "ipod", "mp3 player", "phone"),
  forest = c("tree", "doug fir", "redwood", "forest")
)

item_to_label <- labels %>%
  stack() %>%
  deframe()

df %>%
  mutate(
    across(
      c(label1, label2),
      ~ item_to_label[.]
    )
  )
#>          label1        label2
#> 1          food          food
#> 2      clothing          <NA>
#> 3          food          food
#> 4      clothing      clothing
#> 5          <NA>          <NA>
#> 6 entertainment entertainment
#> 7        forest        forest
#> 8          food          food

Created on 2022-03-16 by the reprex package (v2.0.1)
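In the output above, items that belong to no category ("sneaker", "dog", "pet") become NA. If you would rather keep the original value in that case, something like the following might work. This is only a sketch: recode_or_keep() is a hypothetical helper name, not part of any package, and the category list is shortened for brevity.

```r
library(dplyr)
library(tibble)

# Shortened category list, for illustration only
labels <- list(
  food = c("apple", "banana", "pie", "fruit"),
  clothing = c("shoe", "hat", "beanie")
)

# Named lookup: names are the items, values are the categories (a factor)
item_to_label <- labels %>%
  stack() %>%
  deframe()

# Hypothetical helper: look each item up, fall back to the item itself
recode_or_keep <- function(x) {
  x <- as.character(x)
  matched <- as.character(unname(item_to_label[x]))
  coalesce(matched, x)
}

kept <- recode_or_keep(c("apple", "sneaker", "hat"))
kept
#> [1] "food"     "sneaker"  "clothing"
```

The helper can be passed to across() in place of the lambda, e.g. mutate(across(c(label1, label2), recode_or_keep)).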

Upvotes: 1

d.b

Reputation: 32558

Here's an approach using base R. The idea is to create a named vector where the names are individual things (apple, shoe, etc.) and the values are the categories (food, clothing, etc.). Then it's a matter of extracting categories directly using the names.

obj = c("food", "clothing", "entertainment", "forest")
mylist = mget(obj)
mylist = lapply(obj, function(x){
    temp = mylist[[x]]
    setNames(rep(x, length(temp)), temp)
})
mylist = unlist(mylist)

df[-1] = lapply(df[-1], function(x) as.vector(mylist[as.character(x)]))
df
#  id        label1        label2
#1  1          food          food
#2  2      clothing          <NA>
#3  3          food          food
#4  4      clothing      clothing
#5  5          <NA>          <NA>
#6  6 entertainment entertainment
#7  7        forest        forest
#8  8          food          food
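To see the named-vector idea in isolation, here is a small sketch that builds the lookup from a list instead of using mget() on separate variables. The names categories and lookup are my own, and the list is cut down to two groups for brevity.

```r
# Two groups only, for illustration
categories <- list(
  food = c("apple", "banana"),
  clothing = c("shoe", "hat")
)

# Values are the category names, repeated once per item;
# names are the items themselves
lookup <- setNames(
  rep(names(categories), lengths(categories)),
  unlist(categories, use.names = FALSE)
)

# Indexing by name returns the category; unknown items give NA
x <- c("apple", "hat", "dog")
result <- unname(lookup[x])
result
#> [1] "food"     "clothing" NA
```

Because the lookup is a plain character vector, the same lapply() call over df[-1] shown above should work with it unchanged.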

Upvotes: 3

tmfmnk

Reputation: 40171

One possibility could be using mutate_at() and then a nested ifelse():

df %>%
 # "~" defines the function applied to each selected column; "." is that column
 mutate_at(vars(contains("label")),
           ~ ifelse(. %in% food, "food",
                    ifelse(. %in% clothing, "clothing",
                           ifelse(. %in% entertainment, "entertainment",
                                  ifelse(. %in% forest, "forest", NA_character_)))))


  id        label1        label2
1  1          food          food
2  2      clothing          <NA>
3  3          food          food
4  4      clothing      clothing
5  5          <NA>          <NA>
6  6 entertainment entertainment
7  7        forest        forest
8  8          food          food

mutate_at() selects the variables that have "label" in their name and then applies the nested ifelse() to each of them: every value is checked against the categories in order, and anything that matches none becomes NA.
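With dplyr 1.0.0 or later, the same logic could be written with across() and case_when(), which avoids the nesting. This is a sketch of that alternative rather than the code above; it assumes the label columns are character (the default in R 4.0+), and the name recoded is my own.

```r
library(dplyr)

food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

df <- data.frame(
  id = 1:8,
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie"),
  stringsAsFactors = FALSE
)

# Conditions are tested top to bottom; the first match wins
recoded <- df %>%
  mutate(across(
    starts_with("label"),
    ~ case_when(
      .x %in% food ~ "food",
      .x %in% clothing ~ "clothing",
      .x %in% entertainment ~ "entertainment",
      .x %in% forest ~ "forest",
      TRUE ~ NA_character_  # no category matched
    )
  ))
```

The result should match the table above, with NA wherever an item belongs to none of the four groups.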

Upvotes: 5
