Reputation: 420
Say I have the following dataframe (the real one has 10 labelX columns):
id <- c(1,2,3,4,5,6,7,8)
label1 <- c("apple","shoe","banana","hat","dog","radio","tree","pie")
label2 <- c("apple","sneaker","fruit","beanie","pet","ipod","doug fir","pie")
df <- data.frame(id,label1,label2)
I would like to replace each item in the label columns with a word that categorizes it:
food <- c("apple","banana","pie","fruit")
clothing <- c("shoe","hat","beanie")
entertainment <- c("radio","ipod","mp3 player","phone")
forest <- c("tree","doug fir","redwood","forest")
I've tried something like the following:
column_list <- c("label1","label2")
new_df <- df
for(i in 1:2) {
new_df <- new_df %>%
mutate(parse(text=column_list[i-1]) = replace(parse(text=column_list[i-1]),
(parse(text=column_list[i-1]) %in% food),
"food"))
}
I don't have to do it this way; something easier is fine too. Tidyverse preferred. How do I replace multiple values across multiple columns in an R dataframe?
Upvotes: 4
Views: 4617
Reputation: 3700
The tidyverse has evolved and this can be solved much more elegantly now.
library("tidyverse")
df <- data.frame(
label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
)
labels <- list(
food = c("apple", "banana", "pie", "fruit"),
clothing = c("shoe", "hat", "beanie"),
entertainment = c("radio", "ipod", "mp3 player", "phone"),
forest = c("tree", "doug fir", "redwood", "forest")
)
item_to_label <- labels %>%
stack() %>%
deframe()
df %>%
mutate(
across(
c(label1, label2),
~ item_to_label[.]
)
)
#> label1 label2
#> 1 food food
#> 2 clothing <NA>
#> 3 food food
#> 4 clothing clothing
#> 5 <NA> <NA>
#> 6 entertainment entertainment
#> 7 forest forest
#> 8 food food
Created on 2022-03-16 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 32558
Here's an approach using base R. The idea is to create a named vector where the names are individual things (apple, shoe, etc.) and the values are the categories (food, clothing, etc.). Then it's a matter of extracting categories directly using the names.
obj = c("food", "clothing", "entertainment", "forest")
mylist = mget(obj)
mylist = lapply(obj, function(x){
temp = mylist[[x]]
setNames(rep(x, length(temp)), temp)
})
mylist = unlist(mylist)
df[-1] = lapply(df[-1], function(x) as.vector(mylist[as.character(x)]))
df
# id label1 label2
#1 1 food food
#2 2 clothing <NA>
#3 3 food food
#4 4 clothing clothing
#5 5 <NA> <NA>
#6 6 entertainment entertainment
#7 7 forest forest
#8 8 food food
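The named-vector lookup can also be seen in miniature. The following is only a sketch on made-up data: the names `groups`, `lookup`, and `items` are hypothetical and not from the answer above, and `stack()` is used here in place of the `mget()`/`lapply()` steps to build the lookup.

```r
# Hypothetical category list (a subset of the question's categories)
groups <- list(
  food = c("apple", "pie"),
  clothing = c("shoe", "hat")
)

# stack() gives one row per item: `values` is the item, `ind` its category
stacked <- stack(groups)

# Named character vector: names are items, values are categories
lookup <- setNames(as.character(stacked$ind), stacked$values)

# Indexing by name returns one category per item; unknown items give NA
items <- c("shoe", "apple", "dog")
categories <- unname(lookup[items])
categories
```

Because the lookup is indexed by name, items that are not in any category (here "dog") come back as NA rather than raising an error.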
Upvotes: 3
Reputation: 40171
One possibility could be using mutate_at() and then a nested ifelse():
# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
df %>%
  mutate_at(vars(contains("label")),
            ~ ifelse(. %in% food, "food",
              ifelse(. %in% clothing, "clothing",
              ifelse(. %in% entertainment, "entertainment",
              ifelse(. %in% forest, "forest", NA_character_)))))
id label1 label2
1 1 food food
2 2 clothing <NA>
3 3 food food
4 4 clothing clothing
5 5 <NA> <NA>
6 6 entertainment entertainment
7 7 forest forest
8 8 food food
mutate_at() selects the variables that have "label" in their name and then applies the nested ifelse() to each of them, testing the conditions in order.
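The same chain of conditions could probably be written with across() and case_when(), which reads flatter than nested ifelse() calls. This is a sketch, assuming dplyr 1.0.0 or later; `small_df` and `result` are hypothetical names, and only two of the four categories are included to keep it short.

```r
library(dplyr)

food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")

# Hypothetical three-row data frame, not the one from the question
small_df <- data.frame(
  id = 1:3,
  label1 = c("apple", "shoe", "dog"),
  label2 = c("fruit", "sneaker", "pet")
)

result <- small_df %>%
  mutate(across(
    contains("label"),
    # Conditions are tested in order; the final TRUE branch catches
    # everything that matched no category
    ~ case_when(
      .x %in% food ~ "food",
      .x %in% clothing ~ "clothing",
      TRUE ~ NA_character_
    )
  ))

result
```

The id column is left untouched because contains("label") only selects the label columns.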
Upvotes: 5