Reputation: 844
I have a dataset like this. I want to recognize all of the observations that have multiple values in the "color" columns and replace them with "multicolor"
ID color1 color2
23 red NA
44 blue purple
51 yellow NA
59 green orange
like this:
ID color
23 red
44 multicolor
51 yellow
59 multicolor
Any ideas would be much appreciated, thank you!
Upvotes: 3
Views: 48
Reputation: 21440
Here's what seems like a straightforward solution:
library(dplyr)
library(stringr)
data %>%
mutate(
# step 1 - paste `color1` and `color2` together and remove " NA":
color = gsub("\\sNA", "", paste(color1, color2)),
# step 2 - count the number of white space characters:
color = str_count(color, " "),
# step 3 - label `color` as "multicolor" where `color` != 0:
color = ifelse(color == 0, color1, "multicolor")) %>%
# remove the obsolete color columns:
select(-matches("\\d$"))
ID color
1 23 red
2 44 multicolor
3 51 yellow
4 59 multicolor
Data:
data <- data.frame(ID = c(23, 44, 51, 59),
color1 = c("red", "blue", "yellow", "green"),
color2 = c(NA, "purple", NA, "orange"))
Upvotes: 1
Reputation: 19271
A base R approach
# get colors from columns named color*
colo <- paste(names(table(unlist(df1[,grep("color",colnames(df1))]))), collapse="|")
colo
[1] "blue|green|red|yellow|orange|purple"
# match the colors and do the conversion
data.frame(
ID=df1$ID,
color=apply( df1, 1, function(x){
y=x[grep(colo, x)];
if(length(y)>1){y="multicolor"}; y } ) )
ID color
1 23 red
2 44 multicolor
3 51 yellow
4 59 multicolor
df1 <- structure(list(ID = c(23L, 44L, 51L, 59L), color1 = c("red",
"blue", "yellow", "green"), color2 = c(NA, "purple", NA, "orange"
)), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 0
Reputation: 317
You can do this, assuming data
is your dataset.
library(dplyr)
data <- data.frame(ID = c(23, 44, 51, 59),
color1 = c("red", "blue", "yellow", "green"),
color2 = c(NA, "purple", NA, "orange"))
data %>%
mutate(color = ifelse(is.na(color2), color1, "multicolor")) %>%
select(ID, color)
Upvotes: 2
Reputation:
Here is a way to do it in the tidyverse.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with("color"), values_to = "color", values_drop_na = TRUE) %>%
group_by(ID) %>%
summarize(n = n(),
color = toString(color), .groups = "drop") %>%
mutate(color = if_else(n > 1, "multicolor", color)) %>%
select(-n)
# # A tibble: 4 x 2
# ID color
# <int> <chr>
# 1 23 red
# 2 44 multicolor
# 3 51 yellow
# 4 59 multicolor
I did it this way on purpose. Note that if you stopped after the summarize()
line, you would get the actual colors.
# # A tibble: 4 x 3
# ID n color
# <int> <int> <chr>
# 1 23 1 red
# 2 44 2 blue, purple
# 3 51 1 yellow
# 4 59 2 green, orange
This would scale if you had many color columns, not just 2. Play around with it, there are a lot of ways to tweaks things like this.
df <- read.table(textConnection("ID color1 color2
23 red NA
44 blue purple
51 yellow NA
59 green orange"), header = TRUE)
Upvotes: 5