Reputation: 86
I have sample data such as:
current dataframe:
Person <- c("John","Jacob","Jill","Joan")
Fruits <- c("Apples","Apples,Oranges","Bananas","Oranges,Bananas")
df <- as.data.frame(cbind(Person,Fruits))
I am trying to identify whether a single fruit is contained in the string and, if so, put the name of that fruit in a separate column. If Apples is listed with other fruits, the value should be "Apples & Other"; if there are multiple fruits (excluding Apples), it should be "Multiple", so that the result appears as follows:
wanted output:
Person <- c("John","Jacob","Jill","Joan")
Fruits <- c("Apples","Apples,Oranges","Bananas","Oranges,Bananas")
Fruits2 <- c("Apples","Apples & Other","Bananas","Multiple")
df2 <- cbind(Person,Fruits)
df2 <- as.data.frame(cbind(df2,Fruits2))
I have tried using the following ifelse statement:
df$Fruits2 <- ifelse(grep("\\bApples\\b",df$Fruits),"Apples",
              ifelse(grep(".Apples.|.Apples|Apples.",df$Fruits),"Apples & Other",
              ifelse(grep("\\bOranges\\b",df$Fruits),"Oranges",
              ifelse(grep(".Oranges.|.Oranges|Oranges.",df$Fruits),"Multiple",
              ifelse(grep("\\bBananas\\b",df$Fruits),"Bananas",
              ifelse(grep(".Bananas.|.Bananas|Bananas.",df$Fruits),"Multiple","TBD"))))))
However, the output of df$Fruits2 all becomes Output. I am not sure whether it's the logic of the nested ifelse statements, but if there is a better solution, any help is appreciated.
Upvotes: 0
Views: 40
Reputation: 31
You can use strsplit() to split each string on "," and then check the conditions per row: a single fruit keeps its own name, and several fruits become either "Apples & Other" or "Multiple". The result is saved in a new column.
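# as.character() guards against Fruits being a factor (the default before R 4.0)
df$Fruits2 <- sapply(strsplit(as.character(df$Fruits), ","), function(x) {
  # x holds the fruits of one row
  if (length(x) == 1) x[1]
  else if ("Apples" %in% x) "Apples & Other"
  else "Multiple"
})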
df
  Person          Fruits        Fruits2
1   John          Apples         Apples
2  Jacob  Apples,Oranges Apples & Other
3   Jill         Bananas        Bananas
4   Joan Oranges,Bananas       Multiple
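The same split-and-count idea can also be written so that the final labelling is a single vectorised ifelse() call. This is only a minimal sketch: it assumes the fruits are separated by a bare comma with no surrounding spaces, as in the sample data, and label_fruits is a hypothetical helper name, not a function from any package.

```r
# Sample data from the question, built with character columns
df <- data.frame(
  Person = c("John", "Jacob", "Jill", "Joan"),
  Fruits = c("Apples", "Apples,Oranges", "Bananas", "Oranges,Bananas"),
  stringsAsFactors = FALSE
)

# label_fruits is a hypothetical helper name used only for this sketch
label_fruits <- function(fruits) {
  # Split each string on a literal comma: one character vector per row
  parts <- strsplit(fruits, ",", fixed = TRUE)
  # Number of fruits listed in each row
  n <- lengths(parts)
  # TRUE where "Apples" is one of the listed fruits
  has_apples <- vapply(parts, function(p) "Apples" %in% p, logical(1))
  # A single fruit keeps its name; otherwise pick one of the two group labels
  ifelse(n <= 1, fruits,
         ifelse(has_apples, "Apples & Other", "Multiple"))
}

df$Fruits2 <- label_fruits(df$Fruits)
df
```

Matching whole elements with %in% rather than searching for the substring "Apples" means a hypothetical entry such as "Pineapples" would not be counted as Apples.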
Upvotes: 0
Reputation: 215067
This ifelse might be more concise for your logic. Generally, you go from the most specific cases to the more general ones. You will also need grepl, which returns logical values, instead of grep, which returns either the integer indices of the matches or the matching values from the original vector:
library(dplyr)
df %>% mutate(Fruits2 = ifelse(grepl(",", Fruits),
                               ifelse(grepl("Apples", Fruits), "Apples & Other", "Multiple"),
                               Fruits))
#   Person          Fruits        Fruits2
# 1   John          Apples         Apples
# 2  Jacob  Apples,Oranges Apples & Other
# 3   Jill         Bananas        Bananas
# 4   Joan Oranges,Bananas       Multiple
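To make the grep versus grepl point concrete, here is a small sketch on the Fruits vector from the question. The variable names idx, hit, short and full are made up for the illustration.

```r
Fruits <- c("Apples", "Apples,Oranges", "Bananas", "Oranges,Bananas")

idx <- grep("Apples", Fruits)    # integer positions of the matching elements
hit <- grepl("Apples", Fruits)   # one TRUE/FALSE per element

length(idx)   # shorter than the input: only the matches are reported
length(hit)   # same length as the input

# ifelse() returns a result as long as its test. With a grep() test the
# non-zero indices are coerced to TRUE, so only the matches get a value
# and the rows that did not match are simply missing from the result.
short <- ifelse(idx, "Apples", "TBD")
full  <- ifelse(hit, "Apples", "TBD")
```

Because short has fewer elements than the data frame has rows, assigning it to a column either recycles it or fails, depending on the lengths involved. full lines up with the rows one to one, which is what a nested ifelse needs.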
Upvotes: 1