Leo

Reputation: 86

Identifying Multiple Strings in a Column, then Outputting the Observation in a Separate Column; R v3.3.0

I have sample data such as:

current dataframe:

Person <- c("John","Jacob","Jill","Joan") 
Fruits <- c("Apples","Apples,Oranges","Bananas","Oranges,Bananas")
df <- as.data.frame(cbind(Person,Fruits))

I am trying to identify whether a single fruit is contained in the string and, if so, put the name of that fruit in a separate column. If Apples is listed with other fruits, the value should be "Apples & Other"; if there are multiple fruits (excluding Apples), it should be "Multiple", so that the result appears as follows:

wanted output:

Person <- c("John","Jacob","Jill","Joan")
Fruits <- c("Apples","Apples,Oranges","Bananas","Oranges,Bananas")
Fruits2 <- c("Apples","Apples & Other","Bananas","Multiple")
df2 <- cbind(Person,Fruits)
df2 <- as.data.frame(cbind(df2,Fruits2))

I have tried using the following ifelse statement:

df$Fruits2 <- ifelse(grep("\\bApples\\b",df$Fruits),"Apples",
                 ifelse(grep(".Apples.|.Apples|Apples.",df$Fruits),"Apples & Other",
                        ifelse(grep("\\bOranges\\b",df$Fruits),"Oranges",
                               ifelse(grep(".Oranges.|.Oranges|Oranges.",df$Fruits),"Multiple",
                                      ifelse(grep("\\bBananas\\b",df$Fruits),"Bananas",
                                             ifelse(grep(".Bananas.|.Bananas|Bananas.",df$Fruits),"Multiple","TBD"))))))

However, every value of df$Fruits2 becomes Output. I am not sure whether the problem is the logic of the nested ifelse statements, but if there is a better solution, any help is appreciated.

Upvotes: 0

Views: 40

Answers (2)

Sandesh

Reputation: 31

You can use strsplit() to split on ",", then use ifelse to check the conditions and save the required string in a new column.

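  # as.character() guards against Fruits being a factor, which strsplit() rejects
  df$Fruits2 <- sapply(strsplit(as.character(df$Fruits), ","), function(x) {
    ifelse(length(x) == 1, x[1],
           ifelse(length(x) >= 2 & "Apples" %in% x, "Apples & Other", "Multiple"))
  })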

   df

   Person           Fruits         Fruits2
 1   John           Apples          Apples
 2  Jacob   Apples,Oranges  Apples & Other
 3   Jill          Bananas         Bananas
 4   Joan  Oranges,Bananas        Multiple
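The same split-and-check idea can be sketched as a small standalone function. This is only an illustration: classify_fruits is a hypothetical helper name, not something from the answer above, and it assumes the fruits are separated by a plain comma with no surrounding spaces.

```r
# classify_fruits() is a hypothetical helper name used for illustration only.
classify_fruits <- function(x) {
  # as.character() lets the function accept a factor column as well
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  # vapply() guarantees exactly one character value per row
  vapply(parts, function(p) {
    if (length(p) == 1) {
      p[1]                      # a single fruit: keep its name
    } else if ("Apples" %in% p) {
      "Apples & Other"          # several fruits, one of them Apples
    } else {
      "Multiple"                # several fruits, none of them Apples
    }
  }, character(1))
}

df <- data.frame(
  Person = c("John", "Jacob", "Jill", "Joan"),
  Fruits = c("Apples", "Apples,Oranges", "Bananas", "Oranges,Bananas")
)
df$Fruits2 <- classify_fruits(df$Fruits)
```

Using if/else inside the function rather than ifelse is a design choice here: each call handles one row, so a scalar branch is enough.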

Upvotes: 0

akuiper

Reputation: 215067

This ifelse might be more concise for your logic. In general, go from the most specific cases to the more general ones. You will also need grepl, which returns logical values, instead of grep, which returns either integer indices or the matching values from the original vector:

library(dplyr)
# as.character() keeps ifelse() from returning integer codes if Fruits is a factor
df %>% mutate(Fruits2 = ifelse(grepl(",", Fruits), 
                        ifelse(grepl("Apples", Fruits), "Apples & Other", "Multiple"), 
                        as.character(Fruits)))

#   Person          Fruits        Fruits2
# 1   John          Apples         Apples
# 2  Jacob  Apples,Oranges Apples & Other
# 3   Jill         Bananas        Bananas
# 4   Joan Oranges,Bananas       Multiple
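To make the grep versus grepl point concrete, here is a minimal sketch on a plain character vector (the variable names fruits, idx, hit and vals are made up for this illustration). It shows why grep does not fit as the test argument of ifelse: its result is not one value per row.

```r
fruits <- c("Apples", "Apples,Oranges", "Bananas", "Oranges,Bananas")

# grep() returns the positions of the elements that match
idx <- grep("Apples", fruits)

# grepl() returns one TRUE/FALSE per element, which is what ifelse() expects
hit <- grepl("Apples", fruits)

# grep(value = TRUE) returns the matching elements themselves
vals <- grep("Apples", fruits, value = TRUE)

length(idx)  # shorter than fruits whenever some element does not match
length(hit)  # always the same length as fruits
```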

Upvotes: 1
