Reputation: 1946
I have a database of thoroughbred names that is structured as follows:
HorseName <- c("Grey emperor", "Smokey grey", "Gaining greys", "chestnut", "Glowing Chestnuts", "Ruby red", "My fair lady", "Man of war")
Number <- 1:8
df <- data.frame(HorseName, Number)
I now wish to search for occurrences of colours within each horse's name. Specifically, I wish to select all the instances of 'grey' and 'chestnut', creating a new column that identifies these colours. Any other names can simply be labelled 'other'. Unfortunately, the names are not consistent: some are plural and the capitalisation varies. How would I go about doing this in R?
My anticipated output would be:
df$Type <- c("Grey", "Grey", "Grey", "Chestnut", "Chestnut", "Other", "Other", "Other")
I am familiar with chained ifelse statements but unsure how to handle the plural occurrences and case sensitivity!
Upvotes: 1
Views: 510
Reputation: 4534
In case you are interested in other ways to do this, here's a tidyverse
alternative which has the same end result as @amrrs's answer.
library(tidyverse)
library(stringr)
df %>%
  mutate(Type = str_extract(str_to_lower(HorseName), "grey|chestnut")) %>%
  mutate(Type = str_to_title(if_else(is.na(Type), "other", Type)))
#>           HorseName Number     Type
#> 1      Grey emperor      1     Grey
#> 2       Smokey grey      2     Grey
#> 3     Gaining greys      3     Grey
#> 4          chestnut      4 Chestnut
#> 5 Glowing Chestnuts      5 Chestnut
#> 6          Ruby red      6    Other
#> 7      My fair lady      7    Other
#> 8        Man of war      8    Other
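If more colours are added later, the same idea can also be written with one rule per colour. The following is only a sketch, assuming dplyr and stringr are installed: it uses case_when() with stringr's regex(..., ignore_case = TRUE) so the names do not have to be lower-cased first. The name `result` is introduced here purely for illustration.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  HorseName = c("Grey emperor", "Smokey grey", "Gaining greys", "chestnut",
                "Glowing Chestnuts", "Ruby red", "My fair lady", "Man of war"),
  Number = 1:8
)

# One rule per colour; the first rule that matches wins,
# and anything that matches no rule falls through to "Other".
result <- df %>%
  mutate(Type = case_when(
    str_detect(HorseName, regex("grey", ignore_case = TRUE))     ~ "Grey",
    str_detect(HorseName, regex("chestnut", ignore_case = TRUE)) ~ "Chestnut",
    TRUE                                                         ~ "Other"
  ))
```

Because the patterns are matched as substrings, plurals such as "greys" are caught, but so is any longer word that happens to contain the colour.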
Upvotes: 3
Reputation: 6325
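Converting df$HorseName to lower case with tolower() before matching it against a lower-case pattern with grepl() takes care of the inconsistent capitalisation. Because grepl() matches substrings, plurals such as 'greys' are picked up as well.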
> df$Type <- ifelse(grepl('grey', tolower(df$HorseName)), 'Grey',
+            ifelse(grepl('chestnut', tolower(df$HorseName)), 'Chestnut',
+                   'Other'))
> df
          HorseName Number     Type
1      Grey emperor      1     Grey
2       Smokey grey      2     Grey
3     Gaining greys      3     Grey
4          chestnut      4 Chestnut
5 Glowing Chestnuts      5 Chestnut
6          Ruby red      6    Other
7      My fair lady      7    Other
8        Man of war      8    Other
>
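A variant of the same base R idea, shown only as a sketch: grepl() has an ignore.case argument, so the tolower() call can be dropped, and looping over a vector of colours avoids nesting one ifelse() per colour. The vector name `colours` is introduced here for illustration and is not part of the original question.

```r
# Same data as in the question
df <- data.frame(
  HorseName = c("Grey emperor", "Smokey grey", "Gaining greys", "chestnut",
                "Glowing Chestnuts", "Ruby red", "My fair lady", "Man of war"),
  Number = 1:8
)

# Colours to look for, in order of priority (hypothetical helper vector)
colours <- c("Grey", "Chestnut")

# Start with the default label, then overwrite rows whose name contains a colour.
# Looping in reverse means the first colour in `colours` is applied last,
# so it wins if a name contains more than one colour.
df$Type <- "Other"
for (colour in rev(colours)) {
  df$Type[grepl(colour, df$HorseName, ignore.case = TRUE)] <- colour
}
```

Adding another colour then only requires extending `colours`.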
Upvotes: 2