Reputation: 109
This seems obvious but I can't figure it out. I have a character vector containing state names alongside random other words and would like to extract the state name.
df <- data.frame(string = c("The quick brown Arizona","jumps over the Alabama","dog Arkansas"))
I can extract state names individually:
df$state[grepl("Alabama",df$string)] <- "Alabama"
but I can't figure out how to replicate that for all states without copying and pasting it 42 times. The closest I got was:
find.state <- function(x){
df$state[grepl(x,df$string)] <- x
}
lapply(state.name, find.state)
but that just prints all the state names.
Upvotes: 1
Views: 1869
Reputation: 3570
R comes with a variable holding the state names, `state.name`. Use `paste` to collapse it into one long character string, with `|` separating each state. This can be used as the search pattern for a regular expression.
library(stringr)
str_extract(df$string, paste(state.name, collapse='|'))
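If stringr is not available, roughly the same idea can be sketched in base R with `regexpr` and `regmatches`. This is only a sketch: the fourth row ("no state here") is a made-up example added to show what happens when nothing matches, and the `state` column name follows the question.

```r
# Sample data from the question, plus one hypothetical row with no state in it
df <- data.frame(
  string = c("The quick brown Arizona", "jumps over the Alabama",
             "dog Arkansas", "no state here"),
  stringsAsFactors = FALSE
)

# One alternation pattern built from the built-in state.name vector
pattern <- paste(state.name, collapse = "|")

# regexpr() returns -1 for strings with no match
m <- regexpr(pattern, df$string)

# Start with NA everywhere, then fill in only the rows that matched;
# regmatches() returns one value per matching element
df$state <- NA_character_
df$state[m != -1] <- regmatches(df$string, m)

df$state
```

Rows without a state name stay `NA` here, which mirrors what `str_extract` does for non-matches.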
Upvotes: 3
Reputation: 20095
One option for the sample data provided by the OP, where the state name is always the last word of the string:
gsub(".*\\s(\\w+)$","\\1",df$string)
#[1] "Arizona" "Alabama" "Arkansas"
Regex:
`.*\\s` - Match anything up to and including the last `space`.
`(\\w+)$` - Capture the word characters that follow the last space and run to the end of the string. This will be the state name.
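A quick check of how this pattern behaves may help when deciding whether it fits real data. The extra inputs below ("lives in New York" and "Texas") are hypothetical and not part of the OP's sample; they are only there to illustrate that the pattern returns the last word, whatever it is.

```r
# Same pattern as above: keep only the last word of each string
last_word <- function(x) gsub(".*\\s(\\w+)$", "\\1", x)

# Works for the OP's sample, where the state is the final single word
sample_result <- last_word(c("The quick brown Arizona",
                             "jumps over the Alabama",
                             "dog Arkansas"))

# Hypothetical input: a two-word state name loses its first word
two_word <- last_word("lives in New York")

# Hypothetical input: no space at all, so nothing matches and the
# string comes back unchanged
no_space <- last_word("Texas")
```

So this approach only suits data where the state is a single word at the very end of the string.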
Upvotes: 0
Reputation: 37661
You can do this with a somewhat awkward regular expression.
df$state = sub(".*\\b(Arizona|Alabama|Arkansas)\\b.*", "\\1", df$string)
df
                   string    state
1 The quick brown Arizona  Arizona
2  jumps over the Alabama  Alabama
3            dog Arkansas Arkansas
Of course, you need to include the names of all the states, not just these three. So you might build that as a pattern first.
Pattern = paste0(".*\\b(", paste(state.name, collapse="|"), ")\\b.*")
df$state = sub(Pattern, "\\1", df$string)
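One thing to keep in mind is that `sub` returns the input unchanged when the pattern does not match, so a row without any state name would end up with the whole string in `state`. A minimal sketch of one way to guard against that, assuming you would rather have `NA` for such rows (the second row below is a made-up example):

```r
# One row from the question plus a hypothetical row with no state name
df <- data.frame(
  string = c("The quick brown Arizona", "no state here"),
  stringsAsFactors = FALSE
)

# Same pattern as above, built from the built-in state.name vector
Pattern <- paste0(".*\\b(", paste(state.name, collapse = "|"), ")\\b.*")

# Only keep the substitution result where the pattern actually matched
hit <- grepl(Pattern, df$string)
df$state <- ifelse(hit, sub(Pattern, "\\1", df$string), NA_character_)

df$state
```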
Upvotes: 6