screechOwl

Reputation: 28139

Remove last word from string

I'm trying to do something simple but can't remember or find the answer. I have a list of city names from the Census Bureau, and they append the city's type to the end of each name, which is breaking my match().

I'd like to make this:

Middletown Township
Sunny Valley Borough
Hillside Village

into this:

Middletown
Sunny Valley
Hillside

Any suggestions? Ideally I'd also like to know if there's a lastIndexOf() function in R.

Here's the data:

df1 <- data.frame(
  id = c(1, 2, 3),
  city = factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
)
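On the lastIndexOf() part of the question: base R has no function by that name. A minimal sketch, assuming you want the position of the last occurrence of a fixed substring (here a space), can be built from gregexpr(). The helper name last_index_of() is hypothetical. Note that R positions are 1-based, unlike JavaScript's 0-based lastIndexOf().

```r
# Hypothetical helper (not part of base R): 1-based position of the last
# occurrence of a fixed substring in each element of x, or -1 if absent.
last_index_of <- function(x, pattern) {
  vapply(as.character(x), function(s) {
    hits <- gregexpr(pattern, s, fixed = TRUE)[[1]]  # all match positions
    if (hits[1] == -1) -1L else as.integer(max(hits))  # -1 means no match
  }, integer(1), USE.NAMES = FALSE)
}

df1 <- data.frame(
  id = c(1, 2, 3),
  city = factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
)

# Position of the last space, then keep everything before it.
pos <- last_index_of(df1$city, " ")
trimmed <- substr(as.character(df1$city), 1, pos - 1)
print(pos)
print(trimmed)
```

One caveat of this sketch: a name with no space gives pos = -1, and substr() then returns an empty string, so you may want to guard for that case.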

Upvotes: 18

Views: 17454

Answers (3)

Josh O'Brien

Reputation: 162321

This will work:

gsub("\\s+\\w*$", "", df1$city)
[1] "Middletown"   "Sunny Valley" "Hillside"   

It removes any substring consisting of one or more whitespace characters, followed by any number of "word" characters (letters, digits, or underscores), followed by the end of the string.
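To see exactly which part the pattern strips, you can pull out the matched text with regexpr() and regmatches() before calling gsub(). This is a small sketch on the question's data; the variable names are just for illustration.

```r
df1 <- data.frame(
  id = c(1, 2, 3),
  city = factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
)

city <- as.character(df1$city)
pattern <- "\\s+\\w*$"  # whitespace run, then word characters, then end of string

# regexpr() locates the match in each string; regmatches() extracts the
# matched text, i.e. exactly the part that gsub() deletes.
removed <- regmatches(city, regexpr(pattern, city))
kept <- gsub(pattern, "", city)

print(removed)
print(kept)
```

Here removed holds " Township", " Borough" and " Village", leading space included. Because \\w* may match zero characters, a string that ends in whitespace only loses that trailing whitespace, not its last word.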

Upvotes: 22

Brendan

Reputation: 216

I would use word() in the stringr package like so:

library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides word()

df1 %>% mutate(city = word(city, 1, -2))

The start argument (1) means you begin at the first word, and the end argument (-2) means you keep everything up to the second-to-last word; negative positions count backwards from the end, with -1 being the last word.
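If you'd rather not depend on stringr, the same start/end idea can be sketched in base R by splitting on single spaces. The helper word_range() below is hypothetical and only approximates word(): it splits on a literal " " and returns NA when the requested range is empty (an assumption of this sketch).

```r
# Hypothetical base-R helper mimicking the start/end convention above:
# positive positions count from the first word, negative ones from the
# last word (-1 = last, -2 = second-to-last).
word_range <- function(x, start = 1, end = start) {
  words <- strsplit(as.character(x), " ", fixed = TRUE)
  vapply(words, function(w) {
    n <- length(w)
    s <- if (start < 0) n + start + 1 else start  # resolve negative start
    e <- if (end < 0) n + end + 1 else end        # resolve negative end
    if (s < 1 || e > n || s > e) return(NA_character_)
    paste(w[s:e], collapse = " ")
  }, character(1))
}

df1 <- data.frame(
  id = c(1, 2, 3),
  city = factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
)

df1$city <- word_range(df1$city, 1, -2)  # drop the last word
print(df1)
```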

Upvotes: 8

Tyler

Reputation: 10032

Here's a regexp that does what you need:

sub(df1$city, pattern = " [[:alpha:]]*$", replacement = "")

[1] "Middletown" "Sunny Valley" "Hillside"

This replaces, with an empty string, a substring consisting of a single space followed only by letters up to the end of the string.
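Since the original goal was a working match(), here is a short sketch of the cleaned names feeding it. The vector lookup is a hypothetical second source whose names lack the type suffix.

```r
df1 <- data.frame(
  id = c(1, 2, 3),
  city = factor(c("Middletown Township", "Sunny Valley Borough", "Hillside Village"))
)

# Hypothetical second list that stores names without the type suffix.
lookup <- c("Hillside", "Middletown", "Sunny Valley")

before <- match(df1$city, lookup)  # raw names: nothing matches

# sub() coerces the factor column to character before substituting.
clean <- sub(" [[:alpha:]]*$", "", df1$city)

# match() returns each cleaned name's position in lookup (NA if absent).
after <- match(clean, lookup)
print(before)
print(after)
```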

Upvotes: 19
