Reputation: 85
I have a large dataset with a variable containing a bunch of names. Because of how the data was entered, I only need the names that consist of a single word. I was thinking of using grepl to detect any blank spaces in the names, but I would also need to do the same for "-". In short, I need only the observations with one word in this variable. So far
more_than_one_word <- mydata[grepl("\s", mydata$City) , ]
doesn't pick up anything like "Sussie James." What else can I do? Thanks.
Upvotes: 2
Views: 354
Reputation: 23818
You could try
only_one_word <- mydata[which(!grepl(" |-", mydata$City)), ]
Example:
cities <- c("Los Angeles", "New York", "Chicago", "Aix-en-Provence")
cities[which(!grepl(" |-", cities))]
#[1] "Chicago"
That version also removes any entry containing a hyphen. If hyphenated names should count as one word, match on the space only:
cities[which(!grepl(" ", cities))]
#[1] "Chicago" "Aix-en-Provence"
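Applied to a data frame, the same idea might look like the sketch below. The `mydata` frame here is made up for illustration, and the character class `[[:space:]-]` is a swapped-in variant of `" |-"` that also catches tabs and other whitespace. Note that inside an R string the regex `\s` has to be written with a doubled backslash, `"\\s"`.

```r
# Hypothetical data standing in for the asker's `mydata`
mydata <- data.frame(
  City = c("Los Angeles", "Chicago", "Aix-en-Provence", "Sussie James", "Boston"),
  stringsAsFactors = FALSE
)

# TRUE where City contains any whitespace character or a hyphen
has_separator <- grepl("[[:space:]-]", mydata$City)

# Keep only the single-word rows; drop = FALSE keeps the result a data frame
only_one_word <- mydata[!has_separator, , drop = FALSE]
only_one_word$City
#[1] "Chicago" "Boston"
```

Indexing with the logical vector directly gives the same rows as wrapping it in `which()` here, since `grepl` returns only TRUE or FALSE.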
Hope this helps.
Upvotes: 2
Reputation: 17432
I would take the approach of saying "Give me any string that's just letters!"
> vec = c(" ", "hi", "Chicago", "new york", "New_York")
> vec
[1] " " "hi" "Chicago" "new york" "New_York"
> grep("^[a-zA-Z]*$", vec)
[1] 2 3
This accepts only strings that consist entirely of letters from the first character to the last, so anything containing a space, hyphen, underscore, or digit is rejected.
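One caveat with that pattern, shown as a small sketch: `*` means "zero or more", so an empty string also matches. Using `+` requires at least one letter, and `value = TRUE` returns the matching strings instead of their positions. The vector below is the example above with an empty string added for illustration.

```r
# The answer's example vector plus an empty string (added for illustration)
vec <- c(" ", "hi", "Chicago", "new york", "New_York", "")

# `*` allows zero letters, so the empty string at position 6 matches too
idx_star <- grep("^[a-zA-Z]*$", vec)

# `+` requires one or more letters from start to end
idx_plus <- grep("^[a-zA-Z]+$", vec)

# value = TRUE returns the matching strings rather than their indices
one_word <- grep("^[a-zA-Z]+$", vec, value = TRUE)
one_word
#[1] "hi"      "Chicago"
```

Names with accented letters would not pass `[a-zA-Z]`; the POSIX class `[[:alpha:]]` is one possible substitute, though what it matches depends on the locale.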
Upvotes: 2