Reputation: 85
I have a dataset with 45 columns and more than 8000 observations. One of the columns holds the city name. I want to remove all observations located in cities whose names begin with the letter "S". How would I do this? I'm pretty new to R, so sorry if this is simple, but I couldn't find anything by searching.
Upvotes: 0
Views: 74
Reputation: 4042
You can use dplyr's filter() function, although I have no idea how fast it is compared to other methods:
library(dplyr)
cities <- c("Some", "Random", "Cities", "Stack", "Overflow", "Bla", "Foo")
df <- data.frame(x = seq_along(cities), cities)
df %>% filter(!grepl("^[Ss]", cities))
#   x   cities
# 1 2   Random
# 2 3   Cities
# 3 5 Overflow
# 4 6      Bla
# 5 7      Foo
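For comparison, here is a sketch of the same filter in base R using subset(), which also evaluates the condition inside the data frame, so no extra package is needed. It reuses the toy cities vector from the example above. Unlike filter(), subset() keeps the original row names (2, 3, 5, ...) instead of renumbering them.

```r
# Toy data mirroring the dplyr example above.
cities <- c("Some", "Random", "Cities", "Stack", "Overflow", "Bla", "Foo")
df <- data.frame(x = seq_along(cities), cities)

# subset() evaluates the condition with df's columns in scope, so `cities`
# here refers to the column. !grepl("^[Ss]", ...) keeps only the rows whose
# city does NOT start with "S" or "s".
no_s <- subset(df, !grepl("^[Ss]", cities))
print(no_s)
```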
Upvotes: 1
Reputation: 263362
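Using the substr() function, this returns only the rows whose City begins with a capital "S":
dat[substr(dat$City, 1, 1) == "S", ]
To remove those rows instead, as the question asks, negate the comparison:
dat[substr(dat$City, 1, 1) != "S", ]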
Could also have used:
dat[grepl("^S", dat$City), ]
or, to drop those rows:
dat[!grepl("^S", dat$City), ]
The second option uses a very simple regular expression: "^S" matches strings that start with a capital "S". See ?regex and ?grep.
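A self-contained sketch of these tests, using a hypothetical data frame dat with a City column (the column name follows the answer above; the city values are made up). Each test is negated so the "S" cities are dropped. One caveat worth checking on real data: with `[`, an NA city produces an NA logical index and therefore an all-NA row in the result, whereas grepl() returns FALSE for NA, so !grepl() keeps that row. Wrapping the substr() test in %in% is one way to avoid the NA row.

```r
# Hypothetical example data; only the City column matters here.
dat <- data.frame(
  id   = 1:6,
  City = c("Seattle", "Boston", "san Diego", "Austin", NA, "Salem"),
  stringsAsFactors = FALSE
)

# 1. substr(): drop rows whose first character is a capital "S".
#    The NA City gives an NA index, which appears as an all-NA row.
by_substr <- dat[substr(dat$City, 1, 1) != "S", ]

# 1b. %in% returns FALSE (not NA) for the NA City, so that row is kept as-is.
by_substr_safe <- dat[!(substr(dat$City, 1, 1) %in% "S"), ]

# 2. Anchored regex; grepl() returns FALSE for NA, so !grepl() keeps it.
by_regex <- dat[!grepl("^S", dat$City), ]

# 3. Case-insensitive regex: also drops "san Diego".
by_regex_ci <- dat[!grepl("^s", dat$City, ignore.case = TRUE), ]

print(by_regex)
```

Which variant fits depends on whether lowercase names and missing cities should count as "beginning with S".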
Upvotes: 1
Reputation: 7576
awk would be better for this if the data are still in a delimited text file. Something like:
awk -F'<delimiter>' '$<1-indexed col num> ~ /^[^sS]/' data
You can do it with grep too, but it gets sloppy (here the comma is the delimiter):
grep -E '^([^,]*,){<0-indexed col num>}[^sS]' data
Upvotes: 0