Logan McDonald

Reputation: 85

Subsetting data that starts with a letter

I have a dataset with 45 columns and more than 8,000 observations. One of the columns holds the city name. I want to remove all observations located in cities whose names begin with the letter "S". How would I do this? I'm pretty new to R, so sorry if this is simple, but I couldn't find anything by searching.

Upvotes: 0

Views: 74

Answers (3)

Peter Diakumis

Reputation: 4042

You can use dplyr's filter function, although I have no idea how fast it is compared to other methods:

cities <- c("Some", "Random", "Cities", "Stack", "Overflow", "Bla", "Foo")
df <- data.frame(x = seq_along(cities), cities)
library(dplyr)
df %>% filter(!grepl("^[Ss]", cities))
  x   cities
1 2   Random
2 3   Cities
3 5 Overflow
4 6      Bla
5 7      Foo

Upvotes: 1

IRTFM

Reputation: 263362

This returns only the rows whose city begins with a capital "S", using substr() (negate the condition with ! to drop those rows instead):

dat[ substr( dat$City, 1 ,1) == "S" , ]

Could also have used:

dat[ grepl("^S", dat$City) , ]

The second option is a very simple regular expression. Look at ?regex and ?grep.
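Since the question asks to remove the "S" cities rather than select them, here is a minimal sketch of the negated forms. The data frame and its Pop column are made up for illustration; only the name dat and the column City come from the snippets above. startsWith() is a base R alternative (R 3.3.0 or later) that is not part of the original answer.

```r
# Hypothetical example data; dat and City follow the answer above.
dat <- data.frame(
  City = c("Seattle", "Boston", "Salem", "Denver", "san Jose"),
  Pop  = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Keep only the rows whose city does NOT start with a capital "S".
kept_substr <- dat[substr(dat$City, 1, 1) != "S", ]
kept_grepl  <- dat[!grepl("^S", dat$City), ]

# startsWith() does the same prefix test without a regular expression.
kept_starts <- dat[!startsWith(dat$City, "S"), ]

# To drop a lowercase "s" as well, widen the pattern to both cases.
kept_any_case <- dat[!grepl("^[Ss]", dat$City), ]
```

One difference to be aware of: grepl() returns FALSE for missing values, so rows with an NA city are kept by the grepl() forms, while the substr() comparison yields NA for them.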

Upvotes: 1

David Ehrmann

Reputation: 7576

awk would be better for this. Something like:

cat data | awk -F<delimiter> '{if (match($<1-indexed col num>, "^[^sS].*")) { print $0 }}'

You can do it with grep, but it gets sloppy (here a comma is the delimiter):

cat data | grep -E '^([^,]*,){<0-indexed col num>}[^sS]'

Upvotes: 0
