Kristen Cyr

Reputation: 726

Get rid of rows with parts of strings

I have a dataframe individual_dets that has a few rows I want to get rid of:

area                       year               Temp
BON-AR-S2                  2016               1.853
BON-W-S5                   2018               2.2
HFX 102                    2018               1.2
NSTR 525                   2017               2.0  
NSTR 787                   2017               2.3
HFX 101                    2016               1.9 
Boca Raton                 2015               20
Shutter                    2015               21
Shutter                    2017               1.3
Ketch                      2017               1.3
Ketch                      2018               1.9   

I want to keep only the rows where area starts with NSTR, HFX, or Boca Raton. How do I keep just these, or how do I get rid of the rest? I've tried using multiples of this:

individual_dets$area = filter(individual_dets, area != "BON-AR-S2")

But it outputs a completely different dataframe without my original data. I've also tried:

individual_dets = filter(individual_dets, area != "BON-AR-S2")

but nothing happens...

anybody know how to fix this?

Upvotes: 0

Views: 29

Answers (2)

NicolasH2

Reputation: 804

== and != compare against one exact string; when you want to test against one or more exact values, %in% is the more flexible choice

if you want to return the data.frame without certain rows, you don't write individual_dets$area = but rather df =. The former overwrites a single column in your table, the latter creates a new data.frame

you can use subset (base R) instead of filter (requires dplyr)

putting it all together:

df = subset(individual_dets, !area %in% "BON-AR-S2")

edit: as pointed out by @JBGruber, use subset(individual_dets, !grepl("BON", area)) if you want to be more general in string-finding
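a minimal sketch of both variants, in base R only. The data.frame below is a hand-typed subset of the rows from the question, and the names drop_exact, exact_result and partial_result are made up for illustration:

```r
# Sample data mirroring part of the question's table (typed by hand).
individual_dets <- data.frame(
  area = c("BON-AR-S2", "BON-W-S5", "HFX 102", "NSTR 525",
           "Boca Raton", "Shutter", "Ketch"),
  year = c(2016, 2018, 2018, 2017, 2015, 2015, 2017),
  Temp = c(1.853, 2.2, 1.2, 2.0, 20, 21, 1.3),
  stringsAsFactors = FALSE
)

# Exact matching: drop rows whose area equals any value in the vector.
drop_exact <- c("BON-AR-S2", "Shutter")
exact_result <- subset(individual_dets, !area %in% drop_exact)

# Partial matching: drop every row whose area contains "BON".
# grepl is case-sensitive by default, so "Boca Raton" is not affected.
partial_result <- subset(individual_dets, !grepl("BON", area))

print(exact_result)
print(partial_result)
```

note that %in% only removes the values you list exactly, so BON-W-S5 survives the first call but not the second.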

Upvotes: 0

JBGruber

Reputation: 12420

!= and == only work on exact matches. If you want to match part of a string, you need grepl. You also say the values should start with NSTR, HFX, or Boca. The start of a string is expressed with the regex anchor ^. For more than one pattern, you can use |, which is the regex for "or":

individual_dets = filter(individual_dets, grepl("^NSTR|^HFX|^Boca", area))
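If you want to check this without dplyr installed, here is a rough base R equivalent. The data.frame is retyped from the question, and keep_pattern, kept and dropped are names I made up. The pattern "^(NSTR|HFX|Boca)" is just a grouped way of writing the same three anchored alternatives:

```r
# The table from the question, retyped by hand.
individual_dets <- data.frame(
  area = c("BON-AR-S2", "BON-W-S5", "HFX 102", "NSTR 525", "NSTR 787",
           "HFX 101", "Boca Raton", "Shutter", "Shutter", "Ketch", "Ketch"),
  year = c(2016, 2018, 2018, 2017, 2017, 2016, 2015, 2015, 2017, 2017, 2018),
  Temp = c(1.853, 2.2, 1.2, 2.0, 2.3, 1.9, 20, 21, 1.3, 1.3, 1.9),
  stringsAsFactors = FALSE
)

# One anchor, three alternatives: area must start with NSTR, HFX or Boca.
keep_pattern <- "^(NSTR|HFX|Boca)"
matches <- grepl(keep_pattern, individual_dets$area)

# Rows to keep, and (for comparison) the rows that get removed.
kept <- individual_dets[matches, ]
dropped <- individual_dets[!matches, ]

print(kept)
print(dropped)
```

Negating the logical vector with ! gives you the "get rid of the rest" view, so you can inspect what was removed before overwriting individual_dets.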

Upvotes: 2
