Hans

Reputation: 5505

Using regex in R to find strings as whole words (but not strings as part of words)

I'm looking for the right regular expression. The following code

t1 <- c("IGF2, IGF2AS, INS, TH", "TH", "THZH", "ZGTH")
grep("TH", t1, value = TRUE)

returns all elements of t1, but only the first and second are correct. I only want the entries that contain TH as a whole word, not as part of a longer word such as THZH or ZGTH. How can I do that?
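To make the problem concrete, here is a minimal reproduction using grepl(), which returns one logical per element instead of the matching strings. A bare pattern like "TH" is matched as a substring, so it hits inside THZH and ZGTH as well.

```r
t1 <- c("IGF2, IGF2AS, INS, TH", "TH", "THZH", "ZGTH")

# A bare pattern matches "TH" anywhere in the string,
# including inside longer words such as "THZH" and "ZGTH"
hits <- grepl("TH", t1)
print(hits)
# [1] TRUE TRUE TRUE TRUE
```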

Upvotes: 48

Views: 38408

Answers (2)

Anatoliy

Reputation: 1380

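You can use \< and \> in a regular expression to match the empty string at the beginning and end of a word, respectively. R's default (extended) regular expressions support them:

grep("\\<TH\\>", t1, value = TRUE)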
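A small sketch of this on the question's data, using R's default regex engine (perl = FALSE). Because "TH" has to start at a word start and end at a word end, "THZH" (followed by Z) and "ZGTH" (preceded by G) are excluded.

```r
t1 <- c("IGF2, IGF2AS, INS, TH", "TH", "THZH", "ZGTH")

# \\< anchors at the start of a word and \\> at the end,
# so "TH" only matches when it stands alone as a word
res_angle <- grep("\\<TH\\>", t1, value = TRUE)
print(res_angle)
# [1] "IGF2, IGF2AS, INS, TH" "TH"
```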

Upvotes: 24

Tim Pietzcker

Reputation: 336108

You need to add word-boundary anchors (\b) around your search string so that only whole words are matched, i.e. matches surrounded by non-word characters or by the start/end of the string. Here a "word character" means \w, i.e. an alphanumeric character or an underscore.

Try

grep("\\bTH\\b", t1, value = TRUE)
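The sketch below runs the \b version on the question's data and checks a couple of edge cases that follow from the definition of a word character: an underscore after "TH" blocks the boundary, a hyphen does not. It also shows that perl = TRUE (PCRE) gives the same result here. grep_whole_word() is a hypothetical helper for a search term stored in a variable; it assumes the term contains no regex metacharacters such as "." or "+", which would otherwise need escaping.

```r
t1 <- c("IGF2, IGF2AS, INS, TH", "TH", "THZH", "ZGTH")

# \\b matches the empty string at either edge of a word
res_b <- grep("\\bTH\\b", t1, value = TRUE)
print(res_b)
# [1] "IGF2, IGF2AS, INS, TH" "TH"

# PCRE (perl = TRUE) also understands \\b and gives the same result here
res_perl <- grep("\\bTH\\b", t1, value = TRUE, perl = TRUE)

# "_" is a word character, so there is no boundary between "TH" and "_";
# "-" is not a word character, so "TH-1" still matches
edge <- c("TH_1", "TH-1")
edge_hits <- grepl("\\bTH\\b", edge)
print(edge_hits)
# [1] FALSE  TRUE

# Hypothetical helper: wraps a plain search term in word boundaries.
# Assumes `word` contains no regex metacharacters.
grep_whole_word <- function(word, x) {
  grep(paste0("\\b", word, "\\b"), x, value = TRUE)
}
res_fun <- grep_whole_word("INS", t1)
print(res_fun)
# [1] "IGF2, IGF2AS, INS, TH"
```

If the search term can contain metacharacters, it would need to be escaped before being pasted into the pattern.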

Upvotes: 55
