MJP

Reputation: 99

Removing specific characters and numbers from text string

I am working with Retrosheet play-by-play data in RStudio and am trying to remove the non-pitch characters (pickoff attempts, balks, and so on) from the pitch sequence column. For example:

Dataset I have:

PITCH_SEQ_TX <- c('SSS.C', 'FFBB1', 'BBSSC', 'B.BSS2', 'CBSFFFS')

Dataset I want:

PITCH_SEQ_TX <- c('SSSC', 'FFBB', 'BBSSC', 'BBSS', 'CBSFFFS')

I need a way to remove the punctuation and digits from each string so that only letters remain. I've tried a couple of gsub() calls, but can't seem to find the right pattern. Any help would be appreciated.

Upvotes: 0

Views: 887

Answers (1)

Wiktor Stribiżew

Reputation: 626699

You may use

PITCH_SEQ_TX <- c('SSS.C','FFBB1','BBSSC','B.BSS2','CBSFFFS')
gsub("[[:punct:][:digit:]]+", "", PITCH_SEQ_TX)

Or, to remove every non-letter character:

gsub("[^[:alpha:]]+", "", PITCH_SEQ_TX)


The pattern [[:punct:][:digit:]]+ is a bracket expression that matches one or more (because of the +) punctuation ([:punct:]) or digit ([:digit:]) characters. The pattern [^[:alpha:]]+ is a negated bracket expression that matches one or more characters that are not letters, so it also removes whitespace and any other non-letter symbols.
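Since the question is about a column in a data set, here is a minimal sketch of applying the pattern to a data frame column and of where the two patterns differ. The data frame `plays`, the column `PITCH_SEQ_CLEAN`, and the `with_space` string are hypothetical names and values made up for illustration; only `PITCH_SEQ_TX` comes from the question.

```r
# Sample pitch sequences from the question.
PITCH_SEQ_TX <- c('SSS.C', 'FFBB1', 'BBSSC', 'B.BSS2', 'CBSFFFS')

# Hypothetical data frame standing in for the play-by-play table.
plays <- data.frame(PITCH_SEQ_TX = PITCH_SEQ_TX, stringsAsFactors = FALSE)

# Strip everything that is not a letter and store the result in a new column,
# leaving the original column untouched.
plays$PITCH_SEQ_CLEAN <- gsub("[^[:alpha:]]+", "", plays$PITCH_SEQ_TX)

# Hypothetical input containing a space, to show where the two patterns differ.
with_space <- "B.B SS2"
drop_punct_digit <- gsub("[[:punct:][:digit:]]+", "", with_space)  # "BB SS" (space kept)
drop_non_alpha   <- gsub("[^[:alpha:]]+", "", with_space)          # "BBSS"  (space removed)
```

On the sample data both patterns give the same result; they only diverge when a string contains characters that are neither letters, digits, nor punctuation, such as whitespace.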

Upvotes: 1
