Reputation: 99
I am working with Retrosheet play-by-play data in RStudio and am trying to remove the non-pitch characters (e.g. pickoff attempts and balks) from the pitch sequence column. For example:
Dataset I have:
PITCH_SEQ_TX <- c('SSS.C', 'FFBB1', 'BBSSC', 'B.BSS2', 'CBSFFFS')
Dataset I want:
PITCH_SEQ_TX <- c('SSSC', 'FFBB', 'BBSSC', 'BBSS', 'CBSFFFS')
I need to figure out a way to remove the punctuation and numbers from the text string so that only letters remain. I've tried a couple of gsub calls, but can't seem to figure out the right combination. Any help would be appreciated.
Upvotes: 0
Views: 887
Reputation: 626699
You may use
PITCH_SEQ_TX <- c('SSS.C','FFBB1','BBSSC','B.BSS2','CBSFFFS')
gsub("[[:punct:][:digit:]]+", "", PITCH_SEQ_TX)
Or to remove all non-alpha:
gsub("[^[:alpha:]]+", "", PITCH_SEQ_TX)
The [[:punct:][:digit:]]+ pattern is a bracket expression followed by a + quantifier, so it matches one or more punctuation ([:punct:]) or digit ([:digit:]) characters. The [^[:alpha:]] pattern is a negated bracket expression that matches any character that is not a letter, which also covers whitespace and any other non-letter symbols.
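To make the difference between the two patterns concrete, here is a small sketch. It reuses the vector from the question and adds one made-up entry containing a space ("BB 1S"), which is an assumption for illustration and not real Retrosheet data. The data frame name events is also hypothetical.

```r
# Vector from the question plus one hypothetical entry with a space
pitch_seq <- c("SSS.C", "FFBB1", "BBSSC", "B.BSS2", "CBSFFFS", "BB 1S")

# Pattern 1: remove runs of punctuation or digits only
drop_punct_digit <- gsub("[[:punct:][:digit:]]+", "", pitch_seq)

# Pattern 2: remove every run of non-letter characters (spaces included)
keep_letters <- gsub("[^[:alpha:]]+", "", pitch_seq)

print(drop_punct_digit)
# "SSSC" "FFBB" "BBSSC" "BBSS" "CBSFFFS" "BB S"
print(keep_letters)
# "SSSC" "FFBB" "BBSSC" "BBSS" "CBSFFFS" "BBS"

# Applying the same idea to a column of a (hypothetical) data frame
events <- data.frame(PITCH_SEQ_TX = pitch_seq, stringsAsFactors = FALSE)
events$PITCH_SEQ_TX <- gsub("[^[:alpha:]]+", "", events$PITCH_SEQ_TX)
```

Both patterns give the desired result on the question's data. They differ only on the added entry: the first keeps the space, the second drops it. If the goal is strictly "letters only", the negated class is probably the safer choice.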
Upvotes: 1