In an R script, I'm trying to remove a specific phrase followed by a varying combination of digits or letters.
Pattern to be removed: " Alpha code for WIS - Info Only - see journal XXXX"
where XXXX can be a 4-digit number, a letter followed by a 3-digit number, or 3 letters.
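One way to pin down the three forms of XXXX is a single alternation, checked against the code on its own before it is attached to the fixed phrase. This is a minimal base R sketch; the sample codes are hypothetical, and letters are assumed to be allowed in either case:

```r
# Each alternative describes one allowed form of XXXX:
#   [0-9]{4}          -> a 4-digit number, e.g. "1234"
#   [A-Za-z][0-9]{3}  -> one letter followed by 3 digits, e.g. "A123"
#   [A-Za-z]{3}       -> three letters, e.g. "AbC"
# The ^...$ anchors force the whole code string to match one alternative.
code_re <- "^([0-9]{4}|[A-Za-z][0-9]{3}|[A-Za-z]{3})$"

# Hypothetical sample codes (assumption: letters may be upper- or lowercase)
codes <- c("1234", "A123", "AbC", "12A4", "AB12", "ABCD")
matches <- grepl(code_re, codes)

# "1234", "A123" and "AbC" match; "12A4", "AB12" and "ABCD" do not
print(setNames(matches, codes))
```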
I've already tried:
str_replace(x, '^\\s "Alpha code for WIS - Info Only - see journal" \\b[A-Z1-9]{4}\\b','')
str_replace(x, '^\\s "Alpha code for WIS - Info Only - see journal" ([0-9])','')
str_replace(x, '^\\sAlpha code for WIS - Info Only - see journal ([0-9]+)','')
None of these work. I've also tried similar regexes with gsub, without getting any further.
If it's easier, I could do it in 3 steps: first replace the 4-digit number, then the 3 letters, and finally the letter + 3 digits combination.
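The three-step idea can be sketched with base R's gsub(), one call per form of XXXX. The sample strings are hypothetical, and the fixed phrase is assumed to appear exactly as written above (single spaces around the hyphens, one space before the code):

```r
# Fixed phrase that precedes the code (assumed to appear verbatim)
prefix <- "Alpha code for WIS - Info Only - see journal "

# Hypothetical input strings
x <- c("Note A. Alpha code for WIS - Info Only - see journal 1234",
       "Note B. Alpha code for WIS - Info Only - see journal A123",
       "Note C. Alpha code for WIS - Info Only - see journal AbC")

# Step 1: phrase + 4-digit number
x <- gsub(paste0(prefix, "[0-9]{4}"), "", x)
# Step 2: phrase + 3 letters
x <- gsub(paste0(prefix, "[A-Za-z]{3}"), "", x)
# Step 3: phrase + one letter followed by 3 digits
x <- gsub(paste0(prefix, "[A-Za-z][0-9]{3}"), "", x)

print(x)
```

This works, but joining the three forms into one alternation, `(...|...|...)`, does the same in a single gsub() call.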
Try a regex like this with gsub:
"Alpha code for WIS - Info Only - see journal ([0-9]{4}|[a-zA-Z][0-9]{3}|[a-zA-Z]{3})"
So the full snippet would be:
test <- "Line1: Alpha code for WIS - Info Only - see journal 1234\nLine2: Alpha code for WIS - Info Only - see journal A123\nLine3: Alpha code for WIS - Info Only - see journal AbC\nLine4: line 4 content"
result <- gsub("Alpha code for WIS - Info Only - see journal ([0-9]{4}|[a-zA-Z][0-9]{3}|[a-zA-Z]{3})", '', test)
print(result)
Output
[1] "Line1: \nLine2: \nLine3: \nLine4: line 4 content"
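The pattern above leaves behind the space that preceded the phrase, and without a word boundary it would also strip the first three letters of a longer code such as "ABCD", leaving a stray "D". A hedged variant, assuming every code is followed by a non-word character or the end of the string, adds `\\s*` in front of the phrase and `\\b` after the code; `perl = TRUE` is used so the pattern is handled by PCRE. The input strings are hypothetical:

```r
# \\s* also removes whitespace before the phrase;
# \\b prevents a partial match inside a longer token such as "ABCD".
pattern <- "\\s*Alpha code for WIS - Info Only - see journal ([0-9]{4}|[A-Za-z][0-9]{3}|[A-Za-z]{3})\\b"

# Hypothetical input strings
x <- c("Item 1 Alpha code for WIS - Info Only - see journal 1234",
       "Item 2 Alpha code for WIS - Info Only - see journal A123 (checked)",
       "Item 3 Alpha code for WIS - Info Only - see journal AbC",
       "Item 4 Alpha code for WIS - Info Only - see journal ABCD")

result <- gsub(pattern, "", x, perl = TRUE)
# "ABCD" is not a valid code, so the last string is left unchanged
print(result)
```

Since the question used stringr: str_replace() only replaces the first match in each string, so str_replace_all() or str_remove_all() is the stringr counterpart of gsub().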