Reputation: 338
I'm trying to clean some strings that contain a combination of letters and numbers:
a <- c("Hello World","Hello4 World","12345","Hello World 4","4Hello World5","Hello 4", "Hello4")
I want to remove the digits that are attached to letters, but keep strings that are purely numeric and numbers that are separated from the text by a space. The output I'm looking for is:
b <- c("Hello World","Hello World","12345","Hello World 4","Hello World", "Hello 4","Hello")
The strings could be anything, not necessarily 'Hello' or 'World'. I've tried various regex combinations but couldn't get what I wanted.
Any help would be appreciated!
Upvotes: 0
Views: 117
Reputation: 79348
gsub('(?i)(?<=[a-z])\\d+|\\d+(?=[a-z])','',a,perl=TRUE)
[1] "Hello World" "Hello World" "12345" "Hello World 4" "Hello World" "Hello 4" "Hello"
(?i)
makes the match case-insensitive. You can use the argument ignore.case = TRUE instead.
(?<=[a-z])\\d+
is a lookbehind: it matches one or more digits (\\d+) that are immediately preceded by a letter, (?<=[a-z]).
|
means "or".
\\d+(?=[a-z])
is a lookahead: it matches one or more digits (\\d+) that are immediately followed by a letter, (?=[a-z]).
Lookarounds are zero-width, so the letter itself is not part of the match and only the digits are removed. They require perl = TRUE.
Each match is replaced with an empty string: replacement = '' is the second argument of gsub.
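The lookaround pattern described above can be tried out with a short sketch like the following. It assumes the vector a from the question and swaps the inline (?i) flag for ignore.case = TRUE; the variable names pattern and cleaned are only illustrative.

```r
# Input vector from the question.
a <- c("Hello World", "Hello4 World", "12345", "Hello World 4",
       "4Hello World5", "Hello 4", "Hello4")

# Digits directly after a letter (lookbehind) or directly before
# a letter (lookahead). Lookarounds need perl = TRUE.
pattern <- "(?<=[a-z])\\d+|\\d+(?=[a-z])"

# ignore.case = TRUE stands in for the inline (?i) flag.
cleaned <- gsub(pattern, "", a, perl = TRUE, ignore.case = TRUE)
print(cleaned)
```

Pure numbers such as "12345" and space-separated numbers such as the 4 in "Hello 4" have no adjacent letter, so neither alternative matches and they are left alone.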
gsub('([a-z])\\d+|\\d+([a-z])','\\1\\2',a,ignore.case = TRUE)
[1] "Hello World" "Hello World" "12345" "Hello World 4" "Hello World" "Hello 4" "Hello"
This uses almost the same trick, but with backreferences instead of lookarounds.
([a-z])\\d+
captures the letter immediately before the digit(s) as group 1.
|
means "or".
\\d+([a-z])
captures the letter immediately after the digit(s) as group 2.
The whole match is then replaced with the captured letters, i.e. \\1\\2. Whichever group did not take part in the match is empty, so only the captured letter is put back.
You can mix the two regular expressions as you want.
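As a rough check, both versions can be compared against the expected vector b from the question. This is only a sketch; the names lookaround, backref, edge, edge_lookaround and edge_backref are made up for illustration.

```r
# Input and expected output, both taken from the question.
a <- c("Hello World", "Hello4 World", "12345", "Hello World 4",
       "4Hello World5", "Hello 4", "Hello4")
b <- c("Hello World", "Hello World", "12345", "Hello World 4",
       "Hello World", "Hello 4", "Hello")

# Lookaround version: only the digits are matched and removed.
lookaround <- gsub("(?i)(?<=[a-z])\\d+|\\d+(?=[a-z])", "", a, perl = TRUE)

# Backreference version: the neighbouring letter is matched too
# and written back through \\1 or \\2.
backref <- gsub("([a-z])\\d+|\\d+([a-z])", "\\1\\2", a, ignore.case = TRUE)

# Both should reproduce the expected output for this sample.
stopifnot(identical(lookaround, b), identical(backref, b))

# A single letter with digits on both sides is an edge case:
# the backreference pattern consumes the letter, so it is no
# longer available as context for the digits that follow it.
edge <- "1a2"
edge_lookaround <- gsub("(?i)(?<=[a-z])\\d+|\\d+(?=[a-z])", "", edge, perl = TRUE)
edge_backref <- gsub("([a-z])\\d+|\\d+([a-z])", "\\1\\2", edge, ignore.case = TRUE)
print(c(edge_lookaround, edge_backref))
```

The two patterns agree on the sample data, but they can give different results on inputs like "1a2", so it is worth testing whichever one you pick against your real strings.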
Upvotes: 2
Reputation: 610
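Split the input on spaces, then apply a regex to each token:
[A-Za-z] - matches any letter
^[0-9] - matches a digit at the start of a token (use ^[0-9]+$ to require that the whole token is digits)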
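One possible reading of this suggestion, sketched under the assumption that a token is anything between single spaces: strip the digits from every token that contains a letter and leave the other tokens untouched. The helper name clean_tokens is hypothetical and not part of any package.

```r
# Hypothetical helper: remove digits only from tokens that also
# contain at least one letter; purely numeric tokens are kept.
clean_tokens <- function(x) {
  tokens <- strsplit(x, " ", fixed = TRUE)
  vapply(tokens, function(tok) {
    has_letter <- grepl("[A-Za-z]", tok)
    tok[has_letter] <- gsub("[0-9]", "", tok[has_letter])
    paste(tok, collapse = " ")
  }, character(1))
}

a <- c("Hello World", "Hello4 World", "12345", "Hello World 4",
       "4Hello World5", "Hello 4", "Hello4")
result <- clean_tokens(a)
print(result)
```

Unlike the single-regex answers, this version also removes digits in the middle of a word (for example "He4llo"), which may or may not be what you want.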
Upvotes: 0