Reputation: 83
I have some strings that can contain letters, digits and the '#' symbol.
I would like to remove the digits, except in words that start with '#'.
Here is an example:
"table9 dolv5e #10n #dec10 #nov8e 23 hello"
And the expected output is:
"table dolve #10n #dec10 #nov8e hello"
How can I do this with regex, stringr or gsub?
Upvotes: 8
Views: 1083
Reputation: 101
const INPUT = "table9 dolv5e #10n #dec10 #nov8e 23 hello";
const OUTPUT = INPUT.match(/[^#\d]+(#\w+|[A-Za-z]+\w*)/gi).join('');
You can remove the i flag, because [A-Za-z] already covers both upper and lower case.
Use this pattern: [^#\d]+(#\w+|[A-Za-z]+\w*)
[^#\d]+ = one or more characters that are neither # nor a digit
#\w+ = a # followed by one or more word characters (letters, digits or underscore)
[A-Za-z]+\w* = one or more letters followed by any number of word characters
You can replace [A-Za-z]+\w* with \D+\S* to match any character, not only tokens whose first character is a letter and not only ones followed by letters and/or digits.
I did not write it as \w+\w* because \w already includes digits (it is the same as [\w\d]).
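Since the question asks about R, here is a minimal sketch for checking what the pattern keeps on the example string. It assumes perl = TRUE behaves like the JavaScript regex engine for this pattern; the variable names are illustrative only. On this input the pattern drops the standalone 23 but keeps the digits inside table9 and dolv5e, so it may need further work before it matches the expected output.

```r
x <- "table9 dolv5e #10n #dec10 #nov8e 23 hello"
pattern <- "[^#\\d]+(#\\w+|[A-Za-z]+\\w*)"

# Collect every match and glue them together,
# like match(...).join('') does in JavaScript.
matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
joined <- paste(matches, collapse = "")
print(joined)
```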
I tried the code in JavaScript and it ran. If you want to match tokens that are not only followed by a letter, you can use code
Upvotes: 0
Reputation: 18565
How about capturing what you want to keep and replacing everything else that matches (the part that is not captured) with an empty string?
gsub("(#\\S+)|\\d+","\\1",x)
See demo at regex101 or R demo at tio.run (I have no experience with R)
My answer assumes that there is always whitespace between tokens, as in #foo bar #baz2. If you have something like #foo1,bar2:#baz3 4, use \w (word character) instead of \S (non-whitespace).
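A minimal sketch of the capture-and-replace idea on the question's example, followed by the \w variant. Removing the standalone 23 leaves two adjacent spaces, so the sketch also squeezes whitespace; that clean-up step, perl = TRUE, and the variable names are assumptions added here, not part of the one-liner above.

```r
x <- "table9 dolv5e #10n #dec10 #nov8e 23 hello"

# '#'-prefixed tokens are captured in group 1 and written back unchanged;
# any other digit run matches the second alternative, where group 1 is
# empty, so it is replaced by nothing.
stripped <- gsub("(#\\S+)|\\d+", "\\1", x, perl = TRUE)

# Squeeze the double space left where "23" used to be, then trim the ends.
result <- trimws(gsub("\\s+", " ", stripped))
print(result)

# \w variant for tokens separated by punctuation instead of whitespace.
y <- "#foo1,bar2:#baz3 4"
result_w <- trimws(gsub("(#\\w+)|\\d+", "\\1", y, perl = TRUE))
print(result_w)
```

With \S instead of \w, the whole of #foo1,bar2:#baz3 would count as one '#' token and bar2 would keep its digit.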
Upvotes: 5
Reputation: 5798
Base R solution:
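unlisted_strings <- unlist(strsplit(X, "\\s+"))
# Strip digits only from tokens that do not start with '#'
cleaned <- ifelse(grepl("^#", unlisted_strings),
                  unlisted_strings,
                  gsub("\\d+", "", unlisted_strings))
# Drop tokens that became empty (such as "23") so no double space is left
Y <- paste(cleaned[nzchar(cleaned)], collapse = " ")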
Y
Data:
X <- as.character("table9 dolv5e #10n #dec10 #nov8e 23 hello")
Upvotes: 0
Reputation: 1502
You could split the string on spaces, remove digits from tokens if they don't start with '#' and paste back:
x <- "table9 dolv5e #10n #dec10 #nov8e 23 hello"
y <- unlist(strsplit(x, ' '))
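# gsub (rather than sub) removes every digit run in a token, not just the first
z <- ifelse(startsWith(y, '#'), y, gsub('\\d+', '', y))
# drop tokens that are now empty (such as "23") before pasting back
paste(z[nzchar(z)], collapse = ' ')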
# output
[1] "table dolve #10n #dec10 #nov8e hello"
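The split, clean and paste steps above can also be wrapped so they work on a whole character vector. This is only a sketch: the function name strip_digits_outside_tags and the second test string are made up for illustration, and it assumes tokens are separated by whitespace.

```r
# Hypothetical helper (the name is an assumption): remove digits from every
# whitespace-separated token that does not start with '#'.
strip_digits_outside_tags <- function(strings) {
  vapply(strsplit(strings, "\\s+"), function(tokens) {
    cleaned <- ifelse(startsWith(tokens, "#"), tokens, gsub("[0-9]+", "", tokens))
    # Tokens made only of digits are now "", so drop them before pasting.
    paste(cleaned[nzchar(cleaned)], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

res <- strip_digits_outside_tags(c(
  "table9 dolv5e #10n #dec10 #nov8e 23 hello",
  "a1b2 #c3 4"
))
print(res)
```

Using vapply with character(1) keeps the result a plain character vector with one cleaned string per input string.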
Upvotes: 5
Reputation: 47008
You can use gsub to remove digits, for example:
gsub("[0-9]","","table9")
[1] "table"
And we can split your string using strsplit:
STRING = "table9 dolv5e #10n #dec10 #nov8e 23 hello"
strsplit(STRING," ")
[[1]]
[1] "table9" "dolv5e" "#10n" "#dec10" "#nov8e" "23" "hello"
We then apply gsub only to the elements of STRING that do not contain "#":
STRING = unlist(strsplit(STRING," "))
no_hex = !grepl("#",STRING)
STRING[no_hex] = gsub("[0-9]","",STRING[no_hex])
# drop the empty string left by "23" so that no double space remains
paste(STRING[STRING != ""],collapse=" ")
[1] "table dolve #10n #dec10 #nov8e hello"
Upvotes: 1