Reputation: 336
I have a column in OpenRefine that I want to manipulate.
It contains strings (example: FL), strings containing numbers (123F423), and numbers.
I want to get rid of all letters (A-Z) that "pollute" the numbers (as in 123F423), but I do not want to change the "clean" strings and numbers.
Example:
FL -> FL
123F324 -> 123324
432531 -> 432531
AB -> AB
342J34 -> 34234
Upvotes: 0
Views: 57
Reputation: 2614
You can't do this with a plain match alone, since a regex returns what it matches rather than what is absent, and it will not concatenate multiple matches for you.
For example, you can either repeatedly match [A-Z]+ and remove the offending matches, or match all digit runs with [0-9]+ in a single pass and concatenate the resulting matches.
The second option would be faster as it only evaluates the expression once.
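A minimal sketch of the second option, written in plain Python rather than OpenRefine's own expression language. The function name keep_digits_if_mixed is hypothetical, and the guard that leaves letter-only and digit-only values alone is an assumption added to match the examples in the question.

```python
import re

def keep_digits_if_mixed(value: str) -> str:
    """Concatenate the digit runs of a value that mixes letters and digits."""
    digit_runs = re.findall(r"[0-9]+", value)            # every run of digits
    has_letters = re.search(r"[A-Za-z]", value) is not None
    if digit_runs and has_letters:
        # Mixed value: keep only the digits, joined in their original order.
        return "".join(digit_runs)
    # Letter-only or digit-only value: leave it untouched (assumed requirement).
    return value

for sample in ["FL", "123F324", "432531", "AB", "342J34"]:
    print(sample, "->", keep_digits_if_mixed(sample))
```

Note that in a mixed value this drops every non-digit character, not just letters, so it is only a fit if the column contains nothing but letters and digits.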
EDIT
@horcrux has a much better answer.
Upvotes: 0
Reputation: 7880
You can search and remove:
(?<=\d)[A-Za-z]+|[A-Za-z]+(?=\d)
The regex matches one or more letters that are preceded by a digit or followed by a digit. It uses lookaround.
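To try the pattern outside OpenRefine, here is a small sketch using Python's re module, which supports this kind of fixed-width lookbehind. The names MIXED_LETTERS and strip_polluting_letters are made up for the illustration.

```python
import re

# Letters directly after a digit, or letters directly before a digit.
MIXED_LETTERS = re.compile(r"(?<=\d)[A-Za-z]+|[A-Za-z]+(?=\d)")

def strip_polluting_letters(value: str) -> str:
    """Remove letter runs that touch a digit; leave everything else alone."""
    return MIXED_LETTERS.sub("", value)

for sample in ["FL", "123F324", "432531", "AB", "342J34"]:
    print(sample, "->", strip_polluting_letters(sample))
```

Values made only of letters or only of digits never satisfy either lookaround, so they pass through unchanged.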
EDIT: If lookaround is not supported, you can simply search for
(\d)[A-Za-z]+|[A-Za-z]+(\d)
and replace with $1$2
(see demo 2)
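The same replacement can be sketched in Python, with two differences that are assumptions about the environment rather than part of the answer: Python writes the backreferences as \1\2 instead of $1$2, and it needs version 3.5 or later, where a group that did not take part in the match is substituted as an empty string. The names PATTERN and strip_with_groups are hypothetical.

```python
import re

# Capture the digit next to the letters so it can be put back.
PATTERN = re.compile(r"(\d)[A-Za-z]+|[A-Za-z]+(\d)")

def strip_with_groups(value: str) -> str:
    """Drop letters adjacent to a digit, re-inserting the captured digit."""
    # Only one of the two groups matches per hit; the other is empty (Python 3.5+).
    return PATTERN.sub(r"\1\2", value)

for sample in ["FL", "123F324", "432531", "AB", "342J34"]:
    print(sample, "->", strip_with_groups(sample))
```

Because this version consumes the neighbouring digit, it can behave differently from the lookaround pattern when letters sit on both sides of a single digit: "A1B" becomes "1B" here, while the lookaround pattern gives "1".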
Upvotes: 1