zappa

Reputation: 336

Regex for removing characters from numbers, but not all characters

I have a column in OpenRefine that I want to manipulate.

It contains pure strings (example: FL), strings containing numbers (123F423), and pure numbers.

I want to get rid of all letters (A-Z) that "pollute" the numbers (like the F in 123F423), but I do not want to change anything in the "clean" strings and numbers.

Example:

FL -> FL

123F324 -> 123324

432531 -> 432531

AB -> AB

342J34 -> 34234

Upvotes: 0

Views: 57

Answers (2)

Zhro

Reputation: 2614

You can't do what you want in a single operation, because a regex is designed to return a specific match rather than the absence of a match, and it will not concatenate multiple results for you.

For example, you can either repeatedly search for [A-Z]+ and remove the offending matches, or search once for all runs of digits with [0-9]+ and concatenate the resulting matches.

The second option would be faster as it only evaluates the expression once.
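A minimal Python sketch of the second option, for illustration only. The function name `keep_digits_only` and the guard that leaves digit-free values such as FL untouched are assumptions added here. The guard is needed because concatenating digit matches would otherwise turn FL into an empty string.

```python
import re

def keep_digits_only(value: str) -> str:
    """Concatenate all digit runs; leave values without digits unchanged."""
    # Assumption (not in the answer): values with no digit at all, such as
    # "FL", are "clean" strings and are returned as-is.
    if not re.search(r"[0-9]", value):
        return value
    # Find every run of digits once and join the runs together.
    # Note: this drops every non-digit character, not only letters.
    return "".join(re.findall(r"[0-9]+", value))

samples = ["FL", "123F324", "432531", "AB", "342J34"]
cleaned = [keep_digits_only(s) for s in samples]
print(cleaned)
```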

EDIT

@horcrux has a much better answer.

Upvotes: 0

logi-kal

Reputation: 7880

You can search and remove:

(?<=\d)[A-Za-z]+|[A-Za-z]+(?=\d)

See demo

The regex matches one or more letters that are either preceded by a digit or followed by a digit. It uses lookaround assertions, so the digits themselves are not part of the match and stay in place when the match is removed.
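As a quick check outside OpenRefine, the lookaround pattern above can be tried in Python, whose `re` module supports both lookbehind and lookahead. This is only a sketch: the names `POLLUTING_LETTERS` and `strip_polluting_letters` are made up for the example, and the inputs are the ones from the question.

```python
import re

# Letters directly after a digit, or letters directly before a digit.
POLLUTING_LETTERS = re.compile(r"(?<=\d)[A-Za-z]+|[A-Za-z]+(?=\d)")

def strip_polluting_letters(value: str) -> str:
    """Remove letter runs that touch a digit; leave everything else alone."""
    return POLLUTING_LETTERS.sub("", value)

samples = ["FL", "123F324", "432531", "AB", "342J34"]
cleaned = [strip_polluting_letters(s) for s in samples]
print(cleaned)  # ['FL', '123324', '432531', 'AB', '34234']
```

Pure strings such as FL and AB never touch a digit, so neither alternative matches and they pass through unchanged.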

EDIT: If lookaround is not supported, you can simply search for

(\d)[A-Za-z]+|[A-Za-z]+(\d)

and replace with $1$2 (see demo 2)
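The capture-group variant can be sketched in Python as well. Python writes the replacement as `\1\2` instead of `$1$2`, and from Python 3.5 on an unmatched group is substituted as an empty string. The names `GROUP_PATTERN` and `strip_letters_with_groups` are hypothetical.

```python
import re

# Same idea without lookaround: capture the neighbouring digit and put it back.
GROUP_PATTERN = re.compile(r"(\d)[A-Za-z]+|[A-Za-z]+(\d)")

def strip_letters_with_groups(value: str) -> str:
    """Drop letter runs next to a digit, re-inserting the captured digit."""
    # Only one of the two groups matches per hit; the other is empty
    # (requires Python 3.5+, older versions raise an error here).
    return GROUP_PATTERN.sub(r"\1\2", value)

samples = ["FL", "123F324", "432531", "AB", "342J34"]
cleaned = [strip_letters_with_groups(s) for s in samples]
print(cleaned)  # ['FL', '123324', '432531', 'AB', '34234']
```

One difference to keep in mind: here the digit is consumed by the match, so a digit with letters on both sides is handled only once. For example, A1B becomes 1B with this version, whereas the lookaround version yields 1.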

Upvotes: 1
