Reputation: 304
If I have text like this:
CARBON 1569
1.00% IRON 234
99% CARBON, 1% IRON 181
98.2% CARBON 1% ZINC 181
99% CARBON#1% IRON 141
ASD CARBON 2% IRON RANDOMWORD 23
Let's say I want to retain only the element names and percentage values (digits, decimal points, and percent signs). I can run a regex substitution to do so. I tried plenty of combinations, such as (CARBON|IRON|ZINC), which replaces all occurrences of element names, and [^0-9.\%]+, which replaces everything except digits, dots, and percent signs and so retains all percentage values.
But I can't figure out how to combine these such that I retain both the percentage values and element names. Any help would be appreciated.
EDIT: The spaces would also need to be retained for the output to make sense. All unnecessary characters can be replaced by spaces. The expected output would be
CARBON 1569
1.00% IRON 234
99% CARBON 1% IRON 181
98.2% CARBON 1% ZINC 181
99% CARBON 1% IRON 141
CARBON 2% IRON 23
Upvotes: 1
Views: 352
Reputation: 785286
You may use this regex to match your desired text:
\b(CARBON\b|IRON\b|ZINC\b|\d+(?:\.\d+)?(?:%|\b))|\S
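Replace each match with '\1 ' (capture group 1 followed by a space). Note that this adds trailing spaces to the lines and can leave runs of several spaces between tokens.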
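As a rough illustration, the substitution could be applied in Python along these lines. This is only a sketch: the function name keep_elements_and_numbers is hypothetical, and the final whitespace-collapsing step is an added assumption (not part of the regex itself) to tidy the extra spaces the replacement leaves behind.

```python
import re

# Pattern from the answer: group 1 captures the tokens to keep; the
# trailing \S alternative matches any other non-whitespace character.
PATTERN = re.compile(r"\b(CARBON\b|IRON\b|ZINC\b|\d+(?:\.\d+)?(?:%|\b))|\S")


def keep_elements_and_numbers(line: str) -> str:
    """Hypothetical helper: keep element names, numbers and percentages."""
    # Equivalent to replacing with '\1 ': group 1 is None when only the
    # \S alternative matched, so such characters become a single space.
    replaced = PATTERN.sub(lambda m: (m.group(1) or "") + " ", line)
    # Assumption: collapse runs of spaces and trim the trailing space.
    return " ".join(replaced.split())


if __name__ == "__main__":
    print(keep_elements_and_numbers("99% CARBON#1% IRON 141"))
    # 99% CARBON 1% IRON 141
```

The lambda replacement is used instead of the '\1 ' string so that the behaviour for an unmatched group does not depend on the Python version.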
RegEx Detail:
\b : Word boundary
( : Start capture group
CARBON\b : Match CARBON followed by word boundary
| : OR
IRON\b : Match IRON followed by word boundary
| : OR
ZINC\b : Match ZINC followed by word boundary
| : OR
\d+(?:\.\d+)? : Match an integer or float number
(?:%|\b) : Followed by % or word boundary
) : End capture group
| : OR
\S : Match a non-whitespace character
Upvotes: 2
Reputation: 374
You can try replacing all the words except element names, numbers, and percentages.
To achieve this you can use negative lookahead:
(?!CARBON|IRON|ZINC|(\d+\.\d+\%)|\d+)\b[a-zA-Z#]+
Upvotes: 1
Reputation: 10466
To simplify, you may start with this, as per your requirement:
\b(?!CARBON|ZINC|IRON)[a-zA-Z#]+
You may then need some post-processing (such as replacing # with a blank), as per your comments.
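A minimal Python sketch of this two-step idea follows. The names strip_unwanted, UNWANTED_WORDS and LEFTOVERS are hypothetical, and the second pattern is an assumption about what the post-processing might look like: it turns any character that is not a letter, digit, dot, percent sign or space into a space, which takes care of the comma in the sample input.

```python
import re

# Lookahead pattern from this answer: matches runs of letters/# that
# start at a word boundary and do not begin with a listed element name.
UNWANTED_WORDS = re.compile(r"\b(?!CARBON|ZINC|IRON)[a-zA-Z#]+")

# Assumed post-processing step (not from the answer): anything that is
# not a letter, digit, dot, percent sign or space becomes a space.
LEFTOVERS = re.compile(r"[^A-Za-z0-9.% ]")


def strip_unwanted(line: str) -> str:
    """Hypothetical helper: drop unwanted words, then tidy the rest."""
    line = UNWANTED_WORDS.sub(" ", line)
    line = LEFTOVERS.sub(" ", line)
    # Collapse runs of spaces introduced by the two substitutions.
    return " ".join(line.split())


if __name__ == "__main__":
    print(strip_unwanted("ASD CARBON 2% IRON RANDOMWORD 23"))
    # CARBON 2% IRON 23
```

One caveat: the lookahead has no trailing word boundary, so a longer word that merely starts with an element name (for example CARBONATE) would be kept as well.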
Upvotes: 1