Shirish

Reputation: 304

Regex: Replace all except numbers, specific characters and specific words

If I have text like this:

    CARBON                                                               1569
    1.00% IRON                                                           234
    99% CARBON, 1% IRON                                                  181
    98.2% CARBON 1% ZINC                                                 181
    99% CARBON#1% IRON                                                   141
    ASD CARBON 2% IRON RANDOMWORD                                        23

Let's say I want to retain only the element names and percentage values (which include digits, the decimal point and the percent sign). I can run a regex substitution to do so. I tried plenty of combinations, such as (CARBON|IRON|ZINC), which matches all occurrences of the element names, and [^0-9.\%]+, which matches everything except the characters that make up the percentage values.

But I can't figure out how to combine these such that I retain both the percentage values and element names. Any help would be appreciated.

EDIT: The spaces would also need to be retained for the output to make sense. All unnecessary characters can be replaced by spaces. The expected output would be

    CARBON                                                               1569
    1.00% IRON                                                           234
    99% CARBON  1% IRON                                                  181
    98.2% CARBON 1% ZINC                                                 181
    99% CARBON 1% IRON                                                   141
        CARBON 2% IRON                                                   23

Upvotes: 1

Views: 352

Answers (3)

anubhava

Reputation: 785286

You may use this regex to match your desired text:

\b(CARBON\b|IRON\b|ZINC\b|\d+(?:\.\d+)?(?:%|\b))|\S

Then replace each match with '\1 ' (this adds trailing spaces to the input lines).


RegEx Detail:

  • \b: Word boundary
  • (: Start capture group
    • CARBON\b: Match CARBON followed by word boundary
    • |: OR
    • IRON\b: Match IRON followed by word boundary
    • |: OR
    • ZINC\b: Match ZINC followed by word boundary
    • |: OR
    • \d+(?:\.\d+)?: Match an integer or float number
    • (?:%|\b): Followed by % or word boundary
  • ): End capture group
  • |: OR
  • \S: Match a non-whitespace character
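As a minimal sketch of how the pattern above could be applied, here it is in Python (the question does not name a language, so this choice is an assumption). `keep_tokens` uses the '\1 ' replacement suggested in the answer. `keep_tokens_aligned` is a hypothetical variant, not part of the answer, that swaps every unwanted character for exactly one space so the columns stay aligned.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"\b(CARBON\b|IRON\b|ZINC\b|\d+(?:\.\d+)?(?:%|\b))|\S")


def keep_tokens(line: str) -> str:
    # Replacement from the answer: kept token (group 1) plus a space.
    # When the \S branch matches, group 1 did not participate and
    # Python 3.5+ substitutes an empty string for it.
    return PATTERN.sub(r"\1 ", line)


def keep_tokens_aligned(line: str) -> str:
    # Hypothetical variant: leave kept tokens untouched and turn every
    # other non-whitespace character into a single space.
    return PATTERN.sub(lambda m: m.group(1) or " ", line)


sample = "ASD CARBON 2% IRON RANDOMWORD      23"
print(keep_tokens(sample).split())
print(repr(keep_tokens_aligned(sample)))
```

The callback variant keeps each line at its original length, which may matter if the trailing number column has to stay in place.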

Upvotes: 2

MePsyDuck

Reputation: 374

You can try replacing all the words except:

  • Element names
  • Numbers
  • Percentages

To achieve this, you can use a negative lookahead:

(?!CARBON|IRON|ZINC|(\d+\.\d+\%)|\d+)\b[a-zA-Z#]+
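A rough sketch of this lookahead pattern in Python (an assumed language; the thread does not specify one). The helper name `blank_other_words` is hypothetical, and replacing each match with spaces of equal length is an assumption based on the question's requirement to keep the spacing.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"(?!CARBON|IRON|ZINC|(\d+\.\d+\%)|\d+)\b[a-zA-Z#]+")


def blank_other_words(line: str) -> str:
    # Overwrite each unwanted word with the same number of spaces
    # so the remaining columns keep their positions.
    return PATTERN.sub(lambda m: " " * len(m.group()), line)


print(repr(blank_other_words("ASD CARBON 2% IRON RANDOMWORD 23")))
print(repr(blank_other_words("99% CARBON, 1% IRON")))
```

Two things to note when trying this: the character class only covers letters and `#`, so other punctuation such as a comma is left in place, and the lookahead checks only the start of a word, so a word that merely begins with an element name (for example `IRONWORKS`) is kept as well.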


Upvotes: 1

Mustofa Rizwan

Reputation: 10466

To simplify, you may start with this, as per your requirement:

\b(?!CARBON|ZINC|IRON)[a-zA-Z#]+

Then you may have to post-process the result (for example, replacing # with a blank), as per your comments.
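One way the two steps might look in Python (an assumed language). The first pattern is the one from this answer. The second, `LEFTOVERS`, is a hypothetical clean-up pattern that is not part of the answer: it blanks any remaining character that is not a word character, whitespace, `.` or `%`.

```python
import re

# Step 1: pattern copied from the answer above.
WORDS = re.compile(r"\b(?!CARBON|ZINC|IRON)[a-zA-Z#]+")
# Step 2 (assumption): stray punctuation left over after step 1.
LEFTOVERS = re.compile(r"[^\w\s.%]")


def clean(line: str) -> str:
    # Blank out unwanted words, preserving their width.
    line = WORDS.sub(lambda m: " " * len(m.group()), line)
    # Post-process: blank out leftover characters such as ',' or '#'.
    return LEFTOVERS.sub(" ", line)


print(repr(clean("99% CARBON, 1% IRON")))
print(repr(clean("ASD CARBON 2% IRON RANDOMWORD 23")))
```

Both substitutions swap characters one-for-one with spaces, so the line length does not change.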


Upvotes: 1
