Dcook

Reputation: 961

Extract units from string using regex in python

I have a string like this:

str1='x[cm],z,y[km]'

I want to find the units available in the above variable. If I use re.findall(r"\[([A-Za-z0-9_/*^\\(\\)-\.]+)\]", str1) then it gives ['cm', 'km'], but I want the output to be ['cm', '', 'km'] since z has no unit associated. How can I achieve this?

Similarly, for the input string T(x[g],y,z[m])[kg] the output should be ['g','','m','kg'].

Upvotes: 3

Views: 745

Answers (3)

logi-kal

Reputation: 7880

You can use this regex:

((?<!\])|(?<=\[)[^\[\],]*)\]?(?:,|\)|$)

Explanation:

(             # open capturing group
  (?<!\])     #   the match is not preceded by a closing square bracket;
              #   match an empty string
|             # OR
  (?<=\[)     #   the match is preceded by an opening square bracket
  [^\[\],]*   #   match zero or more characters that are neither square brackets nor commas
)             # close capturing group
\]?           # consume an optional closing square bracket
(?:,|\)|$)    # consume a comma or a closing parenthesis, or match the end of the string

Because the pattern contains exactly one capturing group, re.findall returns the content of that group for each match.
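As a quick check, the pattern can be run against both inputs from the question. This is only a minimal sketch: the names UNIT_PATTERN and extract_units are chosen here for illustration and do not come from the answer itself.

```python
import re

# Pattern from the answer above; its single capturing group holds the unit,
# or an empty string for a field that has no unit.
UNIT_PATTERN = re.compile(r"((?<!\])|(?<=\[)[^\[\],]*)\]?(?:,|\)|$)")

def extract_units(s):
    """Return one entry per field; fields without a unit yield ''."""
    return UNIT_PATTERN.findall(s)

print(extract_units('x[cm],z,y[km]'))
print(extract_units('T(x[g],y,z[m])[kg]'))
```

Compiling the pattern once is a convenience if it is applied to many strings; calling re.findall with the raw pattern string behaves the same way.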

Upvotes: 1

Christian Weiss

Reputation: 151

If you split the string on commas first, the problem gets easier. To extract the unit from each substring you can write:

import re

def extract_unit(s: str) -> str:
    # Return the text inside the first [...] pair, or "" if there is none.
    match = re.search(r"\[(.*?)\]", s)
    return match.group(1) if match else ""

and to create the list you can add the following code:

l = [extract_unit(s) for s in str1.split(',')]
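Note that a field may contain more than one bracketed unit, as in the last field of T(x[g],y,z[m])[kg], and re.search only reports the first. One possible variant, sketched here under the assumption that every bracket pair within a field should be reported, uses re.findall per field instead. The name units_per_field is illustrative only.

```python
import re

def units_per_field(s):
    """Split on commas and collect every bracketed unit in each field.

    A field without any brackets contributes a single empty string.
    """
    units = []
    for field in s.split(','):
        found = re.findall(r"\[(.*?)\]", field)
        # Fall back to [""] so unit-less fields still produce an entry.
        units.extend(found or [""])
    return units

print(units_per_field('x[cm],z,y[km]'))
print(units_per_field('T(x[g],y,z[m])[kg]'))
```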

Upvotes: 1

Synthaze

Reputation: 6090

This is certainly not an expert regex solution, but you can reformat the input so that every field without a unit gets a pair of empty brackets. After that, a simple regex captures what you want.

import re

str1 = 'T(x[g],y,z[m])[kg]'

str1 = ''.join([x if '[' in x else x + '[ ]' for x in str1.split(',')])

print(re.findall(r'\[([\w\s]+)\]', str1))

Output:

['g', ' ', 'm', 'kg']
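The placeholder shows up as ' ' rather than the '' requested in the question. A small follow-up sketch, assuming that surrounding whitespace in a unit is never meaningful, strips each captured value. The function name units_with_placeholders is made up for this example.

```python
import re

def units_with_placeholders(s):
    # Append "[ ]" to every comma-separated field that has no bracket.
    padded = ''.join(x if '[' in x else x + '[ ]' for x in s.split(','))
    # Capture each bracket's content; strip() turns the ' ' placeholder into ''.
    return [u.strip() for u in re.findall(r'\[([\w\s]+)\]', padded)]

print(units_with_placeholders('T(x[g],y,z[m])[kg]'))
```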

Upvotes: 0
