Natasha

Reputation: 1521

Split the contents of a string

I have the following string

s = "ΔG'° = (-19.9 +/- 0.4) kilojoule / mole"

I'd like to generate a dictionary like the following

d = {"mean": -19.9, "sd": 0.4, "units": "kilojoule / mole"}

If the string were just -19.9 +/- 0.4, I could use s.split("+/-"). With the full format, though, I would have to split several times, once for each delimiter.
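For reference, the repeated-splitting approach described above might look roughly like this. It is a sketch that assumes every string has the shape `<label> = (<mean> +/- <sd>) <units>`. `parse_by_split` is a hypothetical helper name, not something from the original post.

```python
def parse_by_split(s):
    """Hypothetical helper: split on '=', ')' and '+/-' in turn.

    Assumes the format '<label> = (<mean> +/- <sd>) <units>'.
    """
    _, rhs = s.split("=", 1)                      # " (-19.9 +/- 0.4) kilojoule / mole"
    numbers, units = rhs.split(")", 1)            # " (-19.9 +/- 0.4", " kilojoule / mole"
    mean, sd = numbers.strip(" (").split("+/-")   # "-19.9 ", " 0.4"
    # float() ignores surrounding whitespace, so no extra strip() is needed here
    return {"mean": float(mean), "sd": float(sd), "units": units.strip()}


print(parse_by_split("ΔG'° = (-19.9 +/- 0.4) kilojoule / mole"))
# => {'mean': -19.9, 'sd': 0.4, 'units': 'kilojoule / mole'}
```

This works, but it breaks as soon as the layout changes slightly (a missing parenthesis, an extra "=" in the label), which is why a single pattern is usually more robust.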

Is there an easy way of doing this?

Upvotes: 2

Views: 48

Answers (1)

Wiktor Stribiżew

Reputation: 626738

You can use a regular expression with named capturing groups:

r'=[^\d-]*(?P<mean>-?\d*\.?\d+)\s*\+/-\s*(?P<sd>\d*\.?\d+)\W+(?P<units>.+)'

See the regex demo. Details:

  • = - a = sign
  • [^\d-]* - zero or more chars other than digit and -
  • (?P<mean>-?\d*\.?\d+) - Group "mean": an optional -, zero or more digits, an optional . and then one or more digits
  • \s*\+/-\s* - a +/- substring surrounded by zero or more whitespace characters
  • (?P<sd>\d*\.?\d+) - Group "sd": zero or more digits, an optional . and then one or more digits
  • \W+ - one or more non-word chars
  • (?P<units>.+) - Group "units": the rest of the string.
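If it helps to keep the explanation next to the pattern, the same regex can be written with Python's re.VERBOSE flag, where whitespace in the pattern is ignored and # starts a comment. This is just a reformatting of the pattern above, not a different technique.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each piece sits next to
# its explanation. Unescaped whitespace in the pattern is ignored in this mode.
rx = re.compile(r"""
    =                        # an = sign
    [^\d-]*                  # zero or more chars other than a digit or -
    (?P<mean>-?\d*\.?\d+)    # mean: optional -, digits, optional ., digits
    \s*\+/-\s*               # +/- with optional whitespace around it
    (?P<sd>\d*\.?\d+)        # sd: digits, optional ., digits
    \W+                      # one or more non-word chars, e.g. ") "
    (?P<units>.+)            # units: the rest of the string
""", re.VERBOSE)

m = rx.search("ΔG'° = (-19.9 +/- 0.4) kilojoule / mole")
if m:
    print(m.groupdict())
# => {'mean': '-19.9', 'sd': '0.4', 'units': 'kilojoule / mole'}
```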

See the Python demo:

import re
rx = r'=[^\d-]*(?P<mean>-?\d*\.?\d+)\s*\+/-\s*(?P<sd>\d*\.?\d+)\W+(?P<units>.+)'
text = r"ΔG'° = (-19.9 +/- 0.4) kilojoule / mole"
m = re.search(rx, text)
if m:
    print(m.groupdict())
# => {'mean': '-19.9', 'sd': '0.4', 'units': 'kilojoule / mole'}
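Note that groupdict() returns strings. To get the numeric values shown in the question, the mean and sd groups can be passed through float(). A minimal sketch, using a hypothetical helper name parse_measurement:

```python
import re

rx = r'=[^\d-]*(?P<mean>-?\d*\.?\d+)\s*\+/-\s*(?P<sd>\d*\.?\d+)\W+(?P<units>.+)'


def parse_measurement(text):
    """Hypothetical helper: return {'mean': float, 'sd': float, 'units': str},
    or None if the text does not match the pattern."""
    m = re.search(rx, text)
    if m is None:
        return None
    d = m.groupdict()          # a fresh dict of strings, safe to modify
    d["mean"] = float(d["mean"])
    d["sd"] = float(d["sd"])
    return d


print(parse_measurement("ΔG'° = (-19.9 +/- 0.4) kilojoule / mole"))
# => {'mean': -19.9, 'sd': 0.4, 'units': 'kilojoule / mole'}
```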

Upvotes: 2
