user2345778

Reputation: 31

Extracting items using a regular expression in Python

I have a file which contains the following:

new=['{"TES1":"=TES0"}}', '{"""TES1:IDD""": """=0x3C""", """TES1:VCC""": """=0x00"""}']

I am trying to extract the first item, TES1:=TES0, from the list using a regular expression. This is what I tried, but I am not able to grab the second item, TES0.

import re
TES=re.compile('(TES[\d].)+')
for item in new:
    result = TES.search(item)
    print result.groups()

The result of the print was ('TES1:',). I have tried various ways to extract it but always get the same result. Any suggestions or help are appreciated. Thanks!

Upvotes: 1

Views: 75

Answers (3)

zx81

Reputation: 41838

First Option (with quotes)

To match "TES1":"=TES0", you can use this regex:

"TES\d+":"=TES\d+"

like this:

match = re.search(r'"TES\d+":"=TES\d+"', subject)
if match:
    result = match.group()

Second Option (without quotes)

If you want to get rid of the quotes, as in TES1:=TES0, you can use this search and replace:

Search: "(TES\d+)":"(=TES\d+)"

Replace: \1:\2

like this:

result = re.sub(r'"(TES\d+)":"(=TES\d+)"', r"\1:\2", subject)

How does it work?

"(TES\d+)":"(=TES\d+)"
  • Match the character “"” literally "
  • Match the regex below and capture its match into backreference number 1 (TES\d+)
    • Match the character string “TES” literally (case sensitive) TES
    • Match a single character that is a “digit” (0–9 in any Unicode script) \d+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
  • Match the character string “":"” literally ":"
  • Match the regex below and capture its match into backreference number 2 (=TES\d+)
    • Match the character string “=TES” literally (case sensitive) =TES
    • Match a single character that is a “digit” (0–9 in any Unicode script) \d+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
  • Match the character “"” literally "

\1:\2
  • Insert the text that was last matched by capturing group number 1 \1
  • Insert the character “:” literally :
  • Insert the text that was last matched by capturing group number 2 \2
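As a quick check, both options can be run against the list from the question. This is only a minimal sketch: the list `new` is copied from the question, while the names `pattern`, `quoted` and `unquoted` are illustrative and not part of the original answer. It uses `match.expand` rather than `re.sub` so that only the matched text is returned, without the surrounding braces.

```python
import re

# Sample data copied from the question.
new = ['{"TES1":"=TES0"}}',
       '{"""TES1:IDD""": """=0x3C""", """TES1:VCC""": """=0x00"""}']

# Same pattern as the second option: two capture groups around the quoted parts.
pattern = re.compile(r'"(TES\d+)":"(=TES\d+)"')

quoted = []    # first option: the match with its quotes
unquoted = []  # second option: the groups joined by a colon

for item in new:
    match = pattern.search(item)
    if match:  # the second list item has no "TESn":"=TESn" pair, so it is skipped
        quoted.append(match.group())
        unquoted.append(match.expand(r'\1:\2'))

print(quoted)
print(unquoted)
```

Only the first list item produces a match, because in the second item TES1 is followed by a colon inside the quotes rather than by a closing quote.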

Upvotes: 0

Casimir et Hippolyte

Reputation: 89557

You can use a single replacement, for example:

import re

# Python's re module uses \1 and \2 for group references, not $1 and $2
result = re.sub(r'{"(TES\d)":"(=TES\d)"}}', r'\1:\2', yourstr, count=1)

Upvotes: 0

Daniel

Reputation: 42748

I think you are looking for findall:

import re
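TES = re.compile(r'TES[\d].')  # raw string so \d reaches the regex engine unchanged
for item in new:
    result = TES.findall(item)  # every non-overlapping match, not just the first
    print(result)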
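To see what findall returns here, the sketch below runs it on the list from the question. The second pattern, with capture groups, is an assumption added for illustration and is not part of this answer; the names `loose`, `pair` and `pairs` are likewise hypothetical.

```python
import re

# Sample data copied from the question.
new = ['{"TES1":"=TES0"}}',
       '{"""TES1:IDD""": """=0x3C""", """TES1:VCC""": """=0x00"""}']

# The pattern from this answer: "TES", one digit, then any one character,
# so each hit carries a trailing character such as a quote or a colon.
loose = re.findall(r'TES[\d].', new[0])
print(loose)

# Illustrative variant: with capture groups, findall returns tuples of the
# groups only, which drops the quotes and the "=" sign.
pair = re.compile(r'"(TES\d+)":"=(TES\d+)"')
pairs = [pair.findall(item) for item in new]
print(pairs)
```

With groups in the pattern, findall yields one tuple per match, so the first item gives a single (key, value) pair and the second item gives an empty list.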

Upvotes: 1
