cmashinho

Reputation: 615

How to parse from string?

I have a string wrapped in "Key" tags, and I need to get the text inside the tags.

string = "<Key>big_img/1/V071-e.jpg</Key>"

How do I get "big_img/1/V071-e.jpg" out of it?

Upvotes: 2

Views: 90

Answers (3)

Klaus D.

Reputation: 14369

The simplest solution:

string.strip()[5:-6]

This works for a string of any length, provided that it starts with <Key> and ends with </Key> once the surrounding whitespace is stripped.

It works because:

  • strip() removes any leading and trailing whitespace (Python strings have no trim() method)
  • <Key> always occupies the first 5 chars of the string (indexes 0 to 4), so the slice starts at index 5, which is the 6th char because sequence/string indexes are 0-based
  • </Key> always occupies the last 6 chars of the string, so the slice stops at index -6, just before it
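As a rough sketch of the same slicing idea, here is a version that checks for the tags first instead of assuming they are there. The function name extract_key is made up for illustration, and returning None on a mismatch is just one possible choice:

```python
def extract_key(text):
    """Return the text between <Key> and </Key>, or None if the tags are missing."""
    text = text.strip()  # drop leading/trailing whitespace
    if text.startswith("<Key>") and text.endswith("</Key>"):
        # Slice off the 5-char opening tag and the 6-char closing tag.
        return text[len("<Key>"):-len("</Key>")]
    return None  # hypothetical fallback: input did not match the expected shape


print(extract_key("  <Key>big_img/1/V071-e.jpg</Key>\n"))
```

Using len() on the tag strings avoids hard-coding 5 and 6, so the slice stays correct if the tag name changes.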

Upvotes: 0

ohmu

Reputation: 19762

Use Python's xml.etree.ElementTree module to parse your XML string. Suppose your data looks something like this:

<root>
    <Key>big_img/1/V071-e.jpg</Key>
    <Key>big_img/1/V072-e.jpg</Key>
    <Key>big_img/1/V073-e.jpg</Key>
    <Key>...</Key>
</root>

First, parse your data:

from xml.etree import ElementTree

# To parse the data from a string.
doc = ElementTree.fromstring(data_string)

# Or, to parse the data from a file.
doc = ElementTree.parse('data.xml')

Then, read and print out the text from each <Key>:

for key_element in doc.findall('Key'):
    print(key_element.text)

This should output:

big_img/1/V071-e.jpg
big_img/1/V072-e.jpg
big_img/1/V073-e.jpg
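For reference, a self-contained sketch of the same approach with the XML held in a string. The sample data and the variable names (data_string, root, keys, single) are assumptions for illustration. Note that findall('Key') only looks at children of the element it is called on, so for the single-tag string from the question, where <Key> is itself the root, you would read .text directly:

```python
from xml.etree import ElementTree

# Sample data, assumed for illustration.
data_string = """<root>
    <Key>big_img/1/V071-e.jpg</Key>
    <Key>big_img/1/V072-e.jpg</Key>
</root>"""

# fromstring() returns the root element; findall() searches its direct children.
root = ElementTree.fromstring(data_string)
keys = [element.text for element in root.findall("Key")]
print(keys)

# When the string is a lone <Key> element, that element is the root itself.
single = ElementTree.fromstring("<Key>big_img/1/V071-e.jpg</Key>")
print(single.text)
```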

Upvotes: 0

ODiogoSilva

Reputation: 2414

Using regular expressions:

import re

s = "<Key>big_img/1/V071-e.jpg</Key>"

print(re.findall(r"<Key>(.*)</Key>", s))
# ['big_img/1/V071-e.jpg']
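One thing to watch for: .* is greedy, so if several tags ever end up on the same line it will match from the first <Key> to the last </Key>. A small sketch of the difference, using a made-up two-tag string, with the non-greedy .*? as a possible alternative:

```python
import re

# Made-up input with two tags on one line, for illustration only.
s = "<Key>a.jpg</Key><Key>b.jpg</Key>"

# Greedy: .* runs to the last </Key>, swallowing the tags in between.
greedy = re.findall(r"<Key>(.*)</Key>", s)

# Non-greedy: .*? stops at the first </Key> after each <Key>.
lazy = re.findall(r"<Key>(.*?)</Key>", s)

print(greedy)  # ['a.jpg</Key><Key>b.jpg']
print(lazy)    # ['a.jpg', 'b.jpg']
```

For the single-tag string in the question both patterns give the same result.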

Upvotes: 2
