Llanilek

Reputation: 3466

python plain text regex parsing

I need to write a small parser that will extract data from a form.

The data will always be posted in the same pattern, as follows:

Panakamanana
104412=Trident of Corrupted Waters
104411=Immerseus' Crystalline Eye
104435=Stonetoe's Tormented Treads
104455=Reality Ripper Ring
99716=Chest of the Cursed Protector
104509=Laser Burn Bracers
104531=Haromm's Talisman
99722=Gauntlets of the Cursed Protector
104562=Ring of Restless Energy
104606=Gleaming Eye of the Devilsaur
99725=Helm of the Cursed Protector
99719=Shoulders of the Cursed Protector
104616=Ticking Ebon Detonator
105686=Hellscream's Pig Sticker

The only data I'm interested in is the integer before the = sign on each line. I then want to be able to iterate over these, so putting them in a list or dict would be ideal.

Upvotes: 0

Views: 489

Answers (3)

sshashank124

Reputation: 32189

You can simply do that as follows:

# Keep only lines that contain '=', which skips the name on the first line
new_list = [int(line.split('=', 1)[0]) for line in your_list if '=' in line]

print(new_list)

Upvotes: 0

Burhan Khalid

Reputation: 174622

Here is one way to get it done:

with open('somefile.txt') as f:
    next(f)  # Skip the first line, which doesn't contain '='
    # Convert the part before '=' to int, ignoring blank lines
    numbers = [int(line.split('=')[0]) for line in f if line.strip()]

print(numbers)

If you want to use regular expressions:

>>> import re
>>> s = "104412=Trident of Corrupted Waters"
>>> re.findall(r'^(\d+)', s)[0]
'104412'
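
To pull every ID out of the whole post in one pass, something like the following should work. This is only a sketch: posted_text is a hypothetical variable holding a shortened copy of the sample from the question, and the pattern assumes each ID sits at the very start of its line and is immediately followed by =.

```python
import re

# Hypothetical sample, shortened from the data in the question.
posted_text = """Panakamanana
104412=Trident of Corrupted Waters
104411=Immerseus' Crystalline Eye
99716=Chest of the Cursed Protector
"""

# With re.MULTILINE, ^ matches at the start of every line, so the name on
# the first line (no digits followed by '=') is simply never matched.
item_ids = [int(m) for m in re.findall(r'^(\d+)=', posted_text, flags=re.MULTILINE)]

print(item_ids)  # [104412, 104411, 99716]
```

Because the pattern requires the = right after the digits, a line that merely starts with a number is not picked up by mistake.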

Upvotes: 2

anon582847382

Reputation: 20371

Just split the string up using '=' as the delimiter. The most Pythonic way to implement this would be to use a list comprehension:

>>> [int(line.split('=')[0]) for line in your_lines[1:]]
[104412, 104411, ..., 105686]

Where your_lines is a list of the lines demonstrated in your question.
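
If you would rather have a dict, as the question suggests, the same splitting idea can be extended to keep the item names too. The helper name parse_items is made up for illustration, and the sketch assumes every data line has the form digits=name.

```python
def parse_items(lines):
    """Return a dict mapping each integer ID to its item name."""
    items = {}
    for line in lines:
        # partition splits on the first '=' only, so a name that itself
        # contains '=' would be kept intact.
        key, sep, name = line.partition('=')
        # Skip the header line, blank lines and anything without a numeric ID.
        if sep and key.strip().isdigit():
            items[int(key)] = name.strip()
    return items


# Hypothetical input, shortened from the data in the question.
your_lines = [
    "Panakamanana",
    "104412=Trident of Corrupted Waters",
    "",
    "105686=Hellscream's Pig Sticker",
]

items = parse_items(your_lines)
for item_id in items:
    print(item_id, items[item_id])
```

Iterating over the dict yields the IDs, and on Python 3.7+ they come back in the order they were posted.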

Upvotes: 1
