jqwerty

Reputation: 37

Can anyone see why my Python regex search is only outputting "0"s?

I'm working on a python program to extract all the tags within a kml file.

    import re

    KML = open('NYC_Tri-State_Area.kml','r')

    NYC_Coords = open('NYC_Coords.txt', 'w')

    coords = re.findall(r'<coordinates>+(.)+<\/coordinates>', KML.read())

    for coord in coords:
        NYC_Coords.write(str(coord) + "\n")

    KML.close()
    NYC_Coords.close()

I tested the regex on the file within RegExr and it worked properly.

Here is a small sample of the kml file I'm reading: http://puu.sh/bhayn/2e233a1033.png

The output file contains lines with a single 0 on every line except the last one which is empty.

Upvotes: 1

Views: 76

Answers (1)

hwnd

Reputation: 70732

It seems you have the + operators placed outside of your grouping.

With >+ the pattern matches a literal > one or more times. And because the quantifier sits outside the capturing group in (.)+, the group is repeated once per character and only the last iteration is kept. Here that is the trailing 0 before each closing tag, which is why every match result is 0.
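To make the "last iteration only" behaviour concrete, here is a minimal sketch that runs the original pattern and a version with the quantifier inside the group on the same text. The sample string is made up for illustration, since the actual KML file is only shown as a screenshot.

```python
import re

# Hypothetical sample; the real KML contents are not reproduced in the post.
sample = (
    "<coordinates>-74.0059,40.7128,0</coordinates>\n"
    "<coordinates>-73.9442,40.6782,0</coordinates>\n"
)

# Quantifier outside the group: the group matches one character at a time,
# and each iteration overwrites the previous capture.
broken = re.findall(r'<coordinates>+(.)+<\/coordinates>', sample)

# Quantifier inside the group: the whole run of characters is captured.
fixed = re.findall(r'<coordinates>(.+?)</coordinates>', sample)

print(broken)
print(fixed)
```

With this sample, the first list holds only the final character of each element, while the second holds the full coordinate strings.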

Remove the beginning + operator and move the one placed outside of the group to the inside.

    coords = re.findall(r'<coordinates>(.+?)</coordinates>', KML.read())

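Note: Use +? to prevent greediness, so that each match stops at the nearest closing tag. You also probably want the s (dotall) modifier here, which is re.DOTALL in Python, so that the dot also matches newlines when the coordinates span several lines.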
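As a rough illustration of both points in the note, the sketch below compares the lazy pattern with and without re.DOTALL, plus a greedy variant. The multi-line sample is an assumption about what the file might look like, not taken from the actual KML.

```python
import re

# Hypothetical sample: one element spans several lines, one sits on a single line.
sample = (
    "<coordinates>\n"
    "-74.0059,40.7128,0\n"
    "-73.9442,40.6782,0\n"
    "</coordinates>\n"
    "<coordinates>-73.9,40.8,0</coordinates>\n"
)

# Without DOTALL the dot stops at newlines, so the multi-line element is missed.
without_dotall = re.findall(r'<coordinates>(.+?)</coordinates>', sample)

# Lazy quantifier + DOTALL: one match per element.
lazy = re.findall(r'<coordinates>(.+?)</coordinates>', sample, re.DOTALL)

# Greedy quantifier + DOTALL: a single match running from the first opening tag
# to the last closing tag.
greedy = re.findall(r'<coordinates>(.+)</coordinates>', sample, re.DOTALL)

print(len(without_dotall), len(lazy), len(greedy))
```

If the coordinates in the real file always sit on one line, re.DOTALL makes no difference, but the lazy quantifier still matters whenever two elements share a line.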

Upvotes: 3
