It's me

Reputation: 1115

Regular Expression in Django

The query returns output like this:

[ (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)') ]

I want to extract the latitude and longitude values from each POINT string using a regular expression, like this:

_RE = re.compile('\(\([\d\-\., ]*\)\)')
for i in cursor.fetchall():
    for p in _RE.findall(i[1]):
        # I want the latitude and longitude values from POINT(-122.106035882 37.397386475)

My regular expression is wrong. Can someone help me correct it?

_RE = re.compile('\(\([\d\-\., ]*\)\)')

Upvotes: 0

Views: 151

Answers (3)

Don

Reputation: 17606

Be more explicit:

import re
p = re.compile(r"POINT\(([-\d\.]+)\s([-\d\.]+)\)")

data = [
(14577692L, 'POINT(-122.106035882 37.397386475)'),
(14577692L, 'POINT(-122.106035882 37.397386475)'),
(14577692L, 'POINT(-122.106035882 37.397386475)')
]

for record in data:
    lon, lat = p.search(record[1]).groups()  # WKT order is x y: longitude first
    print lon, lat

result:

-122.106035882 37.397386475
-122.106035882 37.397386475
-122.106035882 37.397386475

You can also get a dictionary by using named groups:

p = re.compile(r"POINT\((?P<lon>[-\d\.]+)\s(?P<lat>[-\d\.]+)\)")
...
for record in data:
    coordinates = p.match(record[1]).groupdict()
    print coordinates

result:

{'lat': '37.397386475', 'lon': '-122.106035882'}
{'lat': '37.397386475', 'lon': '-122.106035882'}
{'lat': '37.397386475', 'lon': '-122.106035882'}
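
The captured groups are strings, so they usually need converting before use. Here is a minimal sketch of that step, assuming Python 3 (the id is written without the `L` suffix) and a hypothetical helper name, `parse_point`, that is not part of any library. Note that WKT writes a point as x then y, so the first number is the longitude:

```python
import re

# WKT writes a point as "POINT(x y)", i.e. longitude first, then latitude.
POINT_RE = re.compile(r"POINT\((?P<lon>[-\d.]+)\s(?P<lat>[-\d.]+)\)")


def parse_point(wkt):
    """Return (lat, lon) as floats, or None if wkt holds no POINT string.

    parse_point is a hypothetical helper name used only for illustration.
    """
    match = POINT_RE.search(wkt)
    if match is None:
        return None
    return float(match.group("lat")), float(match.group("lon"))


rows = [(14577692, "POINT(-122.106035882 37.397386475)")]
for row_id, wkt in rows:
    print(row_id, parse_point(wkt))
```

The character class `[-\d.]+` is permissive: it would also accept something like `1.2.3`, in which case `float()` raises `ValueError`.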

Upvotes: 2

Air

Reputation: 8595

This doesn't require a regular expression. Because the format of the POINT() is static, you can simply slice out the part of the string that contains the coordinates and split them on the space:

resultset = [
    (14577692L, 'POINT(-122.106035882 37.397386475)'),
    (14577692L, 'POINT(-122.106035882 37.397386475)'),
    (14577692L, 'POINT(-122.106035882 37.397386475)')
]

for row in resultset:
    coordinatestring = row[1][6:-1]
    lon, lat = (float(x) for x in coordinatestring.split(' '))  # WKT order is x y: longitude first
    do_something_with(lat, lon)

The slicing notation [6:-1] omits the first 6 characters and the last character of the original string, which are POINT( and ), respectively. That leaves you with two numbers separated by a space, which is easy to deal with as above.
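
If you want the slicing approach to fail loudly when a row is not a POINT, a small guard helps. This is only a sketch, assuming Python 3; `split_point` is a hypothetical helper name, and the longitude-first order follows the WKT convention of writing x before y:

```python
def split_point(wkt):
    """Return (lat, lon) floats from a 'POINT(x y)' string without using re.

    split_point is a hypothetical helper name used only for illustration.
    """
    prefix, suffix = "POINT(", ")"
    if not (wkt.startswith(prefix) and wkt.endswith(suffix)):
        raise ValueError("not a POINT string: %r" % (wkt,))
    # Same slice as [6:-1], written in terms of the prefix and suffix lengths.
    lon, lat = wkt[len(prefix):-len(suffix)].split()
    return float(lat), float(lon)


print(split_point("POINT(-122.106035882 37.397386475)"))
```

Calling `split()` with no argument also tolerates more than one space between the two numbers.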

If you absolutely must use a regular expression, you should use a raw string to avoid having to escape characters twice, and use two capturing groups so you can distinguish between the first and second coordinate:

>>> import re
>>> _RE = re.compile(r'POINT\(([-\d\.]+)\s([-\d\.]+)\)')
>>> _RE.groups
2
>>> _RE.search('POINT(-122.106035882 37.397386475)').groups()
('-122.106035882', '37.397386475')

Even that regex is overkill, though; since you know the format of the POINT() is static, you could just look for the values themselves, ignoring the letters and parens:

>>> _RE = re.compile(r'([-\d\.]+)\s([-\d\.]+)')
>>> _RE.search('POINT(-122.106035882 37.397386475)').groups()
('-122.106035882', '37.397386475')

At this point it's getting simple enough to point at the possibility that you don't need a regex at all (which I've already shown). It's never a bad idea to question the necessity of using re and consider simpler alternatives.

Upvotes: 5

vks

Reputation: 67968

POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)

Try this. See the demo:

https://regex101.com/r/sH8aR8/32

import re
p = re.compile(r'POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)', re.IGNORECASE | re.DOTALL)
test_str = "[ (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)') ]"

re.findall(p, test_str)
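
For reference, here is a sketch of what `findall` returns with this pattern and how the pairs might be converted, assuming Python 3 (so the ids in the sample string carry no `L` suffix) and a shortened sample. The flags are left out in this sketch: the pattern has no unescaped `.`, so `re.DOTALL` has no effect, and dropping `re.IGNORECASE` assumes the database always returns uppercase `POINT`:

```python
import re

p = re.compile(r"POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)")
test_str = (
    "[ (14577692, 'POINT(-122.106035882 37.397386475)'), "
    "(14577692, 'POINT(-122.106035882 37.397386475)') ]"
)

# With two capturing groups, findall returns one (first, second) tuple of
# strings per match.
pairs = p.findall(test_str)
print(pairs)

# WKT order is x y, so the first group is the longitude.
coords = [(float(lat), float(lon)) for lon, lat in pairs]
print(coords)
```

In practice you would more likely apply the pattern to each row's second element than to the printed representation of the whole result set.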

Upvotes: 0
