Reputation: 1115
My query returns output like this:
[ (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)') ]
I want to extract the latitude and longitude from each POINT value using a regular expression, like this:
_RE = re.compile('\(\([\d\-\., ]*\)\)')
for i in cursor.fetchall():
    for p in _RE.findall(i[1]):
        # I want latitude and longitude value from POINT(-122.106035882 37.397386475)
        pass
My regular expression is wrong. Can someone help me correct it?
_RE = re.compile('\(\([\d\-\., ]*\)\)')
Upvotes: 0
Views: 151
Reputation: 17606
Be more explicit:
import re
p = re.compile(r"POINT\(([-\d\.]+)\s([-\d\.]+)\)")
data = [
(14577692L, 'POINT(-122.106035882 37.397386475)'),
(14577692L, 'POINT(-122.106035882 37.397386475)'),
(14577692L, 'POINT(-122.106035882 37.397386475)')
]
for record in data:
    # WKT writes a point as "x y", so the first number is the longitude
    lon, lat = p.search(record[1]).groups()
    print lon, lat
result:
-122.106035882 37.397386475
-122.106035882 37.397386475
-122.106035882 37.397386475
You can also get a dictionary by using named groups (the first number in POINT is the longitude, the second the latitude):
p = re.compile(r"POINT\((?P<lon>[-\d\.]+)\s(?P<lat>[-\d\.]+)\)")
...
for record in data:
    coordinates = p.match(record[1]).groupdict()
    print coordinates
result:
{'lat': '37.397386475', 'lon': '-122.106035882'}
{'lat': '37.397386475', 'lon': '-122.106035882'}
{'lat': '37.397386475', 'lon': '-122.106035882'}
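For readers on Python 3, here is a minimal sketch of the same named-group idea that also converts the captured strings to floats and copes with rows that do not contain a POINT. The helper name parse_point and the sample rows list are made up for illustration, and the L suffix on the id is dropped because Python 3 does not accept it. It assumes the usual WKT order of longitude first, then latitude.

```python
import re

# Named groups; WKT stores a point as "x y", i.e. longitude then latitude.
POINT_RE = re.compile(r"POINT\((?P<lon>[-\d.]+)\s(?P<lat>[-\d.]+)\)")


def parse_point(wkt):
    """Return (lon, lat) as floats, or None if the text holds no POINT."""
    m = POINT_RE.search(wkt)
    if m is None:
        return None
    return float(m.group("lon")), float(m.group("lat"))


# Hypothetical stand-in for cursor.fetchall()
rows = [(14577692, "POINT(-122.106035882 37.397386475)")]
for row_id, wkt in rows:
    print(row_id, parse_point(wkt))
```

Returning None instead of letting .groups() fail on a missing match is one possible design choice; raising an exception would work just as well if every row is expected to be a POINT.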
Upvotes: 2
Reputation: 8595
This doesn't require a regular expression. Because the format of the POINT() value is static, you can simply slice out the part of the string that contains the coordinates and split it on the space:
resultset = [
(14577692L, 'POINT(-122.106035882 37.397386475)'),
(14577692L, 'POINT(-122.106035882 37.397386475)'),
(14577692L, 'POINT(-122.106035882 37.397386475)')
]
for row in resultset:
    coordinatestring = row[1][6:-1]
    # WKT order is "x y": longitude first, then latitude
    lon, lat = (float(x) for x in coordinatestring.split(' '))
    do_something_with(lat, lon)
The slicing notation [6:-1] omits the first 6 characters and the last character of the original string, which are POINT( and ), respectively. That leaves you with two numbers separated by a space, which is easy to deal with as above.
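If you want the slicing approach to fail loudly when a value is not a plain POINT, a small guard helps. This is only a sketch for Python 3; the function name point_to_lon_lat is hypothetical, and it assumes the value has exactly the POINT(x y) shape shown in the question.

```python
def point_to_lon_lat(wkt):
    """Slice 'POINT(x y)' into (x, y) floats without using a regex."""
    prefix, suffix = "POINT(", ")"
    # Check the fixed wrapper before trusting the slice positions.
    if not (wkt.startswith(prefix) and wkt.endswith(suffix)):
        raise ValueError("not a WKT POINT: %r" % wkt)
    # Same idea as [6:-1], with the offsets derived from the wrapper text.
    x, y = wkt[len(prefix):-len(suffix)].split()
    return float(x), float(y)


lon, lat = point_to_lon_lat("POINT(-122.106035882 37.397386475)")
print(lat, lon)
```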
If you absolutely must use a regular expression, you should use a raw string to avoid having to escape characters twice, and use two capturing groups so you can distinguish between the first and second coordinate:
>>> import re
>>> _RE = re.compile(r'POINT\(([-\d\.]+)\s([-\d\.]+)\)')
>>> _RE.groups
2
>>> _RE.search('POINT(-122.106035882 37.397386475)').groups()
('-122.106035882', '37.397386475')
Even that regex is overkill, though; since you know the format of the POINT() value is static, you could just look for the values themselves, ignoring the letters and parens:
>>> _RE = re.compile(r'([-\d\.]+)\s([-\d\.]+)')
>>> _RE.search('POINT(-122.106035882 37.397386475)').groups()
('-122.106035882', '37.397386475')
At this point the pattern is simple enough to suggest that you don't need a regex at all, as the slicing approach above already shows. It's never a bad idea to question whether you really need re and to consider simpler alternatives.
Upvotes: 5
Reputation: 67968
POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)
Try this. See the demo:
https://regex101.com/r/sH8aR8/32
import re
p = re.compile(r'POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)', re.IGNORECASE | re.DOTALL)
test_str = "[ (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)') ]"
re.findall(p, test_str)
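With two capturing groups, re.findall returns one tuple of strings per match, so the pairs still need converting to numbers. A short Python 3 sketch of that step follows; the names POINT_RE, text and pairs are chosen here for illustration, and the flags are left out because this pattern contains no dot metacharacter or letters that need case folding beyond POINT itself.

```python
import re

POINT_RE = re.compile(r"POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)")

# The whole result set as one string, as in the test string above.
text = ("[ (14577692L, 'POINT(-122.106035882 37.397386475)'), "
        "(14577692L, 'POINT(-122.106035882 37.397386475)') ]")

# findall yields (first_number, second_number) string tuples; convert each to floats.
pairs = [(float(x), float(y)) for x, y in POINT_RE.findall(text)]
print(pairs)
```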
Upvotes: 0