Reputation: 89
I have the following HTML and want to read the @data-coords
attribute from it, but with the latitude and longitude in separate variables. See the HTML below:
<div id="gmap-container">
<div id="gmap-value" data-coords="-26.995548880319042,-48.633818457672135,16,150">
...
</div>
</div>
If I use //div[@id='gmap-value']/@data-coords as the XPath, it returns the entire value of the @data-coords attribute.
My Python code is something like this:
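import parsel

xpaths = {
    "parser_lat": "//div[@id='gmap-value']/@data-coords",
    "parser_lon": "//div[@id='gmap-value']/@data-coords",
}

# parse_coords is a hypothetical wrapper; html is the page source shown above.
def parse_coords(html: str) -> tuple:
    selector = parsel.Selector(text=html)
    # Both expressions currently return the full data-coords string.
    latitude: str = selector.xpath(xpaths['parser_lat']).extract_first()
    longitude: str = selector.xpath(xpaths['parser_lon']).extract_first()
    return latitude, longitude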
I would like to get the latitude and longitude split as described above. I know I can add a regular expression to the Python code to get what I want, but that would break the pipeline for other websites. Here is an example using a regular expression, which I don't want to use:
import re

# Match signed decimals; an anchored '^' pattern would only ever match once.
regex_expression = r'-?\d+\.\d+'
latitude = re.findall(regex_expression, '-26.995548880319042,-48.633818457672135,16,150')[0]
longitude = re.findall(regex_expression, '-26.995548880319042,-48.633818457672135,16,150')[1]
The example above would give me -26.995548880319042 and -48.633818457672135 in their respective variables, but as I mentioned, this would break the pipeline for other websites.
I want to get the result described above using only XPath, something like this:
parser_lat: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[0]
parser_lon: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[1]
and then use it in the first Python code example I gave.
I tried using substring(), but it didn't work for me.
Upvotes: 2
Views: 331
Reputation: 52888
Try using substring-before() and substring-after() in your XPaths:
xpaths = {
    "parser_lat": "substring-before(//div[@id='gmap-value']/@data-coords, ',')",
    "parser_lon": "substring-before(substring-after(//div[@id='gmap-value']/@data-coords, ','), ',')"
}
Upvotes: 1