João Koritar

Reputation: 89

XPath - Get just some part of the attribute value or text node

I have the following HTML and want to read its @data-coords attribute, but with the latitude and longitude in separate variables. See the HTML below:

<div id="gmap-container">
    <div id="gmap-value" data-coords="-26.995548880319042,-48.633818457672135,16,150">
        ...
    </div>
</div>

If I use //div[@id='gmap-value']/@data-coords as the XPath, it returns the entire value of the @data-coords attribute.

My Python code is something like this:

import parsel

xpaths = {
    "parser_lat": "//div[@id='gmap-value']/@data-coords",
    "parser_lon": "//div[@id='gmap-value']/@data-coords"
}

def parse_coords(html: str):
    # Selector() needs the page source passed in as text.
    selector = parsel.Selector(text=html)
    latitude: str = selector.xpath(xpaths['parser_lat']).extract_first()
    longitude: str = selector.xpath(xpaths['parser_lon']).extract_first()

    return latitude, longitude

I would like to get the latitude and longitude split as described above. I know I could add a regular expression to the Python code to get what I want, but that would break the pipeline for other websites. Here is the regular-expression approach that I don't want to use:

import re

# Match each signed decimal number; the first two matches are latitude and longitude.
regex_expression = r'-?\d+\.\d+'
coords = '-26.995548880319042,-48.633818457672135,16,150'

latitude = re.findall(regex_expression, coords)[0]
longitude = re.findall(regex_expression, coords)[1]

The example above gives me -26.995548880319042 and -48.633818457672135 in their respective variables, but as I mentioned, it would break the pipeline for other websites.

I want to get the same result using only XPath, something like this:

parser_lat: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[0]
parser_lon: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[1]

and then use those expressions in the first Python example above.

I tried using substring, but it didn't work for me.

Upvotes: 2

Views: 331

Answers (1)

Daniel Haley

Reputation: 52888

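Try using substring-before() and substring-after() in your XPaths. Since the attribute holds more than two comma-separated fields, substring-after() alone would return everything after the first comma (-48.633818457672135,16,150), so wrap it in substring-before() to keep only the longitude:

xpaths = {
    "parser_lat": "substring-before(//div[@id='gmap-value']/@data-coords, ',')",
    "parser_lon": "substring-before(substring-after(//div[@id='gmap-value']/@data-coords, ','), ',')"
}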

Upvotes: 1
