Cleber Marques

Reputation: 476

How to extract data with XPath in Python when the HTML classes have the same name

I am trying to grab the values 51011020, Recife and Boa Viagem individually, but I cannot understand how an expression could differentiate those elements, since their classes have the same name.

In [24]: response.xpath('//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()')
Out[24]: 
[<Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='51011020'>,
 <Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Recife'>,
 <Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Boa Viagem'>]

When I try the code above, it returns all three values together. How can I get them individually? An explanation would be much appreciated.

<div class="h3us20-5 jHoWDW">
    <div class="h3us20-2 fMOiyI">
        <div flexDirection="column" class="sc-jTzLTM sc-ksYbfQ uUqze">
            <span weight="semiBold" theme="[object Object]" tag="span" color="dark" font-weight="400" class="sc-ifAKCX dqTZSU">Localização</span>
            <div class="h3us20-4 eowFbc"></div>
            <div data-testid="ad-properties" class="sc-bwzfXH h3us20-0 cBfPri">
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">CEP</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd>
                    </div>
                </div>
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Município</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd>
                    </div>
                </div>
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Bairro</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd>
                    </div>
                </div>
            </div>
        </div>
        <div class="h3us20-4 hrzRZZ"></div>
    </div>
</div>

Upvotes: 0

Views: 40

Answers (1)

E.Wiest

Reputation: 5905

Since you want the values individually, you'll need three different XPath expressions.

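You can use position indexes ([1], [2], [3]) on the whole result set by wrapping the expression in parentheses. The parentheses matter: without them, //dd[...][1] selects every dd that is the first dd child of its own parent, and since each dd here sits alone in its div, it would still return all three: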

(//dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"])[1]/text()
(//dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"])[2]/text()
(//dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"])[3]/text()
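To see why the parentheses matter, here is a minimal sketch using the standard library's xml.etree.ElementTree (an assumption for illustration: Scrapy itself uses the parsel library, not ElementTree). ElementTree supports only a subset of XPath and has no (...)[n] grouping, so the "index the whole result set" step is done on the returned Python list instead. SNIPPET is a trimmed copy of the question's markup and DD is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup: same nesting and same dd class,
# styling-only attributes dropped. ElementTree parses XML, so this works
# only because the snippet happens to be well-formed.
SNIPPET = """
<div data-testid="ad-properties">
  <div><div><dt>CEP</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd></div></div>
  <div><div><dt>Município</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd></div></div>
  <div><div><dt>Bairro</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd></div></div>
</div>
"""

DD = ".//dd[@class='sc-ifAKCX sc-1f2ug0x-1 kFBcla']"

root = ET.fromstring(SNIPPET.strip())

# A position predicate is evaluated per parent: each dd is the first dd
# in its own div, so [1] still matches all three (like //dd[...][1]).
per_parent = [dd.text for dd in root.findall(DD + "[1]")]

# Indexing the full result set, which is what (//dd[...])[n] does.
# Python lists are 0-based; XPath positions are 1-based.
all_dds = root.findall(DD)   # every matching dd, in document order
cep = all_dds[0].text        # like (//dd[...])[1]
city = all_dds[1].text       # like (//dd[...])[2]
district = all_dds[2].text   # like (//dd[...])[3]
```

Positional indexes are fragile if the site ever reorders or adds fields, which is one reason to prefer matching on the label text.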

Or use a text predicate (.="...") together with the following-sibling axis:

//dt[.="CEP"]/following-sibling::dd/text()
//dt[.="Município"]/following-sibling::dd/text()
//dt[.="Bairro"]/following-sibling::dd/text()
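ElementTree has no following-sibling axis, so this standard-library sketch reaches the same dd a different way: [dt='CEP'] keeps the div whose dt child has exactly that text, and /dd then steps into that same div, i.e. to the dt's sibling. value_for is a hypothetical helper, and the markup is again a trimmed copy of the question's HTML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (well-formed, so ElementTree can parse it).
SNIPPET = """
<div data-testid="ad-properties">
  <div><div><dt>CEP</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd></div></div>
  <div><div><dt>Município</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd></div></div>
  <div><div><dt>Bairro</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd></div></div>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

def value_for(label):
    """Hypothetical helper: return the dd text paired with a dt label, or None."""
    # [dt='...'] matches a div whose dt child's full text equals label;
    # /dd then selects that div's dd, the dt's sibling.
    dd = root.find(f".//div[dt='{label}']/dd")
    return dd.text if dd is not None else None

cep = value_for("CEP")
city = value_for("Município")
district = value_for("Bairro")
```

In Scrapy the answer's expressions can be used unchanged; for example, response.xpath('//dt[.="CEP"]/following-sibling::dd/text()').get() returns the first match as a string, or None if nothing matches.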

Output in both cases:

51011020
Recife
Boa Viagem
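If you need every field, writing one expression per label gets repetitive. An alternative, not taken from the answer: loop over the divs that contain a dt and pair each label with its value. This is a standard-library ElementTree sketch on a trimmed copy of the question's markup, assuming each field div holds exactly one dt and one dd (true for the HTML shown).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (well-formed, so ElementTree can parse it).
SNIPPET = """
<div data-testid="ad-properties">
  <div><div><dt>CEP</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd></div></div>
  <div><div><dt>Município</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd></div></div>
  <div><div><dt>Bairro</dt><dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd></div></div>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# ".//div[dt]" keeps only the divs that have a dt child; each one
# contributes a label -> value pair to the dictionary.
fields = {
    div.find("dt").text: div.find("dd").text
    for div in root.findall(".//div[dt]")
}
```

Each value can then be read by label, e.g. fields["Bairro"], without depending on the order of the fields on the page.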

Upvotes: 1
