user3249186

Reputation: 185

Jsoup/Java - How to extract data that is not inside a tag

I have to parse HTML like this:

<span class="legenda">Cargo a que concorre:</span> Vereador<br />
<span class="legenda">Nome para urna:</span> Adeilza<br />
<span class="legenda">Número:</span> 40656<br />
<span class="legenda">Estado:</span> Amapá<br />
<span class="legenda">Município:</span> Vitória do Jari<br />
<span class="legenda">Partido:</span> Partido Socialista Brasileiro - PSB<br />
<span class="legenda">Coligação:</span> Vitória para todos (PSB / PV / PRTB)<br />

I am using jsoup to parse it and have followed the examples, but I don't know how to get the values that come after each span in this case, e.g. "Vereador" or "Adeilza". Is there a way to do that with jsoup?

Here is the link if anyone wants to see the whole HTML page: view-source:http://www.eleicoes2012.info/adeilza-psb-40656/

Upvotes: 2

Views: 393

Answers (1)

Durandal

Reputation: 5663

Calling nextSibling() on a jsoup Element gives you the next sibling Node, which can be a TextNode rather than another Element. In this case you can use a selector for span elements with the class legenda and then call nextSibling() on each match. Quick example:

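import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://www.eleicoes2012.info/adeilza-psb-40656/").get();
Elements spans = doc.select("span.legenda");

for (Element span : spans) {
    // The value is the text node right after the span, not a child of it
    System.out.println(span.nextSibling());
}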

This produced the following output for me:

Adeilza Ribeiro de Souza
30 anos (09/08/1983)
Almeirim/PA
Solteiro(A)
Dona de Casa
Ensino Fundamental Incompleto

 0 Votos
 Vereador
 Adeilza
 40656
 Amap&aacute;
 Vit&oacute;ria do Jari
 Partido Socialista Brasileiro - PSB
 Vit&oacute;ria para todos (PSB / PV / PRTB)
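For reference, here is a self-contained sketch that runs without network access. It parses the markup from the question with Jsoup.parse(String), keeps only siblings that are TextNode instances, and reads them with TextNode.text() plus trim(). Printing the Node directly calls its toString(), which returns the node's HTML, and that is likely why the output above shows entities such as Amap&aacute; and leading spaces. The class name LegendaExtractor, the SAMPLE constant and the label-to-value map are hypothetical, chosen only for illustration, and jsoup is assumed to be on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LegendaExtractor {

    // Hypothetical sample: the markup from the question, parsed locally instead of fetched.
    static final String SAMPLE =
          "<span class=\"legenda\">Cargo a que concorre:</span> Vereador<br />"
        + "<span class=\"legenda\">Nome para urna:</span> Adeilza<br />"
        + "<span class=\"legenda\">Número:</span> 40656<br />"
        + "<span class=\"legenda\">Estado:</span> Amapá<br />"
        + "<span class=\"legenda\">Município:</span> Vitória do Jari<br />"
        + "<span class=\"legenda\">Partido:</span> Partido Socialista Brasileiro - PSB<br />"
        + "<span class=\"legenda\">Coligação:</span> Vitória para todos (PSB / PV / PRTB)<br />";

    /** Maps each label (without its trailing colon) to the text that follows its span. */
    static Map<String, String> extract(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element span : doc.select("span.legenda")) {
            Node next = span.nextSibling();
            // Only plain text counts as a value; skip spans followed by an element or nothing.
            if (next instanceof TextNode) {
                String label = span.text().replaceAll(":\\s*$", "").trim();
                // text() returns decoded text, whereas toString() returns the node as HTML.
                String value = ((TextNode) next).text().trim();
                fields.put(label, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        extract(SAMPLE).forEach((label, value) -> System.out.println(label + " = " + value));
    }
}
```

Note that nextElementSibling() would not work here: it skips the text node and returns the following <br> element, which is why the code sticks with nextSibling() and checks the node type.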

Upvotes: 2
