learner

Reputation: 4818

How to iterate through various elements in jsoup?

I need to parse a page with jsoup. The page has a div with the class pf-content that contains various elements such as p, h1, h2, h3, etc. I want to go through them one by one and process each in turn. The page looks like this:

    <div class="pf-content">
        <p>For centuries, Spain shone and progressed under Muslim rule. Unfortunately, the city of Seville fell prey to the barbaric onslaught of the Kingdom of Castile in the year 1248. Several innocent Spaniards were killed, many were forced to leave their homeland and seek refuge elsewhere, whereas many others were captured and taken as slaves. The rulers of Castile further destroyed remnants of Islamic life and culture, <a href="https://muslimmemo.com/masjids-spain/">including masjids</a>.</p>
        <h3>Original Arabic Text</h3>
        <h4>Original Arabic Text</h4>
    </div>

The order in which the p, h3, h4, etc. elements appear matters, because I need to show the content in an Android TextView.

What I have so far:

    Document document = Jsoup.connect("page link here").get();
    Elements pTag = document.select("div.pf-content");

But how should I proceed from here? Please help me.

What I tried is:

    Elements elements = document.select("div.pf-content");

    for (Element element : elements) {
        Log.d("FullContent", "elements are: " + element);
        if (element.select("p").first() != null) {
            Log.d("FullContent", "a p tag");
            if (element.select("p").first().select("img").first() != null) {
                Log.d("FullContent", "the tag "  + "has src");
            }
        } else if (element.select("h1").first() != null) {
            Log.d("FullContent", "a h1 tag");
        } else if (element.select("h2").first() != null) {
            Log.d("FullContent", "a h2 tag");
        } else if (element.select("h3").first() != null) {
            Log.d("FullContent", "a h3 tag");
        } else if (element.select("h4").first() != null) {
            Log.d("FullContent", "a h4 tag");
        } else {
            Log.d("FullContent", "other tag");
        }
    }

Upvotes: 0

Views: 3379

Answers (1)

GSala

Reputation: 976

Once you have the matches from document.select("div.pf-content") in pTag, you can take the first match and iterate over its children:

    Elements elements = pTag.first().children();
    for (Element e : elements) {
        // Do something with each element
    }
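As a rough sketch of what the loop body could look like: children() returns only the direct children of the div, in document order, so the sequence of p, h3, h4 is preserved, and tagName() tells you which kind of element you are looking at. The class name ChildWalker, the method describeChildren, and the string labels are made up for illustration; on Android you would replace the list with whatever builds your TextView content.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper, not part of jsoup: lists the direct children of
// div.pf-content in document order as "tag: text" strings.
class ChildWalker {

    static List<String> describeChildren(String html) {
        List<String> result = new ArrayList<>();
        Document document = Jsoup.parse(html);

        // first() returns null when nothing matches the selector.
        Element container = document.select("div.pf-content").first();
        if (container == null) {
            return result;
        }

        // children() yields direct child elements only, in document order.
        for (Element e : container.children()) {
            switch (e.tagName()) {
                case "p":
                    boolean hasImage = e.select("img").first() != null;
                    result.add("p: " + e.text() + (hasImage ? " [has img]" : ""));
                    break;
                case "h1":
                case "h2":
                case "h3":
                case "h4":
                    result.add(e.tagName() + ": " + e.text());
                    break;
                default:
                    result.add("other: " + e.tagName());
            }
        }
        return result;
    }
}
```

Note that a link nested inside a p is not visited separately here, since it is a grandchild of the div; e.text() still includes its text.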

Upvotes: 1
