Erfan

Reputation: 3353

How can I select only specific tags with jsoup on Android?

Here is part of my HTML:

<p>hello world </p>
<p><img class=\"aligncenter size-full wp-image-3197\" src=\"data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=\" data-lazy-src=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg\" alt=\"harmony-02\" width=\"800\" height=\"450\" data-lazy-srcset=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w\" sizes=\"(max-width: 800px) 100vw, 800px\" /><noscript><img class=\"aligncenter size-full wp-image-3197\" src=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg\" alt=\"harmony-02\" width=\"800\" height=\"450\" srcset=\"http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w\" sizes=\"(max-width: 800px) 100vw, 800px\" /></noscript></p>
<p>goodbye world</p>

As you can see, there are three <p> tags in the HTML. How can I tell jsoup to select only the plain <p> tags, like hello world and goodbye world, and ignore the <p> tag that contains an img?

Here is my code so far:

public class MainActivity extends AppCompatActivity {

    public WebView webView;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main_page);
        webView = (WebView) findViewById(R.id.webi);

        new AsyncTask<Void, Void, String>() {
            @Override
            protected String doInBackground(Void... voids) {
                String html = "";
                try {
                    Document document = Jsoup.connect("http://memaraneha.ir/%db%8c%da%a9%d9%be%d8%a7%d8%b1%da%86%da%af%db%8c-%d9%87%d9%85%d8%a7%d9%87%d9%86%da%af%db%8c-%d8%b7%d8%b1%d8%a7%d8%ad%db%8c-%d8%af%d8%a7%d8%ae%d9%84%db%8c/")
                            .timeout(20000).get();

                    Elements elements=document.select("div.base-box:nth-child(2)").select("p");
                    html = elements.toString();

                } catch (IOException e) {
                    e.printStackTrace();
                }
                return html;
            }
            @Override
            protected void onPostExecute(String html) {

                String mime = "text/html";
                String encoding = "utf-8";

                webView.loadDataWithBaseURL(null,html, mime, encoding,null);
            }
        }.execute();

    }

}

Upvotes: 1

Views: 808

Answers (2)

TDG

Reputation: 6171

You can avoid the loop and use the following:

Elements e = doc.select("p:not(:has(img))");
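
As a minimal sketch of how this selector behaves, the example below runs it against a shortened version of the HTML from the question. The class name ParagraphFilter, the helper textOnlyParagraphs, and the trimmed-down img tags are made up for illustration; only Jsoup.parse, select, and text are jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParagraphFilter {

    // Hypothetical helper: returns the <p> elements that have no <img> descendant.
    static Elements textOnlyParagraphs(Document doc) {
        return doc.select("p:not(:has(img))");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML in the question (attributes omitted).
        String html = "<p>hello world </p>"
                + "<p><img src=\"a.jpg\" /><noscript><img src=\"a.jpg\" /></noscript></p>"
                + "<p>goodbye world</p>";

        Document doc = Jsoup.parse(html);

        // Print the text of every paragraph that survived the filter.
        for (Element p : textOnlyParagraphs(doc)) {
            System.out.println(p.text());
        }
    }
}
```

In the question's code, the same idea would presumably mean replacing the final .select("p") with .select("p:not(:has(img))"), so the paragraph holding the image is skipped before the result is handed to the WebView.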

Upvotes: 1

Manza

Reputation: 3527

You can try something like this, which selects all <p> tags that do not contain an <img> tag:

    // connect() needs the page address; url stands for the URL from the question
    Document document = Jsoup.connect(url).get();
    Elements elements = new Elements();
    for (Element e : document.select("p")) {
        if (e.select("img").isEmpty()) {
            elements.add(e);
        }
    }

Upvotes: 0
