Reputation: 3353
Here is part of my HTML:
<p>hello world </p>
<p><img class="aligncenter size-full wp-image-3197" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" data-lazy-src="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg" alt="harmony-02" width="800" height="450" data-lazy-srcset="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w" sizes="(max-width: 800px) 100vw, 800px" /><noscript><img class="aligncenter size-full wp-image-3197" src="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg" alt="harmony-02" width="800" height="450" srcset="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w" sizes="(max-width: 800px) 100vw, 800px" /></noscript></p>
<p>goodbye world</p>
As you can see, there are three <p> tags in this HTML. How can I tell jsoup to select only the plain <p> tags, such as hello world and goodbye world, and ignore the <p> tag that contains the img?
Here is my code so far:
public class MainActivity extends AppCompatActivity {

    public WebView webView;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main_page);
        webView = (WebView) findViewById(R.id.webi);

        new AsyncTask<Void, Void, String>() {
            @Override
            protected String doInBackground(Void... voids) {
                String html = "";
                try {
                    Document document = Jsoup.connect("http://memaraneha.ir/%db%8c%da%a9%d9%be%d8%a7%d8%b1%da%86%da%af%db%8c-%d9%87%d9%85%d8%a7%d9%87%d9%86%da%af%db%8c-%d8%b7%d8%b1%d8%a7%d8%ad%db%8c-%d8%af%d8%a7%d8%ae%d9%84%db%8c/")
                            .timeout(20000).get();
                    Elements elements = document.select("div.base-box:nth-child(2)").select("p");
                    html = elements.toString();
                } catch (IOException e) {
                    e.printStackTrace();
                }
                return html;
            }

            @Override
            protected void onPostExecute(String html) {
                String mime = "text/html";
                String encoding = "utf-8";
                webView.loadDataWithBaseURL(null, html, mime, encoding, null);
            }
        }.execute();
    }
}
Upvotes: 1
Views: 808
Reputation: 6171
You can avoid the loop and use the following:
Elements e = doc.select("p:not(:has(img))");
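As a minimal sketch of how this selector behaves, the snippet below parses a shortened copy of the HTML from the question with Jsoup.parse instead of fetching the page. The class name SelectorDemo, the method name plainParagraphs, and the shortened <img> tag are illustrative assumptions, not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorDemo {

    // Returns every <p> that has no <img> among its descendants.
    static Elements plainParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("p:not(:has(img))");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question (assumption).
        String html = "<p>hello world </p>"
                + "<p><img src=\"harmony-02.jpg\" alt=\"harmony-02\"></p>"
                + "<p>goodbye world</p>";

        // Print the text of each paragraph that survived the filter.
        for (Element p : plainParagraphs(html)) {
            System.out.println(p.text());
        }
    }
}
```

In the question's code, the same pseudo-selectors could probably be appended to the existing query, for example document.select("div.base-box:nth-child(2) p:not(:has(img))").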
Upvotes: 1
Reputation: 3527
You can try something like this. It selects all <p> tags that do not contain an <img> tag:
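// url is a placeholder for the page address; Jsoup.connect requires one
Document document = Jsoup.connect(url).get();
Elements elements = new Elements();
for (Element e : document.select("p")) {
    // keep the paragraph only if no <img> is found inside it
    if (e.select("img").isEmpty()) {
        elements.add(e);
    }
}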
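The loop above could be wrapped in a small helper so that the filtered markup can be passed to webView.loadDataWithBaseURL as in the question. This is only a sketch: the class ParagraphFilter and the method textOnlyHtml are hypothetical names, and it parses an HTML string rather than connecting to the site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParagraphFilter {

    // Collects the <p> elements that contain no <img> and returns their combined outer HTML.
    static String textOnlyHtml(String html) {
        Document document = Jsoup.parse(html);
        Elements kept = new Elements();
        for (Element p : document.select("p")) {
            // select("img") searches all descendants of this paragraph
            if (p.select("img").isEmpty()) {
                kept.add(p);
            }
        }
        return kept.outerHtml();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question (assumption).
        String html = "<p>hello world</p><p><img src=\"a.jpg\"></p><p>goodbye world</p>";
        System.out.println(textOnlyHtml(html));
    }
}
```

Returning a String keeps the helper a drop-in replacement for the html = elements.toString() line in doInBackground.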
Upvotes: 0