Reputation: 11
I'm using the latest jsoup (1.13.1) in the latest Eclipse IDE for Java Developers (includes Incubating components), Version: 2020-09 (4.17.0), Build id: 20200910-1200.
I'm trying to parse a very specific website, but with no success. After I execute these lines: doc = Jsoup.connect("http://pokehb.pw/%D7%A2%D7%95%D7%A0%D7%94/21/%D7%A4%D7%A8%D7%A7/43").get(); doc.select("title").forEach(System.out::println);
Nothing gets printed. It's not just the <title>: no element or property of the page is available.
Yes, the URL is weird, but this is the one I need, and I can browse it fine in Chrome. I also know this is not due to the Hebrew in the website, since other Hebrew sites work OK.
For example, using this URL seems fine: https://context.reverso.net/translation/hebrew-english/%D7%9C%D7%9B%D7%AA%D7%95%D7%91%D7%AA+url
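For readers puzzled by the "weird" URL: the path segments are ordinary percent-encoded UTF-8. The sketch below decodes them using only the standard library. The class and method names are made up for illustration, and it does not contact the site.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlDecodeSketch {
    // Decode percent-encoded UTF-8 text. URLDecoder targets form encoding,
    // so it would also turn '+' into a space; this path contains no '+'.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "http://pokehb.pw/%D7%A2%D7%95%D7%A0%D7%94/21/%D7%A4%D7%A8%D7%A7/43";
        // Prints the same URL with the two Hebrew path segments in readable form.
        System.out.println(decode(url));
    }
}
```

The two encoded segments decode to the Hebrew words for "season" and "episode", so the encoding itself is nothing unusual.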
Any hint on what can be done?
Upvotes: 0
Views: 78
Reputation: 11
What I ended up doing was using this command: doc = Jsoup.parse(driver.getPageSource());
This brought all of the page's source into doc. From there it was a simple matter of using getElementsByClass and getElementsByTag.
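A minimal sketch of that parsing step. The post does not say what `driver` is; `getPageSource()` matches Selenium's WebDriver, so that is assumed here. To keep the example self-contained, a hypothetical HTML string stands in for the real page source, and the class name "episode" is invented for illustration. It needs jsoup on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PageSourceParseSketch {
    // Hypothetical stand-in for the string returned by driver.getPageSource().
    static final String SAMPLE_SOURCE =
        "<html><head><title>Sample page</title></head>"
        + "<body><div class=\"episode\"><a href=\"/watch/1\">Part 1</a></div></body></html>";

    // Parse already-rendered HTML; jsoup makes no network request here.
    static Document parse(String pageSource) {
        return Jsoup.parse(pageSource);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_SOURCE);
        System.out.println(doc.title());
        // Walk elements by class, then by tag, as described in the answer.
        for (Element div : doc.getElementsByClass("episode")) {
            for (Element link : div.getElementsByTag("a")) {
                System.out.println(link.attr("href") + " " + link.text());
            }
        }
    }
}
```

With a real browser-driven session, `SAMPLE_SOURCE` would be replaced by the page source string obtained from the driver.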
Hope this helps someone, and thanks Rob for trying to answer.
Upvotes: 0
Reputation: 2874
What I can tell you is that there's a "laravel_session" cookie among the site's cookies. This suggests you'll need a more capable technology than jsoup. Try HtmlUnit instead; it might work.
Upvotes: 0