Reputation: 87
When I try to parse an HTML page of a website, the app crashes with this error:
java.io.IOException: Mark has been invalidated.
Part of my code:
String xml = xxxxxx;
try {
    Document document = Jsoup.connect(xml).maxBodySize(1024 * 1024 * 10)
            .timeout(0).ignoreContentType(true)
            .parser(Parser.xmlParser()).get();
    Elements elements = document.body().select("td.hotv_text:eq(0)");
    for (Element element : elements) {
        Element element1 = element.select("a[href].hotv_text").first();
        hashMap.put(element.text(), element1.attr("abs:href"));
    }
} catch (HttpStatusException ex) {
    Log.i("GyWueInetSvc", "Exception while JSoup connect:" + xml + " cause:" + ex.getMessage());
} catch (IOException e) {
    e.printStackTrace();
    throw new RuntimeException("Socket timeout: " + e.getMessage(), e);
}
The page I want to parse is about 2 MB. When I debug the code, I see that execution reaches this method in ConstrainableInputStream.java:
public void reset() throws IOException {
    super.reset();
    remaining = maxSize - markpos;
}
At that point markpos is -1, so super.reset() throws the exception.
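The same failure mode can be reproduced with plain java.io, which may make the debugger observation easier to interpret. This is a minimal sketch, not jsoup's code: the class name MarkInvalidationDemo, the method resetFails and the buffer sizes are made up for illustration. It marks a BufferedInputStream, reads past both the read limit and the internal buffer, and then calls reset(). The exact exception message depends on the runtime.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; it only illustrates how a mark can be dropped.
final class MarkInvalidationDemo {

    /**
     * Marks the start of a buffered stream, reads bytesToRead bytes one at a
     * time, then calls reset(). Returns true if reset() threw an IOException.
     */
    static boolean resetFails(int bufferSize, int readLimit, int bytesToRead) {
        byte[] data = new byte[64]; // stand-in for a response body
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            in.reset(); // throws once the stream has discarded the mark
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stays inside the buffer, so the mark is still valid.
        System.out.println(resetFails(8, 4, 2)); // false
        // Reads past the 8-byte buffer with a 4-byte read limit: mark is dropped.
        System.out.println(resetFails(8, 4, 9)); // true
    }
}
```

If this is what happens inside jsoup, the stream was read further than its mark allowed before reset() was called, which matches markpos being -1.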
How can I solve that problem?
Upvotes: 1
Views: 3602
Reputation: 13545
To add to @ulong's answer regarding the use of bufferUp():
This is recommended in the documentation inside the jsoup source itself if you need to parse the document several times. bufferUp() is called before parse() so that the response body is held in a local buffer. Without it, the InputStream is drained on the first read, and a later attempt to re-read it fails with the invalid mark error (IOException).
/**
* Read and parse the body of the response as a Document. If you intend to parse the same response multiple
* times, you should {@link #bufferUp()} first.
* @return a parsed Document
* @throws IOException on error
*/
Document parse() throws IOException;
And regarding bufferUp():
/**
* Read the body of the response into a local buffer, so that {@link #parse()} may be called repeatedly on the
* same connection response (otherwise, once the response is read, its InputStream will have been drained and
* may not be re-read). Calling {@link #body() } or {@link #bodyAsBytes()} has the same effect.
* @return this response, for chaining
* @throws UncheckedIOException if an IO exception occurs during buffering.
*/
Response bufferUp();
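The idea behind buffering first can be sketched with plain java.io. This is only an analogy under stated assumptions, not jsoup's implementation: BufferOnceDemo, drain and canReadTwice are hypothetical names, and the sample body is made up. The one-shot stream is drained into a byte array once, and every later pass reads from a fresh stream over that array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it mimics "buffer once, read many times".
final class BufferOnceDemo {

    /** Drains the stream into a byte array so the content can be re-read later. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Buffers a sample body once, then reads it twice from fresh streams. */
    static boolean canReadTwice(String body) {
        try {
            // Stand-in for a network stream that can only be consumed once.
            InputStream oneShot = new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
            byte[] buffered = drain(oneShot);
            String first = new String(drain(new ByteArrayInputStream(buffered)), StandardCharsets.UTF_8);
            String second = new String(drain(new ByteArrayInputStream(buffered)), StandardCharsets.UTF_8);
            return first.equals(body) && second.equals(body);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canReadTwice("<html><body>hi</body></html>")); // true
    }
}
```

The trade-off is memory: the whole body is held at once, which is usually acceptable for a page of a few megabytes.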
Upvotes: 0
Reputation: 521
I got the same exception after upgrading from 1.11.3 to 1.12.2. Try downgrading your dependency.
Upvotes: 1
Reputation: 51
This helped me:
GET: .execute().bufferUp().parse();
POST: .method(Connection.Method.POST).execute().bufferUp().parse();
Upvotes: 5
Reputation: 513
Use ~.execute().parse();
instead of ~.get();
to get the document, and remove the parser call, so your code becomes:
Document document = Jsoup.connect(xml).maxBodySize(1024*1024*10)
.timeout(0).ignoreContentType(true)
.execute().parse();
This is a temporary fix while we wait for a new version that fixes the bug.
Upvotes: -1
Reputation: 87
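I found a solution to the problem. The problem was a buffer overflow. I solved it by downloading the page myself and passing the text to jsoup, using the code below:
StringBuilder content = new StringBuilder();
try {
    URLConnection connection = new URL(xml).openConnection();
    // try-with-resources closes the scanner and the underlying stream
    try (Scanner scanner = new Scanner(connection.getInputStream())) {
        while (scanner.hasNextLine()) {
            // keep the line break so adjacent lines are not glued together
            content.append(scanner.nextLine()).append('\n');
        }
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Document document = Jsoup.parse(content.toString());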
Upvotes: 2