Reputation: 86687
Are there better ways to read an entire HTML file into a single String variable than this:
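import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

String content = "";
try {
    BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
    String str;
    while ((str = in.readLine()) != null) {
        content += str;
    }
    in.close();
} catch (IOException e) {
    // exception is silently ignored
}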
Upvotes: 38
Views: 144992
Reputation: 601
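import org.apache.commons.io.IOUtils;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

try {
    // getResource returns null if /index.html is not on the classpath
    var content = new String(
            IOUtils.toByteArray(this.getClass().getResource("/index.html")),
            StandardCharsets.UTF_8);
} catch (IOException e) {
    e.printStackTrace();
}
The code above needs Java 10 (for var) and assumes index.html is in the resources folder, i.e. on the classpath.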
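If Commons IO is not available, the JDK alone can read a classpath resource on Java 9+ via InputStream.readAllBytes(). This is a minimal sketch, assuming the resource is UTF-8 encoded; ResourceReader and readResource are hypothetical names. Note that ClassLoader.getResourceAsStream takes the name without a leading slash ("index.html"), unlike Class.getResourceAsStream("/index.html").

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ResourceReader {
    // Reads a classpath resource into a String using only the JDK (Java 9+).
    // Assumes UTF-8 encoding; the name has no leading slash, e.g. "index.html".
    static String readResource(ClassLoader loader, String name) {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                // getResourceAsStream returns null when the resource does not exist
                throw new IllegalArgumentException("Resource not found: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage, from inside an application class: `String html = ResourceReader.readResource(getClass().getClassLoader(), "index.html");`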
Upvotes: 0
Reputation: 4764
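I prefer using Guava:
import com.google.common.base.Charsets;
import com.google.common.io.Files;
import java.io.File;

File file = new File("/path/to/file");
// The charset goes to Files.toString, not the File constructor (deprecated since Guava 22.0)
String content = Files.toString(file, Charsets.UTF_8);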
Upvotes: 4
Reputation: 98881
Here's a solution to retrieve the HTML of a web page using only the standard Java libraries:
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

String urlToRead = "https://google.com";
URL url;                 // The URL to read
HttpURLConnection conn;  // The actual connection to the web page
BufferedReader rd;       // Used to read results from the web page
String line;             // An individual line of the web page HTML
StringBuilder result = new StringBuilder(); // Accumulates all the HTML
try {
    url = new URL(urlToRead);
    conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    // Assumes UTF-8; the page's real charset is given in its Content-Type header
    rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
    while ((line = rd.readLine()) != null) {
        result.append(line).append('\n'); // readLine() strips the line break
    }
    rd.close();
} catch (Exception e) {
    e.printStackTrace();
}
System.out.println(result);
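On Java 11 and later, java.net.http.HttpClient, also part of the standard library, can do the same fetch in a few lines. This is a minimal sketch; PageFetcher and fetch are hypothetical names, the HTTP status code is not checked, and BodyHandlers.ofString() decodes the body using the charset from the response's Content-Type header, falling back to UTF-8.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PageFetcher {
    // Fetches the page body as a String (Java 11+). The status code is not checked here.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // ofString() decodes using the Content-Type charset, defaulting to UTF-8
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Usage: `String html = PageFetcher.fetch("https://google.com");`. Unlike a readLine() loop, this keeps the body's line breaks exactly as sent.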
Upvotes: 0
Reputation: 527
As Jean mentioned, using a StringBuilder instead of += would be better. But if you're looking for something simpler, Guava, IOUtils, and Jsoup are all good options.
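The JDK itself also offers one-line options that need no third-party library: Files.readString (Java 11+, decodes as UTF-8) and new String(Files.readAllBytes(path), charset) (Java 7+). A minimal sketch, assuming the file is UTF-8; JdkFileReading and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class JdkFileReading {
    // Java 11+: reads the whole file; Files.readString always decodes as UTF-8.
    static String readJava11(Path path) throws IOException {
        return Files.readString(path);
    }

    // Java 7+: read the raw bytes, then decode them with an explicit charset.
    static String readJava7(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }
}
```

Usage: `String content = JdkFileReading.readJava11(Path.of("/path/to/mypage.html"));`. Both variants keep the file's line terminators, which the readLine() loops do not.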
Example with Guava:
String content = Files.asCharSource(new File("/path/to/mypage.html"), StandardCharsets.UTF_8).read();
Example with IOUtils:
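String content;
// A plain file path is not a URL (new URL("/path/...") throws MalformedURLException),
// so open the file directly; try-with-resources closes the stream
try (InputStream in = new FileInputStream("/path/to/mypage.html")) {
    content = IOUtils.toString(in, StandardCharsets.UTF_8);
}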
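Example with Jsoup (this parses and re-serializes the document, so the result is normalized HTML rather than the file's exact text):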
String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").toString();
or
String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").outerHtml();
NOTES:
Files.readLines() and Files.toString(): these are deprecated as of Guava release 22.0 (May 22, 2017). Files.asCharSource() should be used instead, as in the example above (see the version 22.0 release diffs).
IOUtils.toString(InputStream) and Charsets.UTF_8: deprecated as of Apache Commons IO 2.5 (May 6, 2016). IOUtils.toString should now be passed both the InputStream and the Charset, as in the example above, and Java 7's StandardCharsets should be used instead of Charsets (see the Charsets.UTF_8 deprecation notice).
Upvotes: 6
Reputation: 49187
There's the IOUtils.toString(..) utility from Apache Commons. If you're using Guava, there's also Files.readLines(..) and Files.toString(..).
Upvotes: 28
Reputation:
When accumulating string data, use the StringBuilder or StringBuffer class rather than += on String objects. String is immutable, so every += creates a new String and copies everything accumulated so far; in a loop this produces a large number of intermediate objects at runtime and hurts performance.
Use the append() method of a StringBuilder/StringBuffer instance instead (StringBuilder is the usual choice; StringBuffer is its synchronized counterpart).
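To make the difference concrete, here is a minimal sketch that accumulates the same lines both ways; Accumulate and its method names are hypothetical. Both methods return the same string, but the += version copies the whole accumulated result on every iteration, so its cost grows quadratically with the total length, while StringBuilder appends into one growable buffer and copies once in toString().

```java
import java.util.List;

class Accumulate {
    // Each += allocates a new String and copies all characters accumulated so far.
    static String withConcat(List<String> lines) {
        String result = "";
        for (String line : lines) {
            result += line;
        }
        return result;
    }

    // StringBuilder appends into a single resizable buffer; toString() copies once at the end.
    static String withBuilder(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line);
        }
        return sb.toString();
    }
}
```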
Upvotes: 3
Reputation: 53819
You should use a StringBuilder:
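import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

StringBuilder contentBuilder = new StringBuilder();
// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader in = new BufferedReader(new FileReader("mypage.html"))) {
    String str;
    while ((str = in.readLine()) != null) {
        // readLine() strips the line terminator; append one if line breaks matter
        contentBuilder.append(str).append('\n');
    }
} catch (IOException e) {
    e.printStackTrace();
}
String content = contentBuilder.toString();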
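Because readLine() strips line terminators, a line-based loop either drops them or re-adds one fixed separator, which turns \r\n into \n. Reading fixed-size char chunks into the StringBuilder instead keeps the text exactly as it appears in the file. A minimal sketch, assuming UTF-8 and an 8192-char buffer; ChunkedReader and readAll are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ChunkedReader {
    // Reads the whole file in fixed-size char chunks, so line terminators are preserved as-is.
    static String readAll(Path path) throws IOException {
        StringBuilder contentBuilder = new StringBuilder();
        char[] buffer = new char[8192];
        try (Reader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int n;
            // read() returns the number of chars read, or -1 at end of stream
            while ((n = in.read(buffer)) != -1) {
                contentBuilder.append(buffer, 0, n);
            }
        }
        return contentBuilder.toString();
    }
}
```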
Upvotes: 29