Reputation: 1
Hello everyone, I have a problem getting the full HTML of a page with Java. I am using this function:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Pattern;

public static void secondUrl() {
    // Unused for now; "\\s+" (not "//s+") is the regex for whitespace.
    String expr = "<div\\s+class=\"t_fsz\"[^>]*>" + "(.*)?"
            + "\r\n*</div>*";
    try {
        URL google = new URL(
                "http://www.kr16.com/thread-90107-1-1.html");
        HttpURLConnection yc = (HttpURLConnection) google.openConnection();
        yc.setInstanceFollowRedirects(true); // you still need to handle redirects manually.
        HttpURLConnection.setFollowRedirects(true);
        // No charset is given here, so the platform default charset is used.
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream()));
        String inputLine = "";
        Pattern patt = Pattern.compile("<div\\s+class=\"t_fsz\">",
                Pattern.DOTALL | Pattern.UNIX_LINES);
        int counter = 1;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(counter++ + inputLine);
            // Matcher m = patt.matcher(inputLine);
            // while (m.find()) {
            //
            // String extractedText = m.group();
            //
            // // extractedText = extractedText.replaceAll("<.*?>", "");
            // // extractedText = extractedText.replaceAll("&quot;", "\"");
            // System.out.println(counter++ + ". " + extractedText);
            // System.out.println();
            //
            // }
        }
        in.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Please ignore the regex. I am trying to fetch "http://www.kr16.com/thread-90107-1-1.html" without success: when I print the page source, I get the wrong content. I can't find any solution. I think the problem is with the thread-90107-1-1.html part of the URL, and that I need to tell the connection about the thread somehow, but I don't know how. Please help me, and thank you.
Upvotes: 0
Views: 52
Reputation: 1
Problem solved. I just needed to pass the page's charset to the InputStreamReader inside the BufferedReader, because the page uses a different charset than the default:
BufferedReader in = new BufferedReader(new InputStreamReader(
        yc.getInputStream(), "gbk"));
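To illustrate why the charset argument matters, here is a minimal sketch that needs no network access. It encodes a short Chinese string as GBK bytes (standing in for the HTTP response body) and decodes it twice, once with GBK and once with UTF-8. The class name `CharsetRead` and the helper `readAll` are made up for this example, and it assumes the JVM ships the GBK charset, which full JDK installs normally do.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRead {

    // Hypothetical helper: reads the whole stream as text, decoding the
    // bytes with the given charset and ending every line with '\n'.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: GBK is available in this JVM.
        Charset gbk = Charset.forName("GBK");
        String original = "\u4f60\u597d"; // two Chinese characters ("hello")

        // Stand-in for the bytes a GBK-encoded page would send.
        byte[] bytes = original.getBytes(gbk);

        String decodedAsGbk = readAll(new ByteArrayInputStream(bytes), gbk);
        String decodedAsUtf8 =
            readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);

        System.out.println("GBK round trip matches:   "
            + decodedAsGbk.trim().equals(original));
        System.out.println("UTF-8 round trip matches: "
            + decodedAsUtf8.trim().equals(original));
    }
}
```

With a real connection, the same helper could be called as `readAll(yc.getInputStream(), Charset.forName("GBK"))`. The charset a page actually uses is usually declared in the Content-Type response header or in a meta tag of the HTML, so it is worth checking there rather than hard-coding a guess.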
Upvotes: 0