Idan Haim Shalom

Reputation: 1

Connect to and print an HTML page whose URL contains "thread"

Hello everyone, I have a problem getting the full HTML file with Java. I am using this function:

public static void secondUrl() {
        String expr = "<div\\s+class=\"t_fsz\"[^>]*>" + "(.*)?"
                + "\r\n*</div>*";

        try {
            URL google = new URL(
                    "http://www.kr16.com/thread-90107-1-1.html");

            HttpURLConnection yc = (HttpURLConnection) google.openConnection();
            yc.setInstanceFollowRedirects(true);  // you still need to handle redirects manually.
            HttpURLConnection.setFollowRedirects(true);
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    yc.getInputStream()));
            String inputLine = "";
            Pattern patt = Pattern.compile("<div\\s+class=\"t_fsz\">",
                    Pattern.DOTALL | Pattern.UNIX_LINES);
            int counter = 1;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(counter++ + inputLine);
                // Matcher m = patt.matcher(inputLine);
                // while (m.find()) {
                //
                // String extractedText = m.group();
                //
                // // extractedText = extractedText.replaceAll("<.*?>", "");
                // // extractedText = extractedText.replaceAll("&quot;", "\"");
                // System.out.println(counter++ + ". " + extractedText);
                // System.out.println();
                //
                // }

            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Please ignore the regex. I am trying to connect to "http://www.kr16.com/thread-90107-1-1.html" without success: when I print the page source, I get the wrong one. I can't find any solution. I think the problem is the thread-90107-1-1.html part of the URL, and that I need to tell the connection that there is a thread, but I don't know how. Please help me, and thank you.

Upvotes: 0

Views: 52

Answers (1)

Idan Haim Shalom

Reputation: 1

Problem solved. I just needed to specify the page's charset (GBK) in the InputStreamReader that the BufferedReader wraps:

BufferedReader in = new BufferedReader(new InputStreamReader(
                    yc.getInputStream(),"gbk"));
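To see why the charset matters, here is a minimal sketch that runs without a network connection. It assumes the page is served as GBK, as in the fix above; the class name `CharsetDemo`, the helper names, the sample HTML, and the sample Content-Type value are all made up for illustration. It also assumes the JDK in use ships the GBK charset. The same bytes are decoded once with the right charset and once with the wrong one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Locale;

public class CharsetDemo {

    // Hypothetical helper: pull the charset name out of a Content-Type value
    // such as "text/html; charset=gbk"; returns the fallback when none is given.
    public static String charsetFromContentType(String contentType, String fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    return p.substring("charset=".length()).replace("\"", "").trim();
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream as text, decoding bytes with the given charset.
    public static String readAll(InputStream in, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample page content (assumption): two Chinese characters inside a div.
        String original = "<div class=\"t_fsz\">\u4e2d\u6587</div>";
        byte[] gbkBytes = original.getBytes(Charset.forName("GBK"));

        // With a real connection this value could come from yc.getContentType().
        String charset = charsetFromContentType("text/html; charset=gbk", "UTF-8");

        String right = readAll(new ByteArrayInputStream(gbkBytes), charset);
        String wrong = readAll(new ByteArrayInputStream(gbkBytes), "ISO-8859-1");

        System.out.println(right.trim().equals(original)); // true
        System.out.println(wrong.trim().equals(original)); // false
    }
}
```

With a live connection, `yc.getContentType()` may return null or omit the charset, so a fallback (or a hard-coded "gbk" as above) is still needed for pages like this one.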

Upvotes: 0
