rbrisuda

Parse HTML Web Page

I am parsing data from this website using jsoup:

http://www.skore.com/en/soccer/england/premier-league/results/all/

I can get the team names and the result, but I also need the name of each scorer (shown under the result).

I have tried, but I'm having trouble because the scorers are not in the page's HTML.

Is this possible? If so, how?

Answers (1)

acdcjunior

The scorers' information is loaded by an AJAX request (made when you click the score link). You'll have to make that request yourself and parse the result.
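If you prefer not to use jsoup for the HTTP part, the JDK's own client (Java 11+) can fetch the AJAX fragment, and the body string can then be handed to `Jsoup.parse(...)`. This is a minimal sketch: the class and method names are hypothetical, and the `X-Requested-With` header is an assumption that only mirrors what jQuery-style AJAX calls usually send. The jsoup code further down fetches the same URL without it, so it may well be unnecessary.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AjaxRequest {

    /**
     * Builds a GET request that resembles an XHR call.
     * Assumption: the X-Requested-With header is optional; some endpoints check it, many ignore it.
     */
    static HttpRequest ajaxGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
    }

    /** Sends the request and returns the response body, i.e. the HTML fragment to parse. */
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(ajaxGet(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds the request (no network access); call fetch(...) to actually download it.
        HttpRequest req = ajaxGet("http://www.skore.com/en/scores/soccer/id/1-1229442?fmt=html");
        System.out.println(req.method() + " " + req.uri());
    }
}
```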

For instance, take the first game (Manchester United 1-2 Manchester City). Its tag is:

<a data-y="r1-1229442" data-v="england-premierleague-manchesterunited-manchestercity-13april2013" style="cursor: pointer;">1 - 2</a>
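With jsoup, reading the value is just `a.attr("data-y")` on each element returned by `doc.select("a[data-y]")`. To show the step without any dependency, here is a sketch that swaps in a plain `java.util.regex` match instead. It is only meant for a single well-formed tag like the one above (a regex is not a general HTML parser), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataYExtractor {

    // Captures the quoted value of a data-y attribute; a stand-in for jsoup's attr("data-y").
    private static final Pattern DATA_Y = Pattern.compile("data-y=\"([^\"]*)\"");

    /** Returns the first data-y value found in the markup, or null if there is none. */
    static String dataY(String html) {
        Matcher m = DATA_Y.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<a data-y=\"r1-1229442\" "
                + "data-v=\"england-premierleague-manchesterunited-manchestercity-13april2013\" "
                + "style=\"cursor: pointer;\">1 - 2</a>";
        System.out.println(dataY(tag)); // r1-1229442
    }
}
```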

Take the data-y value, remove the leading r, and make a GET request to:

http://www.skore.com/en/scores/soccer/id/<DATA-Y_HERE>?fmt=html

For example: http://www.skore.com/en/scores/soccer/id/1-1229442?fmt=html. Then parse the result.
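The id-to-URL step can be isolated in a small helper. This is a minimal sketch: `ScoreUrl` and `scoreUrl` are hypothetical names, and it assumes every `data-y` value starts with exactly one `r`, as in the example tag. Values that don't match are rejected rather than silently producing a bad URL.

```java
public class ScoreUrl {

    // Base of the AJAX score endpoint described above.
    private static final String BASE = "http://www.skore.com/en/scores/soccer/id/";

    /** Turns a data-y value such as "r1-1229442" into the URL of its score fragment. */
    static String scoreUrl(String dataY) {
        if (dataY == null || !dataY.startsWith("r")) {
            throw new IllegalArgumentException("unexpected data-y value: " + dataY);
        }
        // Drop the leading "r" and ask for the HTML format.
        return BASE + dataY.substring(1) + "?fmt=html";
    }

    public static void main(String[] args) {
        System.out.println(scoreUrl("r1-1229442"));
        // http://www.skore.com/en/scores/soccer/id/1-1229442?fmt=html
    }
}
```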

Full working example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseScore {

    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://www.skore.com/en/soccer/england/premier-league/results/all/").get();
        System.out.println("title: " + doc.title());

        Elements dls = doc.select("dl");

        for (Element dl : dls) {
            // attr() returns "" (never null) when the attribute is missing
            String id = dl.attr("id");

            /* only the <dl> elements of games have an id starting with "rid" */
            if (id.length() > 3 && id.startsWith("rid")) {

                System.out.println("Game: " + dl.text());

                // drop the "rid" prefix to get the game id used by the score URL
                String idNoRID = id.substring(3);
                String scoreURL = "http://www.skore.com/en/scores/soccer/id/" + idNoRID + "?fmt=html";
                Document docScore = Jsoup.connect(scoreURL).get();

                for (Element tr : docScore.select("tr")) {
                    /* each kind of goal is marked by a different <span> class */
                    printGoal(tr, "span.goal", "");
                    printGoal(tr, "span.goalpenalty", " (penalty)");
                    printGoal(tr, "span.goalown", " (own goal)");
                }
            }
        }
    }

    /* prints the score and scorer if the row has a span matching spanSelector and a score cell */
    private static void printGoal(Element tr, String spanSelector, String suffix) {
        Elements span = tr.select(spanSelector);
        Elements score = tr.select("td.score");
        if (!span.isEmpty() && !score.isEmpty()) {
            String playerName = span.first().text();
            String currentScore = score.first().text();
            System.out.println("\t\tGOAL: " + currentScore + ": " + playerName + suffix);
        }
    }
}

Output:

title: Skore : Premier League, England - Soccer Results (All)
Game: F T Arsenal 3 - 1 Norwich
        GOAL: 0 - 1: Michael Turner
        GOAL: 1 - 1: Mikel Arteta (penalty)
        GOAL: 2 - 1: Sébastien Bassong (own goal)
        GOAL: 3 - 1: Lukas Podolski
Game: F T Aston Villa 1 - 1 Fulham
        GOAL: 1 - 0: Charles N´Zogbia
        GOAL: 1 - 1: Fabian Delph (own goal)
Game: F T Everton 2 - 0 Queens Park Rangers
        GOAL: 1 - 0: Darron Gibson
        GOAL: 2 - 0: Victor Anichebe
Game: F T Reading 0 - 0 Liverpool
Game: F T Southampton 1 - 1 West Ham
        GOAL: 1 - 0: Gaston Ramirez
        GOAL: 1 - 1: Andrew Carroll
Game: F T Manchester United 1 - 2 Manchester City
        GOAL: 0 - 1: James Milner
...
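The three goal cases differ only in the `<span>` class and the text appended after the name, so the mapping can also be kept in a table, where adding a new goal type is one more entry. A dependency-free sketch follows. The class name, method name, and the `null` return for non-goal classes are my own choices, not something the site dictates. In the jsoup loop each key would be used as `tr.select("span." + key)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GoalLabels {

    // Span class -> suffix printed after the scorer's name (same three cases as the code above).
    private static final Map<String, String> SUFFIXES = new LinkedHashMap<>();

    static {
        SUFFIXES.put("goal", "");
        SUFFIXES.put("goalpenalty", " (penalty)");
        SUFFIXES.put("goalown", " (own goal)");
    }

    /** Formats one goal line, or returns null when spanClass is not a goal marker. */
    static String formatGoal(String spanClass, String score, String player) {
        String suffix = SUFFIXES.get(spanClass);
        if (suffix == null) {
            return null;
        }
        return "GOAL: " + score + ": " + player + suffix;
    }

    public static void main(String[] args) {
        System.out.println(formatGoal("goalpenalty", "1 - 1", "Mikel Arteta"));
        // GOAL: 1 - 1: Mikel Arteta (penalty)
    }
}
```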

jsoup 1.7.1 was used. If you use Maven, add this to your pom.xml:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.1</version>
</dependency>