Reputation: 95
I'm trying to pull data from ESPN box scores, and one of the HTML files contains:
<td style="text-align:left" nowrap><a href="http://espn.go.com/nba/player/_/id/2754/channing-frye">Channing Frye</a>, PF</td>
I'm only interested in grabbing the name (Channing Frye) and the position (PF).
Right now I'm using Pattern.quote(start) + "(.*?)" + Pattern.quote(end) to grab the text between start and end. I'm not sure how to write a pattern that:
- starts with .../http://espn.go.com/nba/player/_/id/
- continues with (any integer)/anyfirst-anylast">
- captures the name I need (Channing Frye), followed by </a>,
- captures the position I need (PF) and ends with </td>
Thanks!
Upvotes: 0
Views: 82
Reputation: 337
You can use:
String lString = "<td style=\"text-align:left\" nowrap><a href=\"http://espn.go.com/nba/player/_/id/2754/channing-frye\">Channing Frye</a>, PF</td>";
Pattern lPattern = Pattern.compile("<td.+><a.+id/\\d+/.+\\-.+>(.+)</a>, (.+)</td>");
Matcher lMatcher = lPattern.matcher(lString);
while(lMatcher.find()) {
System.out.println(lMatcher.group(1));
System.out.println(lMatcher.group(2));
}
This will print:
Channing Frye
PF
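One caveat with greedy `.+`: if several player cells end up on the same line, the pattern above can run across cell boundaries, so `find()` may report only one player. Below is a minimal sketch of a tighter variant that uses negated character classes (`[^"]`, `[^<]`) to keep each match inside a single link and cell. The class name `BoxScoreNames`, the method `extract`, and the second player in the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoxScoreNames {
    // [^"]* cannot leave the href attribute and [^<]+ cannot cross a tag,
    // so each match stays within one player cell.
    static final Pattern CELL = Pattern.compile(
        "<a href=\"[^\"]*/nba/player/_/id/\\d+/[^\"]*\">([^<]+)</a>,\\s*([^<]+)</td>");

    // Returns one "Name (POS)" entry per player cell found in the HTML.
    static List<String> extract(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            out.add(m.group(1) + " (" + m.group(2) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // Two cells on one line; the second player is a hypothetical placeholder.
        String html =
            "<td style=\"text-align:left\" nowrap><a href=\"http://espn.go.com/nba/player/_/id/2754/channing-frye\">Channing Frye</a>, PF</td>"
          + "<td style=\"text-align:left\" nowrap><a href=\"http://espn.go.com/nba/player/_/id/1/example-player\">Example Player</a>, C</td>";
        for (String entry : extract(html)) {
            System.out.println(entry);
        }
        // Prints:
        // Channing Frye (PF)
        // Example Player (C)
    }
}
```

For anything beyond a quick scrape, an HTML parser is usually more robust than a regex, since attribute order and whitespace in the markup can change.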
Upvotes: 0
Reputation: 87
Here is one regex. \s matches any whitespace character, so [\s,]+ skips the comma and space between the link and the position:
String str = "<td style=\"text-align:left\" nowrap><a href=\"http://espn.go.com/nba/player/_/id/2754/channing-frye\">Channing Frye</a>, PF</td>";
Pattern pattern = Pattern.compile("<td.+>.*<a.+>(.+)</a>[\\s,]+(.+)</td>");
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Upvotes: 1
Reputation: 47282
You could use this pattern:
\\/nba\\/player\\/_\\/.*\\\">(.*)<.+>,\\s(.*)<
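This will match any link in the HTML that contains `/nba/player/`.
// The quote before > must be written as \" inside a Java string literal;
// a bare \\" would end the string early and fail to compile.
String re = "\\/nba\\/player\\/_\\/.*\\\">(.*)<.+>,\\s(.*)<";
String str = "<td style=\"text-align:left\" nowrap><a href=\"http://espn.go.com/nba/player/_/id/2754/channing-frye\">Channing Frye</a>, PF</td>";
Pattern p = Pattern.compile(re, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1)); // name
System.out.println(m.group(2)); // position
}
Example: http://regex101.com/r/hA3uV0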
Upvotes: 1
Reputation: 1282
Here is the pattern:
http://espn.go.com/nba/player/_/id/(\d+)/([\w-]+)">(.*?)</a>,\s*(\w+)</td>
Groups 3 and 4 hold the name and the position. You can use this tool to verify regular expressions: http://www.regexplanet.com/advanced/java/index.html
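The pattern above is shown as a raw regex. As a rough sketch of how it might be used from Java, the backslashes have to be doubled and the double quote escaped inside the string literal. The class name `PlayerCellDemo` and the helper `parse` are illustrative names, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlayerCellDemo {
    // Same pattern as above, escaped for a Java string literal.
    // Groups: 1 = player id, 2 = URL slug, 3 = name, 4 = position.
    static final Pattern PLAYER = Pattern.compile(
        "http://espn.go.com/nba/player/_/id/(\\d+)/([\\w-]+)\">(.*?)</a>,\\s*(\\w+)</td>");

    // Returns {name, position}, or null when the input has no matching cell.
    static String[] parse(String html) {
        Matcher m = PLAYER.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String cell = "<td style=\"text-align:left\" nowrap><a href=\"http://espn.go.com/nba/player/_/id/2754/channing-frye\">Channing Frye</a>, PF</td>";
        String[] result = parse(cell);
        System.out.println(result[0]); // Channing Frye
        System.out.println(result[1]); // PF
    }
}
```

The dots in the host name are left unescaped as in the original pattern, so they match any character; writing them as `\\.` would make the match stricter.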
Upvotes: 2