Bala

Reputation: 1

How to extract hyperlinks that satisfy a condition in R

I have extracted the page source and its links in R:

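library(XML)  # provides htmlTreeParse, xpathSApply and xmlGetAttr
# Download the page source to a local file
download.file("http://stats.espncricinfo.com/ci/engine/records/team/match_results_year.html?class=2;id=6;type=team",
              "dataDictionary.html")
# Parse the saved HTML and collect the href attribute of every <a> element
docHtml <- htmlTreeParse("dataDictionary.html", useInternalNodes = TRUE)
links <- xpathSApply(docHtml, path = "//a", xmlGetAttr, "href")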

Now I need to extract the links that look like "/ci/engine/records/team/match_results.html?class=2;id=*", where * stands for anything. Every link that satisfies this condition should be stored in another variable. Any help?

Upvotes: 0

Views: 25

Answers (1)

G5W

Reputation: 37661

All of the links you are interested in can be detected with grep, which returns the positions of the matching elements in links:

GoodLinks = grep("/ci/engine/records/team/match_results\\.html\\?class=2;id", links)
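
If you want the matching links themselves stored in another variable, as the question asks, one option is grep with value = TRUE. The following is only a sketch: sample_links is a made-up stand-in for links, and the ";type=year" suffix on its entries is an assumption about what the real hrefs look like.

```r
# Hypothetical stand-in for the `links` vector built in the question.
sample_links <- c(
  "/ci/engine/records/team/match_results.html?class=2;id=1974;type=year",
  "/ci/content/team/6.html",
  "/ci/engine/records/team/match_results.html?class=2;id=2017;type=year"
)

# Escape "." and "?" so they match literally instead of acting as
# regex metacharacters.
pattern <- "/ci/engine/records/team/match_results\\.html\\?class=2;id="

# value = TRUE returns the matching strings rather than their positions.
matched_links <- grep(pattern, sample_links, value = TRUE)
print(matched_links)
```

With the real data, replacing sample_links by links should give the same vector as links[GoodLinks].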

If you only want the id field, you can process those links with sub, which replaces each link with the digits captured after "id=":

sub(".*id=(\\d+).*", "\\1", links[GoodLinks])
[1] "1974" "1975" "1976" "1978" "1979" "1980" "1981" "1982" "1983" "1984" "1985" "1986" "1987" "1988" "1989" "1990"
[17] "1991" "1992" "1993" "1994" "1995" "1996" "1997" "1998" "1999" "2000" "2001" "2002" "2003" "2004" "2005" "2006"
[33] "2007" "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016" "2017"
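
The ids come back as character strings. If you need them as numbers, a small sketch along these lines may help; sample_links is again a hypothetical stand-in for links[GoodLinks], and the link format is assumed.

```r
# Hypothetical links of the kind selected above.
sample_links <- c(
  "/ci/engine/records/team/match_results.html?class=2;id=1974;type=year",
  "/ci/engine/records/team/match_results.html?class=2;id=2017;type=year"
)

# Keep only the digits that follow "id="; "\\1" refers to the captured group.
ids <- sub(".*id=(\\d+).*", "\\1", sample_links)

# Convert the character ids to integers for numeric work (sorting, ranges).
years <- as.integer(ids)
print(years)
```

Note that sub returns a string unchanged when the pattern does not match, so it is safest to apply it only to links already selected by grep.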

Upvotes: 1
