Reputation: 1700
I'm struggling to get the statistics table on a website into a data frame so I can analyse it. The table can be found here: http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/
My code so far:
library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
xpathSApply(doc, "//tr[@*]/td/child::node()", xmlValue)
But this returns the data in an unworkable form. What is the correct xpathSApply code?
Upvotes: 1
Views: 438
Reputation: 269491
You don't need xpathSApply. This one-liner can do it, given the url:
readHTMLTable(url, header = "")[[1]]
giving:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1 1 K. Vermeer 28 K 856 10 10 0 1 24 0 0 0 0 0
2 22 J. Cillessen 25 K 2204 25 24 1 0 8 0 0 0 0 0
3 30 M. van der Hart 20 K 0 0 0 0 0 2 0 0 0 0 0
4 2 R. van Rhijn 23 V 2786 32 31 1 1 1 2 3 6 0 1
5 3 T. Alderweireld 25 V 360 4 4 0 0 0 0 0 0 0 0
6 4 N. Moisander 28 V 1985 23 22 1 0 3 1 2 0 0 0
7 6 M. van der Hoorn 21 V 166 3 2 1 1 21 0 0 0 0 0
8 12 J. Veltman 22 V 2158 25 24 1 1 2 2 2 2 0 0
9 15 N. Boilesen 22 V 1445 20 17 3 6 6 1 2 3 0 0
10 17 D. Blind 24 V 2531 29 29 0 5 3 1 1 4 0 0
11 24 S. Denswil 21 V 1350 17 15 2 1 14 1 0 1 0 0
12 27 R. Ligeon 22 V 350 5 4 1 3 8 0 1 0 0 0
13 42 J. Riedewald 17 V 222 5 3 2 3 10 2 0 1 0 0
14 44 K. Tete 18 V 0 0 0 0 0 1 0 0 0 0 0
15 5 C. Poulsen 34 M 1523 29 14 15 3 20 1 3 2 0 0
16 8 L. Duarte 23 M 655 14 6 8 2 14 3 0 1 0 0
17 8 C. Eriksen 22 M 360 4 4 0 0 0 2 3 1 0 0
18 10 S. de Jong 25 M 1257 19 16 3 8 3 7 1 1 0 0
19 18 D. Klaassen 21 M 2102 26 23 3 2 5 10 3 1 0 0
20 20 L. Schöne 28 M 2149 29 25 4 6 6 9 8 1 0 0
21 25 T. Serero 24 M 2276 29 25 4 6 6 3 3 3 0 0
22 34 L. de Sa 21 M 512 12 5 7 5 12 1 1 1 0 0
23 7 V. Fischer 20 A 1636 24 19 5 6 6 3 2 1 0 0
24 9 K. Sigþórsson 24 A 1928 30 20 10 16 11 10 2 0 0 0
25 11 Bojan 23 A 1357 24 17 7 12 11 4 3 2 0 0
26 16 L. Andersen 19 A 405 9 4 5 3 14 0 0 0 0 0
27 19 T. Sana 24 A 223 4 2 2 1 7 0 0 0 0 0
28 23 D. Hoesen 23 A 450 14 4 10 2 15 2 1 0 0 0
29 43 R. Kishna 19 A 389 8 5 3 5 5 1 2 0 0 0
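Since the goal is analysis, the columns will probably need converting to numbers afterwards. Here is a minimal sketch of that step, assuming the XML package is installed. The inline HTML is a hypothetical two-row stand-in for the squad table so the example runs offline, and the choice of columns 1, 3 and 5 as numeric is an assumption based on the output above.

```r
library(XML)

# Hypothetical two-row stand-in for the squad table (not the real page).
html <- "<html><body><table>
<tr><td>1</td><td>K. Vermeer</td><td>28</td><td>K</td><td>856</td></tr>
<tr><td>22</td><td>J. Cillessen</td><td>25</td><td>K</td><td>2204</td></tr>
</table></body></html>"

doc <- htmlParse(html, asText = TRUE)

# header = FALSE keeps the first row as data instead of using it as column names.
squad <- readHTMLTable(doc, header = FALSE)[[1]]

# Columns assumed to hold numbers (shirt number, age, minutes played).
# as.character() first, so the conversion is safe whether the columns
# came back as factors or as character vectors.
num_cols <- c(1, 3, 5)
squad[num_cols] <- lapply(squad[num_cols], function(x) as.numeric(as.character(x)))

str(squad)
```

Going through as.character() matters on older R versions, where text columns may come back as factors and as.numeric() alone would return the factor codes rather than the values.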
Upvotes: 1
Reputation: 30425
The table with the data has id='page_team_1_block_team_squad_3-table', which you can use in an XPath expression. The expression
"//table[@id='page_team_1_block_team_squad_3-table']/tbody"
will find the table with that id and return the table body. You can then use readHTMLTable with the argument header = FALSE to return the data:
library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
res <- readHTMLTable(doc["//table[@id='page_team_1_block_team_squad_3-table']/tbody"][[1]], header = FALSE)
head(res)
> head(res)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1 1 K. Vermeer 28 K 856 10 10 0 1 24 0 0 0 0 0
2 22 J. Cillessen 25 K 2204 25 24 1 0 8 0 0 0 0 0
3 30 M. van der Hart 20 K 0 0 0 0 0 2 0 0 0 0 0
4 2 R. van Rhijn 23 V 2786 32 31 1 1 1 2 3 6 0 1
5 3 T. Alderweireld 25 V 360 4 4 0 0 0 0 0 0 0 0
6 4 N. Moisander 28 V 1985 23 22 1 0 3 1 2 0 0 0
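To see the id-based selection in isolation, here is a small sketch that runs offline, assuming the XML package is installed. The inline page and the ids 'other' and 'squad' are hypothetical stand-ins for the real site, and it selects the table node itself rather than its tbody.

```r
library(XML)

# Hypothetical page with two tables; only the second carries the id we want.
html <- "<html><body>
<table id='other'><tbody><tr><td>ignore</td></tr></tbody></table>
<table id='squad'><tbody>
<tr><td>1</td><td>K. Vermeer</td><td>28</td></tr>
<tr><td>22</td><td>J. Cillessen</td><td>25</td></tr>
</tbody></table>
</body></html>"

doc <- htmlParse(html, asText = TRUE)

# getNodeSet returns a list of every node matching the XPath expression.
nodes <- getNodeSet(doc, "//table[@id='squad']")

# Guard against the id being missing or duplicated before indexing.
stopifnot(length(nodes) == 1)

# Passing a single table node returns one data frame rather than a list.
res <- readHTMLTable(nodes[[1]], header = FALSE)
res
```

Selecting by id should be more robust than taking the first table with [[1]], since it does not depend on the order of tables on the page, although it will break if the site renames the id.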
Upvotes: 2