ad_s

Reputation: 1700

Correct syntax for xpathSApply in R

I'm struggling to get the statistics table from a website into a data frame so I can analyse it. The table can be found here: http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/

My code so far:

library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
xpathSApply(doc, "//tr[@*]/td/child::node()", xmlValue)

But this returns the data in an unworkable form. What is the correct xpathSApply code?

Upvotes: 1

Views: 438

Answers (2)

G. Grothendieck

Reputation: 269491

You don't need xpathSApply. Given the url, this one-liner does it:

readHTMLTable(url, header = "")[[1]]

giving:

   V1 V2               V3 V4 V5 V6   V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1   1          K. Vermeer    28  K  856 10 10   0   1  24   0   0   0   0   0
2  22        J. Cillessen    25  K 2204 25 24   1   0   8   0   0   0   0   0
3  30     M. van der Hart    20  K    0  0  0   0   0   2   0   0   0   0   0
4   2        R. van Rhijn    23  V 2786 32 31   1   1   1   2   3   6   0   1
5   3     T. Alderweireld    25  V  360  4  4   0   0   0   0   0   0   0   0
6   4        N. Moisander    28  V 1985 23 22   1   0   3   1   2   0   0   0
7   6    M. van der Hoorn    21  V  166  3  2   1   1  21   0   0   0   0   0
8  12          J. Veltman    22  V 2158 25 24   1   1   2   2   2   2   0   0
9  15         N. Boilesen    22  V 1445 20 17   3   6   6   1   2   3   0   0
10 17            D. Blind    24  V 2531 29 29   0   5   3   1   1   4   0   0
11 24          S. Denswil    21  V 1350 17 15   2   1  14   1   0   1   0   0
12 27           R. Ligeon    22  V  350  5  4   1   3   8   0   1   0   0   0
13 42        J. Riedewald    17  V  222  5  3   2   3  10   2   0   1   0   0
14 44             K. Tete    18  V    0  0  0   0   0   1   0   0   0   0   0
15  5          C. Poulsen    34  M 1523 29 14  15   3  20   1   3   2   0   0
16  8           L. Duarte    23  M  655 14  6   8   2  14   3   0   1   0   0
17  8          C. Eriksen    22  M  360  4  4   0   0   0   2   3   1   0   0
18 10          S. de Jong    25  M 1257 19 16   3   8   3   7   1   1   0   0
19 18         D. Klaassen    21  M 2102 26 23   3   2   5  10   3   1   0   0
20 20           L. Schöne    28  M 2149 29 25   4   6   6   9   8   1   0   0
21 25           T. Serero    24  M 2276 29 25   4   6   6   3   3   3   0   0
22 34            L. de Sa    21  M  512 12  5   7   5  12   1   1   1   0   0
23  7          V. Fischer    20  A 1636 24 19   5   6   6   3   2   1   0   0
24  9       K. Sigþórsson    24  A 1928 30 20  10  16  11  10   2   0   0   0
25 11               Bojan    23  A 1357 24 17   7  12  11   4   3   2   0   0
26 16         L. Andersen    19  A  405  9  4   5   3  14   0   0   0   0   0
27 19             T. Sana    24  A  223  4  2   2   1   7   0   0   0   0   0
28 23           D. Hoesen    23  A  450 14  4  10   2  15   2   1   0   0   0
29 43           R. Kishna    19  A  389  8  5   3   5   5   1   2   0   0   0
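If the goal is analysis, the result will probably need a little cleanup first: V2 and V4 look empty in the output above, and the numbers may arrive as text. Here is a minimal base-R sketch of that step. It uses the first three rows and first seven columns from above, typed in by hand as character columns. That is an assumption about how the columns come back; depending on the XML version and its stringsAsFactors setting you may get factors instead, in which case convert them with as.character first.

```r
# First three rows of the output above, entered by hand as character columns
# (assumption: the scraped columns arrive as text; V2 and V4 look empty).
squad <- data.frame(
  V1 = c("1", "22", "30"),
  V2 = c("", "", ""),
  V3 = c("K. Vermeer", "J. Cillessen", "M. van der Hart"),
  V4 = c("", "", ""),
  V5 = c("28", "25", "20"),
  V6 = c("K", "K", "K"),
  V7 = c("856", "2204", "0"),
  stringsAsFactors = FALSE
)

# Drop columns in which every cell is an empty string.
is_empty <- vapply(squad, function(col) all(col == ""), logical(1))
squad <- squad[, !is_empty, drop = FALSE]

# Convert columns that hold only numbers; text columns stay character.
squad[] <- lapply(squad, type.convert, as.is = TRUE)

str(squad)
```

After this, numeric columns such as V7 can go straight into sum(), mean() and the like.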

Upvotes: 1

jdharrison

Reputation: 30425

The table with the data has id='page_team_1_block_team_squad_3-table', which you can use in an XPath. The XPath "//table[@id='page_team_1_block_team_squad_3-table']/tbody" finds the table with that id and returns its body. You can then call readHTMLTable with header = FALSE to return the data:

library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
res <- readHTMLTable(doc["//table[@id='page_team_1_block_team_squad_3-table']/tbody"][[1]], header = FALSE)
head(res)
  V1 V2              V3 V4 V5 V6   V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1  1         K. Vermeer    28  K  856 10 10   0   1  24   0   0   0   0   0
2 22       J. Cillessen    25  K 2204 25 24   1   0   8   0   0   0   0   0
3 30    M. van der Hart    20  K    0  0  0   0   0   2   0   0   0   0   0
4  2       R. van Rhijn    23  V 2786 32 31   1   1   1   2   3   6   0   1
5  3    T. Alderweireld    25  V  360  4  4   0   0   0   0   0   0   0   0
6  4       N. Moisander    28  V 1985 23 22   1   0   3   1   2   0   0   0
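
Since the question asked specifically about xpathSApply, here is a rough sketch of how the same id-based XPath could be used with it. It runs against a tiny inline table rather than the live page. The id 'squad', the header names and the two rows are placeholders made up for illustration, not the site's actual markup. The idea is to select only the td cells in the table body, then refill the flat character vector row by row.

```r
library(XML)

# Tiny stand-in for the real page. The id 'squad', the header names and the
# rows are illustrative placeholders, not the site's actual markup.
html <- "
<html><body>
<table id='squad'>
  <thead><tr><th>Nr</th><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>K. Vermeer</td><td>28</td></tr>
    <tr><td>22</td><td>J. Cillessen</td><td>25</td></tr>
  </tbody>
</table>
</body></html>"
doc <- htmlParse(html, asText = TRUE)

# One string per <td> in the table body, in document order (row by row).
cells <- xpathSApply(doc, "//table[@id='squad']/tbody/tr/td", xmlValue)

# Number of columns, taken from the first body row.
n_col <- length(getNodeSet(doc, "//table[@id='squad']/tbody/tr[1]/td"))

# Refill the flat vector row by row and turn it into a data frame.
res <- as.data.frame(matrix(cells, ncol = n_col, byrow = TRUE),
                     stringsAsFactors = FALSE)
res
```

This assumes every body row has the same number of td cells. If rows are ragged, the matrix step will misalign, and readHTMLTable is the safer route.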

Upvotes: 2
