Petr

Reputation: 125

r rvest webscraping hltv

Yes, that's just another "how-to-scrape" question. Sorry for that, but I've read the previous answers and the manual for rvest as well.

I'm doing web scraping for my homework (so I do not plan to use the data for any commercial purpose). The idea is to show that the average skill of a team affects individual skill. I'm trying to use CS:GO data from HLTV.org for it.

The information is available at http://www.hltv.org/?pageid=173&playerid=9216

I need two tables: Key stats (data only) and Teammates (data and URLs). I tried to use the CSS selectors generated by SelectorGadget, and I also tried to analyze the page's source code, but I failed. I'm doing the following:

library(rvest)
library(dplyr)

url <- 'http://www.hltv.org/?pageid=173&playerid=9216'
info <- html_session(url) %>% read_html()
info %>% html_node('.covSmallHeadline') %>% html_text()

Can you please tell me what the right CSS selector is?

Upvotes: 1

Views: 1112

Answers (1)

alistaire

Reputation: 43354

If you look at the source, those tables aren't HTML tables, just piles of divs with inconsistent nesting and inline CSS for alignment. Thus, it's easiest to grab all the text and clean up the strings afterwards: each line is a text label followed by a number, so the label and the value are easy to pull apart.

library(rvest)
library(tidyverse)

h <- 'http://www.hltv.org/?pageid=173&playerid=9216' %>% read_html()

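# Drop the first matching box; the remaining ones hold the two tables
h %>% html_nodes('.covGroupBoxContent') %>% .[-1] %>% 
    html_text(trim = TRUE) %>% 
    # one element per line of text, dropping whitespace around newlines
    strsplit('\\s*\\n\\s*') %>% 
    # use each box's first line (its title) as the list name, then drop it
    setNames(map_chr(., ~.x[1])) %>% map(~.x[-1]) %>%
    # label = line with digits/dots removed; value = the number in the line
    map(~tibble(variable = trimws(gsub('[.0-9]+', '', .x)), 
                value = parse_number(.x)))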

#> $`Key stats`
#> # A tibble: 9 × 2
#>                   variable    value
#>                      <chr>    <dbl>
#> 1              Total kills  9199.00
#> 2              Headshot %%    46.00
#> 3             Total deaths  6910.00
#> 4                K/D Ratio     1.33
#> 5              Maps played   438.00
#> 6            Rounds played 11242.00
#> 7  Average kills per round     0.82
#> 8 Average deaths per round     0.61
#> 9               Rating (?)     1.21
#> 
#> $TeammatesRating
#> # A tibble: 4 × 2
#>                    variable value
#>                       <chr> <dbl>
#> 1   Gabriel 'FalleN' Toledo  1.11
#> 2  Fernando 'fer' Alvarenga  1.11
#> 3 Joao 'felps' Vasconcellos  1.09
#> 4   Epitacio 'TACO' de Melo  0.98
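The string cleanup above can be tried without hitting the site. This is a minimal base-R sketch, assuming the scraped box text looks the way the output suggests (a title line, then one label immediately followed by its number per line). The `boxes` vector is made-up sample text, and `lines`/`tables` are hypothetical names; `regexpr()` stands in for `parse_number()` and `data.frame()` for `tibble()`.

```r
# Hypothetical sample text, shaped like html_text(trim = TRUE) output
# for two boxes: a title line, then "label" + "number" run together.
boxes <- c(
  "Key stats\n  Total kills9199\n  K/D Ratio1.33\n  Rating (?)1.21",
  "TeammatesRating\n  Gabriel 'FalleN' Toledo1.11"
)

# Split each box into lines, trimming whitespace around the newlines
lines <- strsplit(boxes, "\\s*\\n\\s*")

# The first line of each box is its title: use it as the name, then drop it
names(lines) <- vapply(lines, `[`, character(1), 1)
lines <- lapply(lines, `[`, -1)

# Separate each line into a text label and a numeric value
tables <- lapply(lines, function(x) {
  data.frame(
    variable = trimws(gsub("[.0-9]+", "", x)),
    value = as.numeric(regmatches(x, regexpr("[0-9]+(\\.[0-9]+)?", x))),
    stringsAsFactors = FALSE
  )
})

tables
```

The regex only handles plain unsigned numbers like `9199` or `1.33`; `parse_number()` in the answer is more forgiving (signs, grouping marks), which is why it is the better choice on real scraped text.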

Upvotes: 3
