user9941626

Reputation: 33

Error when converting a list to a data frame using ldply (Error in (function (..., row.names = NULL, : arguments imply differing number of rows)

I'm trying to scrape the standard stats of players on soccer teams using RStudio. I can extract the information into lists, but I can't convert them to data frames. I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 33, 27, 24, 35, 5, 4, 54, 38, 18, 2, 1

I'm new to R and can't think of a way to solve it. Below are the page I'm trying to extract the data from and the code I'm using. Any help is very welcome!

https://fbref.com/en/squads/2b390eca/2016-2017/Athletic-Bilbao

install.packages('rvest')
install.packages('plyr')
install.packages('dplyr')
library(rvest)
library(plyr)
library(dplyr)

years = c(2017:2018)
urls = list()
for (i in 1:length(years)) {
  url = paste0('https://fbref.com/en/squads/2b390eca/',years[i],'-',years[i+1],'/Athletic-Bilbao')
  urls[[i]] = url #https://fbref.com/en/squads/d5348c80/',years1[i],'-',years2[i+1],'/AEK-Athens
}


tbl = list()
years = 2017
j = 1
for (j in seq_along(urls)) {
  tbl[[j]] = urls[[j]] %>%
    read_html() %>%
    html_nodes("table") %>%
    html_table()
  tbl[[j]]$Year = years
  j = j+1
  years = years+1
}

Data = ldply(tbl,data.frame)

Upvotes: 3

Views: 583

Answers (1)

QHarr

Reputation: 84465

I see two fixes that are needed.

Your second URL is wrong. I think you want years[i] + 1, i.e. move the + 1 outside the indexing. With years[i+1], the last iteration indexes past the end of years and returns NA. With the fix you get 2017-2018 and 2018-2019.
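To see the difference between the two indexing forms without touching the network, here is a small sketch that only builds the URL strings. The names bad_urls and season_urls are mine for illustration, not from the question.

```r
years <- 2017:2018
base  <- "https://fbref.com/en/squads/2b390eca/"

# Original form: years[i + 1] reads past the end of the vector on the last
# iteration, so NA is pasted into the second URL.
bad_urls <- sapply(seq_along(years), function(i) {
  paste0(base, years[i], "-", years[i + 1], "/Athletic-Bilbao")
})

# Corrected form: add 1 to the value rather than to the index.
# paste0() is vectorised, so no loop is needed here.
season_urls <- paste0(base, years, "-", years + 1, "/Athletic-Bilbao")

print(bad_urls)
print(season_urls)
```

Printing both vectors side by side makes it easy to check that each URL covers the intended season before any page is requested.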

Secondly, the page contains numerous tables with varying numbers of rows and columns, and you are trying to join them all when you only want the first (standard stats) one. To get only the first table, use html_node rather than html_nodes, i.e. html_node("table").
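The error itself can be reproduced offline with base R. The sketch below uses two made-up data frames as stand-ins for the tables on one page (the data is hypothetical, not from fbref) and assumes that the question's tbl[[j]] is a list of data frames with a length-1 Year element appended, which would be consistent with the trailing 1 in the reported row counts.

```r
# Toy stand-ins for the tables scraped from one page (hypothetical data).
page_tables <- list(
  data.frame(Player = c("A", "B", "C"), Goals = c(1, 2, 3)),
  data.frame(Opponent = c("X", "Y"), Result = c("W", "L"))
)
page_tables$Year <- 2017  # adds a length-1 list element, not a column

# Handing all elements to data.frame() at once fails because the row
# counts differ; the error message is captured instead of stopping.
msg <- tryCatch(
  {
    do.call(data.frame, page_tables)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Keeping only the first table gives a single data frame, so Year
# becomes an ordinary column that is recycled down the rows.
first_table <- page_tables[[1]]
first_table$Year <- 2017
print(first_table)
```

Selecting a single table before adding Year is the same idea as switching from html_nodes to html_node in the scraping code.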

I'm also not sure the Year column is set up to work the way you intend, as you will currently get 2019 and 2020. I've changed it so you get 2017 and 2018. You don't need to increment j, by the way: the for loop does that for you.

library(rvest)
library(plyr)
library(dplyr)

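years = c(2017:2018)
urls = list()

# Build one URL per season, e.g. .../2017-2018/Athletic-Bilbao
for (i in 1:length(years)) {
  url = paste0('https://fbref.com/en/squads/2b390eca/',years[i],'-',years[i] + 1,'/Athletic-Bilbao')
  urls[[i]] = url
}

tbl = list()

for (j in seq_along(urls)) {
  # html_node() returns only the first matching table, so html_table()
  # gives a single data frame rather than a list of them
  tbl[[j]] <- urls[[j]] %>%
              read_html() %>%
              html_node("table") %>%
              html_table()
  # Tag every row with the starting year of the season
  tbl[[j]]$Year = years[j]
}

# Stack the per-season data frames into one
data = ldply(tbl,data.frame)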

Upvotes: 1
