Reputation: 11
I have been trying to set up some simple code to scrape data from the web. The result is a list of lists of data frames. What I am trying to do is add specific information to each of the data frames, in order to bind them afterwards.
Here is the code:
page_numbers <- c(123, 124, 125, 126)
urls <- paste("http://www.abstimmungen.bl.ch/de/vote/detail/", page_numbers, sep = "")
Data <- lapply(urls, function(x){readHTMLTable(getURL(x),stringsAsFactors=F)})
Nothing lets me tell the different data frames apart. I therefore thought of making a list of names, as follows:
Title <- list("Bruderholz-Initiative", "Lehrpersonen-Initiative", "Abschaffung Amtszeitbeschränkung", "Aufgabenzuordnung BL-Gemeinden")
I want to add the same column, called Title, to every data frame, and fill it with the specific value for each of them using a for loop.
for( i in Data){
Data[[i]]$Titre <- rep(Titre[i],
nrow(as.data.frame(Data[[i]]))
)}
The result is an error because of an incorrect index. Alternatively, I have tried this other piece of code:
Data2 <- Map(transform , Data, Titres = Titre[i])
I really don't see how to correct my code to make it work; I can only guess that the structure of my list is causing the problem. Any help is very welcome. Thanks in advance!
Upvotes: 1
Views: 103
Reputation: 56054
To avoid the problem of adding Title later, why not add Title inside the lapply loop while reading the URLs one by one, and then rbind the results? See:
library(XML)
library(RCurl)
page_numbers <- c(123, 124, 125, 126)
Title <- c("Bruderholz-Initiative", "Lehrpersonen-Initiative",
"Abschaffung Amtszeitbeschränkung", "Aufgabenzuordnung BL-Gemeinden")
Data <-
  do.call(rbind,
          lapply(seq_along(page_numbers),
                 function(x){
                   myURL <- paste("http://www.abstimmungen.bl.ch/de/vote/detail/", page_numbers[x], sep = "")
                   # readHTMLTable() returns a list of tables, so take the first one
                   dd <- readHTMLTable(getURL(myURL), stringsAsFactors = FALSE)[[1]]
                   # label every row with the title that belongs to this page
                   dd$Title <- Title[x]
                   # return the labelled data frame
                   dd
                 })
  )
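The same idea can be tried without network access. The following is a minimal sketch on toy data: the `tables` and `titles` objects are hypothetical stand-ins for the scraped tables and their names, not the real vote results. It uses `Map()`, which the question also attempted, to walk the data frames and the titles in parallel.

```r
# Toy stand-ins for the scraped tables (hypothetical values)
tables <- list(
  data.frame(Bezirk = c("Aesch", "Allschwil"), Ja = c("840", "1473"),
             stringsAsFactors = FALSE),
  data.frame(Bezirk = c("Aesch", "Allschwil"), Ja = c("900", "1500"),
             stringsAsFactors = FALSE)
)
titles <- c("Bruderholz-Initiative", "Lehrpersonen-Initiative")

# Map() pairs the i-th data frame with the i-th title
labelled <- Map(function(df, title) {
  df$Title <- title  # a length-1 value is recycled to every row
  df
}, tables, titles)

# Stack the labelled data frames row-wise
combined <- do.call(rbind, labelled)
```

Because the title travels with each data frame before binding, no index bookkeeping is needed afterwards.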
Upvotes: 1
Reputation: 42544
The OP's goal is to add specific information to each of the data frames, in order to bind them afterwards.
The sample data provided by the OP suggest that only one item, Title, should be added, presumably for later grouping. If this is the case, there is a simple solution at hand using rbindlist() from the data.table package, which "names" the rows while binding:
# remove one list level to get a list of data.frames
# (as already suggested by the OP)
Data1 <- unlist(Data, recursive = FALSE)
# name the list elements
Data1 <- setNames(Data1, Title)
str(Data1)
List of 4
 $ Bruderholz-Initiative           :'data.frame': 91 obs. of 9 variables:
  ..$ Bezirk  : chr [1:91] "Bezirk Arlesheim" "Aesch" "Allschwil" "Arlesheim" ...
  ..$ Resultat: chr [1:91] "abgelehnt11680" "abgelehnt" "abgelehnt" "abgelehnt" ...
  ..$ Ja      : chr [1:91] "15433" "840" "1473" "727" ...
  ..$ Nein    : chr [1:91] "27159" "1606" "3513" "1982" ...
  ..$ Leer    : chr [1:91] "864" "38" "121" "75" ...
  ..$ Ungültig: chr [1:91] "758" "18" "179" "59" ...
  ..$ Ja%     : chr [1:91] "36.23" "34.34" "29.54" "26.84" ...
  ..$ Nein%   : chr [1:91] "63.77" "65.66" "70.46" "73.16" ...
  ..$ Gemeldet: chr [1:91] "15 von 15" "ja" "ja" "ja" ...
 $ Lehrpersonen-Initiative         :'data.frame': 91 obs. of 9 variables: [...]
 $ Abschaffung Amtszeitbeschränkung:'data.frame': 91 obs. of 9 variables: [...]
 $ Aufgabenzuordnung BL-Gemeinden  :'data.frame': 91 obs. of 9 variables: [...]
library(data.table)
# combine all rows, thereby creating an id column Title containing
# the names of the list elements
DT <- rbindlist(Data1, idcol = "Title")
DT
                              Title           Bezirk       Resultat    Ja  Nein Leer Ungültig   Ja% Nein%  Gemeldet
  1:          Bruderholz-Initiative Bezirk Arlesheim abgelehnt11680 15433 27159  864      758 36.23 63.77 15 von 15
  2:          Bruderholz-Initiative            Aesch      abgelehnt   840  1606   38       18 34.34 65.66        ja
  3:          Bruderholz-Initiative        Allschwil      abgelehnt  1473  3513  121      179 29.54 70.46        ja
  4:          Bruderholz-Initiative        Arlesheim      abgelehnt   727  1982   75       59 26.84 73.16        ja
  5:          Bruderholz-Initiative      Biel-Benken      abgelehnt   565   575   23       20 49.56 50.44        ja
 ---
360: Aufgabenzuordnung BL-Gemeinden       Niederdorf     angenommen   298    85   15        4 77.81 22.19        ja
361: Aufgabenzuordnung BL-Gemeinden         Oberdorf     angenommen   416   119   27        4 77.76 22.24        ja
362: Aufgabenzuordnung BL-Gemeinden      Reigoldswil     angenommen   333    65   23        7 83.67 16.33        ja
363: Aufgabenzuordnung BL-Gemeinden        Titterten     angenommen   122    28    9        4 81.33 18.67        ja
364: Aufgabenzuordnung BL-Gemeinden       Waldenburg     angenommen   158    45   23        4 77.83 22.17        ja
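To see the `idcol` mechanism in isolation, here is a small sketch on toy data; the `toy` list and its values are made up for illustration and only assume that the data.table package is installed. The names of the list elements become the values of the new id column.

```r
library(data.table)

# Hypothetical named list of small data frames (stand-ins for the scraped tables)
toy <- list(
  "Bruderholz-Initiative"   = data.frame(Bezirk = c("Aesch", "Allschwil"),
                                         Ja = c("840", "1473"),
                                         stringsAsFactors = FALSE),
  "Lehrpersonen-Initiative" = data.frame(Bezirk = c("Aesch", "Allschwil"),
                                         Ja = c("900", "1500"),
                                         stringsAsFactors = FALSE)
)

# idcol = "Title" adds a first column holding the name of the list
# element each row came from
toyDT <- rbindlist(toy, idcol = "Title")
```

If the list were unnamed, `rbindlist()` would fill the id column with the integer position of each element instead, which is why the list is named first.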
For the sake of completeness, there are also other ways to add an id column to the individual data frames before binding:
In the original, nested list:
Data0 <- lapply(seq_along(Data), function(.i) cbind(Data[[.i]][[1]], Title = Title[[.i]]))
str(Data0[1])
List of 1
 $ :'data.frame': 91 obs. of 10 variables:
  ..$ Bezirk  : chr [1:91] "Bezirk Arlesheim" "Aesch" "Allschwil" "Arlesheim" ...
  ..$ Resultat: chr [1:91] "abgelehnt11680" "abgelehnt" "abgelehnt" "abgelehnt" ...
  ..$ Ja      : chr [1:91] "15433" "840" "1473" "727" ...
  ..$ Nein    : chr [1:91] "27159" "1606" "3513" "1982" ...
  ..$ Leer    : chr [1:91] "864" "38" "121" "75" ...
  ..$ Ungültig: chr [1:91] "758" "18" "179" "59" ...
  ..$ Ja%     : chr [1:91] "36.23" "34.34" "29.54" "26.84" ...
  ..$ Nein%   : chr [1:91] "63.77" "65.66" "70.46" "73.16" ...
  ..$ Gemeldet: chr [1:91] "15 von 15" "ja" "ja" "ja" ...
  ..$ Title   : Factor w/ 1 level "Bruderholz-Initiative": 1 1 1 1 1 1 1 1 1 1 ...
or in the "flattened" list:
Data1 <- unlist(Data, recursive = FALSE)
Data2 <- lapply(seq_along(Data1), function(.i) cbind(Data1[[.i]], Title = Title[[.i]]))
str(Data2[1])
List of 1
 $ :'data.frame': 91 obs. of 10 variables:
  ..$ Bezirk  : chr [1:91] "Bezirk Arlesheim" "Aesch" "Allschwil" "Arlesheim" ...
  ..$ Resultat: chr [1:91] "abgelehnt11680" "abgelehnt" "abgelehnt" "abgelehnt" ...
  ..$ Ja      : chr [1:91] "15433" "840" "1473" "727" ...
  ..$ Nein    : chr [1:91] "27159" "1606" "3513" "1982" ...
  ..$ Leer    : chr [1:91] "864" "38" "121" "75" ...
  ..$ Ungültig: chr [1:91] "758" "18" "179" "59" ...
  ..$ Ja%     : chr [1:91] "36.23" "34.34" "29.54" "26.84" ...
  ..$ Nein%   : chr [1:91] "63.77" "65.66" "70.46" "73.16" ...
  ..$ Gemeldet: chr [1:91] "15 von 15" "ja" "ja" "ja" ...
  ..$ Title   : Factor w/ 1 level "Bruderholz-Initiative": 1 1 1 1 1 1 1 1 1 1 ...
No for loop of any kind is required to accomplish the task.
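If a loop is preferred anyway, the index error in the question appears to come from `for (i in Data)`, which assigns each list element itself (not its position) to `i`, so `Data[[i]]` is not a valid subscript. The question also defines `Title` but refers to `Titre` inside the loop. A minimal sketch of an index-based loop on toy data follows; `loop_tables` and `loop_titles` are hypothetical names. With the nested list from the question, the data frame would additionally have to be reached as `Data[[i]][[1]]`.

```r
# Hypothetical toy list of data frames of different lengths
loop_tables <- list(data.frame(x = 1:2), data.frame(x = 3:5))
loop_titles <- c("first", "second")

# seq_along() yields the positions 1, 2, ... rather than the elements
for (i in seq_along(loop_tables)) {
  # the single title is recycled to every row of the i-th data frame
  loop_tables[[i]]$Title <- loop_titles[i]
}
```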
Please note that cbind() has turned Title into a factor by default (this is the behaviour of R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). This can be turned off by including the parameter stringsAsFactors = FALSE in the call to cbind().
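The effect of that parameter can be checked on a toy data frame. This sketch uses a made-up `df` and sets `stringsAsFactors` explicitly in both calls, so the result should not depend on the default of the installed R version.

```r
# Hypothetical toy data frame
df <- data.frame(Bezirk = c("Aesch", "Allschwil"), stringsAsFactors = FALSE)

# When a data frame is involved, cbind() forwards extra arguments to data.frame()
as_factor    <- cbind(df, Title = "Bruderholz-Initiative", stringsAsFactors = TRUE)
as_character <- cbind(df, Title = "Bruderholz-Initiative", stringsAsFactors = FALSE)

class(as_factor$Title)     # "factor"
class(as_character$Title)  # "character"
```

Keeping Title as character avoids surprises when data frames with different titles are later combined.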
Both approaches return a list of data.frames which can be directly combined row-wise by
do.call(rbind, Data0)
or
rbindlist(Data0)
Upvotes: 1