Loris Junod

Reputation: 11

How to add a column with specific values in a list of data frames

I have been trying to set up some simple code to scrape data from the web. The result is a list of data frames nested inside a list. What I am trying to do is add specific information to each of the data frames so that I can bind them afterwards.

Here is the code

page_numbers <- c(123, 124, 125, 126)

urls <- paste("http://www.abstimmungen.bl.ch/de/vote/detail/", page_numbers, sep = "")

Data <- lapply(urls, function(x) readHTMLTable(getURL(x), stringsAsFactors = FALSE))

Nothing lets me distinguish the different data frames, so I made a list of names as follows:

Title <- list("Bruderholz-Initiative", "Lehrpersonen-Initiative", "Abschaffung Amtszeitbeschränkung", "Aufgabenzuordnung BL-Gemeinden")

I want to add the same column, called Title, to every data frame and fill it with the matching value for each one using a for loop.

for( i in Data){
  Data[[i]]$Titre <- rep(Titre[i],
                         nrow(as.data.frame(Data[[i]]))
                         )}

The result is an error because of an invalid index. Alternatively, I tried this other piece of code:

Data2 <- Map(transform , Data, Titres = Titre[i])

I really don't see how to correct my code to make it work. I can only guess that the structure of my list is causing the problem. Any help is welcome. Thanks in advance!

Upvotes: 1

Views: 103

Answers (2)

zx8754

Reputation: 56054

To avoid having to add Title afterwards, why not add it inside the lapply() call as each URL is read, and then rbind the results:

library(XML)
library(RCurl)

page_numbers <- c(123, 124, 125, 126)
Title <- c("Bruderholz-Initiative", "Lehrpersonen-Initiative",
           "Abschaffung Amtszeitbeschränkung", "Aufgabenzuordnung BL-Gemeinden")

Data <- 
  do.call(rbind,
          lapply(seq_along(page_numbers),
                 function(x){
                   myURL <- paste("http://www.abstimmungen.bl.ch/de/vote/detail/", page_numbers[x], sep = "")
                   # readHTMLTable() returns a list of tables, so take the first one
                   dd <- readHTMLTable(getURL(myURL), stringsAsFactors = FALSE)[[1]]
                   dd$Title <- Title[x]
                   # return
                   dd
                 })
  )
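
The same read, tag, bind pattern can be tried offline. The following is only a minimal sketch on made-up data: fake_read() is a hypothetical stand-in for readHTMLTable(getURL(...))[[1]], and the titles and values are invented for illustration.

```r
# Hypothetical stand-in for readHTMLTable(getURL(url))[[1]]:
# returns a small toy data frame instead of scraping a page.
fake_read <- function(page) {
  data.frame(Bezirk = c("A", "B"),
             Ja = c(page, page + 1),
             stringsAsFactors = FALSE)
}

page_numbers <- c(123, 124)
Title <- c("First vote", "Second vote")

# Read each "page", tag it with its title, then bind all rows together.
tagged <- lapply(seq_along(page_numbers), function(i) {
  dd <- fake_read(page_numbers[i])
  dd$Title <- Title[i]
  dd
})
combined <- do.call(rbind, tagged)
```

Because the title is attached while each table is still a single data frame, no second pass over the list is needed.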

Upvotes: 1

Uwe

Reputation: 42544

The OP's goal is to add specific information to each of the data frames in order to bind them afterwards.

The sample data provided by the OP suggest that only one item, Title, should be added, presumably for later grouping. If this is the case, there is a simple solution at hand using rbindlist() from the data.table package, which "names" the rows while binding:

# remove one list level to get a list of data.frames
# (as already suggested by the OP)
Data1 <- unlist(Data, recursive = FALSE)
# name the list elements
Data1 <- setNames(Data1, Title)
str(Data1)
List of 4
 $ Bruderholz-Initiative           :'data.frame': 91 obs. of  9 variables:
  ..$ Bezirk   : chr [1:91] "Bezirk Arlesheim" "Aesch" "Allschwil" "Arlesheim" ...
  ..$ Resultat : chr [1:91] "abgelehnt11680" "abgelehnt" "abgelehnt" "abgelehnt" ...
  ..$ Ja       : chr [1:91] "15433" "840" "1473" "727" ...
  ..$ Nein     : chr [1:91] "27159" "1606" "3513" "1982" ...
  ..$ Leer     : chr [1:91] "864" "38" "121" "75" ...
  ..$ Ungültig: chr [1:91] "758" "18" "179" "59" ...
  ..$ Ja%      : chr [1:91] "36.23" "34.34" "29.54" "26.84" ...
  ..$ Nein%    : chr [1:91] "63.77" "65.66" "70.46" "73.16" ...
  ..$ Gemeldet : chr [1:91] "15 von 15" "ja" "ja" "ja" ...
 $ Lehrpersonen-Initiative         :'data.frame': 91 obs. of  9 variables:
[...]
 $ Abschaffung Amtszeitbeschränkung:'data.frame': 91 obs. of  9 variables:
[...]
 $ Aufgabenzuordnung BL-Gemeinden  :'data.frame': 91 obs. of  9 variables:
[...]
library(data.table)
# combine all rows, thereby creating an id column Title containing 
# the names of the list elements 
DT <- rbindlist(Data1, idcol = "Title")
DT 
                              Title           Bezirk       Resultat    Ja  Nein Leer Ungültig   Ja% Nein%  Gemeldet
  1:          Bruderholz-Initiative Bezirk Arlesheim abgelehnt11680 15433 27159  864       758 36.23 63.77 15 von 15
  2:          Bruderholz-Initiative            Aesch      abgelehnt   840  1606   38        18 34.34 65.66        ja
  3:          Bruderholz-Initiative        Allschwil      abgelehnt  1473  3513  121       179 29.54 70.46        ja
  4:          Bruderholz-Initiative        Arlesheim      abgelehnt   727  1982   75        59 26.84 73.16        ja
  5:          Bruderholz-Initiative      Biel-Benken      abgelehnt   565   575   23        20 49.56 50.44        ja
 ---                                                                                                                
360: Aufgabenzuordnung BL-Gemeinden       Niederdorf     angenommen   298    85   15         4 77.81 22.19        ja
361: Aufgabenzuordnung BL-Gemeinden         Oberdorf     angenommen   416   119   27         4 77.76 22.24        ja
362: Aufgabenzuordnung BL-Gemeinden      Reigoldswil     angenommen   333    65   23         7 83.67 16.33        ja
363: Aufgabenzuordnung BL-Gemeinden        Titterten     angenommen   122    28    9         4 81.33 18.67        ja
364: Aufgabenzuordnung BL-Gemeinden       Waldenburg     angenommen   158    45   23         4 77.83 22.17        ja

For the sake of completeness, there are also other ways to add an id column to the individual data frames before binding:

In the original, nested list:

Data0 <- lapply(seq_along(Data), function(.i) cbind(Data[[.i]][[1]], Title = Title[[.i]]))
str(Data0[1])
List of 1
 $ :'data.frame': 91 obs. of  10 variables:
  ..$ Bezirk   : chr [1:91] "Bezirk Arlesheim" "Aesch" "Allschwil" "Arlesheim" ...
  ..$ Resultat : chr [1:91] "abgelehnt11680" "abgelehnt" "abgelehnt" "abgelehnt" ...
  ..$ Ja       : chr [1:91] "15433" "840" "1473" "727" ...
  ..$ Nein     : chr [1:91] "27159" "1606" "3513" "1982" ...
  ..$ Leer     : chr [1:91] "864" "38" "121" "75" ...
  ..$ Ungültig: chr [1:91] "758" "18" "179" "59" ...
  ..$ Ja%      : chr [1:91] "36.23" "34.34" "29.54" "26.84" ...
  ..$ Nein%    : chr [1:91] "63.77" "65.66" "70.46" "73.16" ...
  ..$ Gemeldet : chr [1:91] "15 von 15" "ja" "ja" "ja" ...
  ..$ Title    : Factor w/ 1 level "Bruderholz-Initiative": 1 1 1 1 1 1 1 1 1 1 ...

or in the "flattened" list:

Data1 <- unlist(Data, recursive = FALSE)
Data2 <- lapply(seq_along(Data1), function(.i) cbind(Data1[[.i]], Title = Title[[.i]]))
str(Data2[1])
List of 1
 $ :'data.frame': 91 obs. of  10 variables:
  ..$ Bezirk   : chr [1:91] "Bezirk Arlesheim" "Aesch" "Allschwil" "Arlesheim" ...
  ..$ Resultat : chr [1:91] "abgelehnt11680" "abgelehnt" "abgelehnt" "abgelehnt" ...
  ..$ Ja       : chr [1:91] "15433" "840" "1473" "727" ...
  ..$ Nein     : chr [1:91] "27159" "1606" "3513" "1982" ...
  ..$ Leer     : chr [1:91] "864" "38" "121" "75" ...
  ..$ Ungültig: chr [1:91] "758" "18" "179" "59" ...
  ..$ Ja%      : chr [1:91] "36.23" "34.34" "29.54" "26.84" ...
  ..$ Nein%    : chr [1:91] "63.77" "65.66" "70.46" "73.16" ...
  ..$ Gemeldet : chr [1:91] "15 von 15" "ja" "ja" "ja" ...
  ..$ Title    : Factor w/ 1 level "Bruderholz-Initiative": 1 1 1 1 1 1 1 1 1 1 ...

No for loop of any kind is required to accomplish the task.
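
For reference, the two attempts from the question can be repaired along the following lines. This is a sketch on made-up toy data (dfs and titles are illustrative names, not objects from the question). The original loop fails because for (i in Data) iterates over the list elements themselves rather than over their positions, so i cannot be used as an index.

```r
# Toy stand-ins for the flattened list of data frames and their titles.
dfs <- list(data.frame(x = 1:2), data.frame(x = 3:5))
titles <- c("First vote", "Second vote")

# Map() walks both inputs in parallel: one data frame and one title at a time.
with_map <- Map(function(d, ttl) { d$Title <- ttl; d }, dfs, titles)

# An equivalent for loop must iterate over positions, not over the elements.
with_loop <- dfs
for (i in seq_along(with_loop)) {
  with_loop[[i]]$Title <- titles[i]
}
```

Both variants produce a list of data frames that each carry their own Title column.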

Please note that cbind() has turned Title into a factor, which was the default behaviour in R versions before 4.0.0. This can be turned off by including the parameter stringsAsFactors = FALSE in the call to cbind().
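
A minimal sketch of that option, using a made-up two-row data frame (df is an illustrative name, not an object from the question):

```r
# Toy data frame standing in for one scraped table.
df <- data.frame(Bezirk = c("Aesch", "Allschwil"), stringsAsFactors = FALSE)

# The extra argument is passed on to data.frame(), which builds the result,
# so Title is kept as a character column instead of becoming a factor.
out <- cbind(df, Title = "Bruderholz-Initiative", stringsAsFactors = FALSE)
class(out$Title)
```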

Both approaches return a list of data.frames which can be directly combined row-wise by

do.call(rbind, Data0)

or

rbindlist(Data0)

Upvotes: 1
