DataDancer

Reputation: 175

How to make use of a list of tables

Using the XML package, I was able to scrape over 80 tables from a website, and this number will grow over time. The tables themselves are not very large, mostly 6x10 (the size varies between tables and over time too). The redeeming fact is that 99% of the time the tables will have the same columns, i.e. the same column names. For example:

 table[1]
 A B C D E F
 1 b b 2 2 b
 2 b b 2 2 b 


 table[2]
 A B C D E F
 1 c c 2 2 c
 2 c c 2 2 c 

How would I go about combining all the tables and their observations into separate variables (each column = one variable) while making sure that the observations within each variable keep their link to the original table (e.g. through an additional variable)?

As the different tables refer to the results of different rounds in a competition, the end result I would like to achieve is to be able to track an individual's progression through the competition and, for that matter, across different competitions in any one year (I expect to be scraping a lot of tables).

Any nice R code that anyone can pass on would be great, and ideas on best practice for making use of and/or analyzing this mass of information would be invaluable.

Upvotes: 2

Views: 117

Answers (2)

agstudy

Reputation: 121568

I hadn't seen @flodel's solution before posting, but this is the same idea using only base R.

## two small example tables with identical column names
dat1 <- read.table(text = '
A B C D E F
1 b b 2 2 b
2 b b 2 2 b', header = TRUE)

dat2 <- read.table(text = '
A B C D E F
1 c c 2 2 c
2 c c 2 2 c', header = TRUE)

One idea is to put all your data.frames in a list and process them there.

ll <- list(dat1, dat2)   ## I assume your tables are in a list
## add an id column recording which table each row came from
ll <- lapply(seq_along(ll), function(i) cbind(ll[[i]], id = i))
do.call(rbind, ll)

  A B C D E F id
1 1 b b 2 2 b  1
2 2 b b 2 2 b  1
3 1 c c 2 2 c  2
4 2 c c 2 2 c  2
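If the rounds have meaningful labels, the same pattern can carry those labels instead of a plain index. This is only a sketch: the names round1 and round2 and the column name round are made up for illustration, so substitute whatever your scrape provides.

```r
dat1 <- read.table(text = "A B C D E F
1 b b 2 2 b
2 b b 2 2 b", header = TRUE)
dat2 <- read.table(text = "A B C D E F
1 c c 2 2 c
2 c c 2 2 c", header = TRUE)

# Hypothetical round labels stored as the list names
tables <- list(round1 = dat1, round2 = dat2)

# Map() walks the tables and their names in parallel and
# attaches each name as a 'round' column
labelled <- Map(function(dat, nm) cbind(dat, round = nm),
                tables, names(tables))

combined <- do.call(rbind, labelled)
rownames(combined) <- NULL   # drop the "round1.1"-style row names
```

Each row of combined then records which table it came from in the round column.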

I don't think you need to put everything in one big data.frame; you can process the tables in the list. For example:

ll <- lapply(ll,function(dat){
  sum(rank(dat))  ## dummy rank function 
})

You get a list with one score for each round:

 ll
[[1]]
[1] 105

[[2]]
[1] 105

Upvotes: 2

flodel

Reputation: 89057

Two things:

1) Add an ID column to each of your tables:

tables <- lapply(seq_along(tables), function(i) transform(tables[[i]], ID = i))

2) To bind tables that may not all have the same columns, use plyr::rbind.fill, which fills the missing columns with NA:

library(plyr)
all.data <- do.call(rbind.fill, tables)

What you get out is a single data.frame holding all your data. To create "separate variables" like you asked, you could then use attach(all.data), but that is really not recommended. You are better off keeping the data in a data.frame for your analysis.
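Once everything sits in one data.frame, following one competitor across rounds is mostly a matter of subsetting and reshaping. A rough sketch in base R, assuming the combined data has a name column and a score column (both hypothetical, as are the values) next to the ID column:

```r
# Made-up stand-in for the combined data; 'name' and 'score' are assumed columns
all.data <- data.frame(
  name  = c("ann", "bob", "ann", "bob", "ann"),
  score = c(10, 12, 14, 9, 15),
  ID    = c(1, 1, 2, 2, 3)
)

# All rows for one competitor, ordered by round
ann <- subset(all.data, name == "ann")
ann <- ann[order(ann$ID), ]

# One row per competitor, one column per round;
# rounds a competitor did not take part in show up as NA
progress <- tapply(all.data$score, list(all.data$name, all.data$ID), sum)
```

The progress matrix can then be indexed by competitor and round, e.g. progress["ann", "3"].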

Upvotes: 1
