John

Reputation: 43189

Obtain elements from a list with a for loop

I am trying to extract values from a list using a for loop. The list contains 77 elements that I scraped from a webpage. They were produced by strsplit with a rather messy regular expression:

chunk <- strsplit(lines, "(<tr>|</td>)(<td>|<td[^>]+>)|aws| MB| KB")

A sample element looks like:

> chunk[76]
[[1]]
 [1] ""                                                                                     
 [2] "<img src=\"/images/"                                                                  
 [3] "tats/flags/mn.png\" height=\"14\" alt='mn' title='mn' />"                             
 [4] "Mongolia"                                                                             
 [5] "mn"                                                                                   
 [6] "1"                                                                                    
 [7] "1"                                                                                    
 [8] "21.95"                                                                                
 [9] ""                                                                                     
[10] "<img src=\"/images/"                                                                  
[11] "tats/other/hp.png\" width=\"2\" height=\"5\" alt='Pages: 1' title='Pages: 1' /><br />"

I have tried to extract the parts of each element that I need with:

for (i in length(chunk)) {  
    values <- chunk[[i]][c(4,6:8)]
}

After the loop, values only ever holds the extracted parts of the last list element (chunk[[77]]).

Can anyone suggest how to obtain the values I need for every list element?

Upvotes: 2

Views: 3797

Answers (3)

Joshua Ulrich

Reputation: 176638

You could use lapply with do.call(rbind, ...) instead of the for loop.

chunk <- list(
  c("", "<img src=\"/images/",
  "tats/flags/mn.png\" height=\"14\" alt='mn' title='mn' />",
  "Mongolia", "mn", "1", "1", "21.95", "", "<img src=\"/images/",
  "tats/other/hp.png\" width=\"2\" height=\"5\" alt='Pages: 1' title='Pages: 1' /><br />"),
  c("", "<img src=\"/images/",
  "tats/flags/mn.png\" height=\"14\" alt='mn' title='mn' />",
  "Mongolia", "mn", "1", "1", "21.95", "", "<img src=\"/images/",
  "tats/other/hp.png\" width=\"2\" height=\"5\" alt='Pages: 1' title='Pages: 1' /><br />") )
do.call(rbind, lapply(chunk, `[`, c(4,6:8)))
#      [,1]       [,2] [,3] [,4]   
# [1,] "Mongolia" "1"  "1"  "21.95"
# [2,] "Mongolia" "1"  "1"  "21.95"
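
The backtick-quoted `[` in the call above is R's subsetting function passed by name, so it does the same job as applying function(x) x[c(4, 6:8)] to each element. As a minimal sketch of the same idea, assuming every element has at least eight entries, vapply can build the matrix in one step and will raise an error if any element does not yield exactly four strings. The toy chunk and the names idx and values are illustrative assumptions, not part of the original answer.

```r
# Toy stand-in for the scraped list (hypothetical data, same layout as the question)
chunk <- list(
  c("", "a", "b", "Mongolia", "mn", "1", "1", "21.95", "", "c", "d"),
  c("", "a", "b", "Norway",   "no", "3", "7", "80.10", "", "c", "d")
)

idx <- c(4, 6:8)  # country name plus the three numeric fields

# vapply returns a 4 x length(chunk) character matrix;
# t() transposes it so that each list element becomes one row
values <- t(vapply(chunk, function(x) x[idx], character(4)))
print(values)
```

Compared with lapply plus do.call(rbind, ...), this variant trades a little verbosity for an explicit check on the shape of each result.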

Upvotes: 2

teucer

Reputation: 6238

Replace values <- chunk[[i]][c(4,6:8)] with values <- rbind(values, chunk[[i]][c(4,6:8)]), and initialize values <- NULL before the loop, so that each iteration appends a row instead of overwriting the previous result.
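
A minimal sketch of this accumulating loop on hypothetical toy data follows. It also changes the loop header, which is an addition not stated in the answer: for (i in length(chunk)) iterates over the single value 77, so the body runs only once, whereas seq_along(chunk) visits every index.

```r
# Toy stand-in for the scraped list (hypothetical data, same layout as the question)
chunk <- list(
  c("", "a", "b", "Mongolia", "mn", "1", "1", "21.95", "", "c", "d"),
  c("", "a", "b", "Norway",   "no", "3", "7", "80.10", "", "c", "d")
)

values <- NULL                      # start empty; rbind(NULL, x) yields a one-row matrix
for (i in seq_along(chunk)) {       # 1, 2, ..., length(chunk)
  values <- rbind(values, chunk[[i]][c(4, 6:8)])
}
print(values)
```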

Alternatively, create the matrix before the loop with values <- matrix(0, length(chunk), 4) and fill it inside the loop with values[i,] <- chunk[[i]][c(4,6:8)]. This is more efficient, because rbind copies the growing matrix on every iteration.
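
The preallocated variant might look like the sketch below, again on hypothetical toy data. One deliberate deviation from the answer: the matrix is initialized with NA_character_ rather than 0, since assigning strings into a numeric matrix would coerce the whole matrix to character and turn any unfilled cells into "0".

```r
# Toy stand-in for the scraped list (hypothetical data, same layout as the question)
chunk <- list(
  c("", "a", "b", "Mongolia", "mn", "1", "1", "21.95", "", "c", "d"),
  c("", "a", "b", "Norway",   "no", "3", "7", "80.10", "", "c", "d")
)

# Allocate the full result once: one row per list element, four columns
values <- matrix(NA_character_, nrow = length(chunk), ncol = 4)

for (i in seq_along(chunk)) {       # iterate over every index, not just the last one
  values[i, ] <- chunk[[i]][c(4, 6:8)]
}
print(values)
```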

Upvotes: 4

Christian Bøhlke

Reputation: 779

I would advise using Perl instead. It is much handier for the operations (I assume) you would like to perform.

Upvotes: -1
