atamalu

Reputation: 63

R: Conditionally adding list items to previous ones

I have been trying to turn PDF files into data frames using R. I start by reading the text into R and using data.table to split the data into one list item per page. I am now having trouble writing a loop that combines each question with its continuation pages. The txt.list object in the code below is a brief example of the format.

### Short list
txt.list <- list('Q1', 'Q2', 'continued page', 
                 'Q3', 'continued page', 'continued page', 
                 'Q4', 'Q5', 'continued page', 'continued page', 
                 'Q6', 'continued page', 'Q7', 'continued page', 'continued page')

### Label pages that continue from the previous
is.continuation <- lapply(txt.list, function(x){ startsWith(x, 'continued')}) # find which pages are continuations
is.continuation <- c(unlist(is.continuation)) # unlist for list item naming

names(txt.list) <- as.character(is.continuation)

print(txt.list)

The result is that each page in the list that is a continuation of the preceding question is given a "TRUE" character label (I know this can be done without list labeling; I'm just trying to avoid referring to an external vector).

Since each pdf file from this website almost always uses the same format, I am trying to make this work (at least somewhat) for future uses. I've been trying something along the lines of:

new.list <- vector(mode = 'list', 
                   length = length(which(names(txt.list) == 'TRUE')))

for(i in 1:length(txt.list)){
  j = i + 1 # pg ahead

  if(names(txt.list)[[j]] == "TRUE"){

    new.list[[i]][[1]] <- txt.list[[i]]
    m = 2 # index ahead

    while(names(txt.list)[[j]] == "TRUE"){
      new.list[[i]][[m]] <- txt.list[[j]]
      m = m + 1
    }
  } else {
      new.list[[i]] <- txt.list[[i]]
  }
}

After a few tries, I'm just completely drawing blanks. Any help would be much appreciated!

Upvotes: 0

Views: 160

Answers (1)

pwilcox

Reputation: 5763

It's been a while since I've really worked in R, but am I misreading your for loop? Don't you need for (i in 1:length(...))? Without the 1: part there is no range, so the body runs only once, with i equal to the length of the list.

Beyond that, your main issue is that you write into new.list at index i, when that variable is only appropriate for reading from txt.list. You should keep a separate counter for new.list (such as nlSize) and increment it whenever you start a new entry.

Another minor issue is that you have an anchor before your while loop that you can avoid.

Finally, I would definitely get away from setting the names as truth values. It would have been better to reference an external vector, though you don't have to do that either. Just make a function and use it inside your loop.
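As a rough illustration of the external-vector option, the continuation flags can live in their own logical vector so the names of the list stay free for something else. This is only a sketch: pages is a made-up sample in the same shape as txt.list, and is.cont is a name chosen here for illustration.

```r
# Hypothetical sample input, mirroring the shape of the question's txt.list
pages <- list('Q1', 'Q2', 'continued page', 'Q3')

# Logical vector kept separate from the list; names(pages) is left untouched.
# vapply() guarantees exactly one TRUE/FALSE per page.
is.cont <- vapply(pages, function(x) startsWith(x, 'continued'), logical(1))

# Positions of the pages that start a new question
question.start <- which(!is.cont)
```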

I put my code in a function called normalizeList and then call it on txt.list. This way you can use it on other similar lists.

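normalizeList <- function(lst) {

    # TRUE when a page is a continuation of the previous question
    is.continuation <- function(x) startsWith(x, 'continued')

    new.list <- list()
    nlSize <- 0  # number of entries stored in new.list so far

    for (i in seq_along(lst)) {

      cur <- lst[[i]]

      # Append a continuation page to the most recent entry.
      # The nlSize > 0 check means a leading continuation page
      # (with no question before it) becomes its own entry instead of failing.
      if (is.continuation(cur) && nlSize > 0) {
        new.list[[nlSize]] <- c(new.list[[nlSize]], cur)
        next
      }

      # Otherwise start a new entry
      nlSize <- nlSize + 1
      new.list[[nlSize]] <- cur

    }

    new.list

}

normalizeList(txt.list)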
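For comparison, the same grouping can probably be done without an explicit loop. The sketch below is an alternative, not part of the function above, and it assumes every page is a single character string so that unlist() yields one element per page. The names is.cont, group.id and grouped are chosen here for illustration, and the sample list is a shortened version of the one in the question.

```r
# Shortened sample in the same format as the question's txt.list
txt.list <- list('Q1', 'Q2', 'continued page',
                 'Q3', 'continued page', 'continued page', 'Q4')

pages <- unlist(txt.list)                  # assumes one string per page
is.cont <- startsWith(pages, 'continued')  # TRUE for continuation pages

# Each non-continuation page bumps the running count, so a question and
# its continuation pages all share the same group id
group.id <- cumsum(!is.cont)

# One list element per question, holding the question and its continuations
grouped <- unname(split(pages, group.id))
```

Here group.id comes out as 1, 2, 2, 3, 3, 3, 4, so grouped has four elements and the second one holds 'Q2' together with its continuation page.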

Upvotes: 1
