Creating dataframe in R loop and naming it

I am working with 5 data frames that I want to filter (eliminating some rows if they match a regex). Because all the data frames are similar, with the same variable names, I stored them in a list and I'm iterating over it. However, when I try to save the filtered data for each of the original data frames, the loop creates an object called i_filtered (instead of dfName_filtered), so it gets overwritten on every iteration. Here's what I have in the loop:

for (i in list_all){
  i_filtered1 <- i[i$chr != filter1,]
  i_filtered2 <- i[i$chr != filter2,]
  #Write the result filtered table in a csv file
  #Change output directory if needed
  write.csv(i_filtered2, file="/home/tama/Desktop/i_filtered.csv")
}

As I said, filter1 and filter2 are just regex that I'm using to filter the data in the chr column. What's the correct way to assign the original name + "_filtered" to the new dataframe?

Thanks in advance

Edited to add info: Each dataframe has these variables (but values can change)

chr     start   end    length
chr1    10400   10669   270
chr10   237646  237836  191
chrX    713884  714414  531
chrUn   713884  714414  531
chr1    762664  763174  511
chr4    805008  805571  564

And I have stored them all in a list:

list_all <- list(heep, oe, st20_n, st20_t,all)
list_all <- lapply(list_all, na.omit)

The filters:

#Get rid of random chromosomes
filter1=".*random"
#Get rid of undefined chromosomes
filter2 = "^chrUn.*"

The output I'm looking for is:

heep_filtered1
heep_filtered2
oe_filtered1
oe_filtered2
etc

Upvotes: 1

Views: 3904

Answers (2)

Ernest A

Reputation: 7839

One possibility is to iterate over a sequence of indices (or names), rather than over the list of data-frames itself, and access the data-frames using the indices.

Another problem is that the != operator doesn't support regular expressions. It only does exact literal matches. You need to use grepl() instead.
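A minimal sketch of the difference, using a made-up vector of chromosome names (the values in chr below are illustrative, not taken from the asker's data):

```r
# Hypothetical chromosome names, for illustration only
chr <- c("chr1", "chrUn_gl000211", "chr4_random", "chrX")

# != compares each element with the literal string "^chrUn.*",
# so no element is ever equal to it and nothing gets dropped
keep_literal <- chr != "^chrUn.*"

# grepl() interprets the string as a regular expression;
# negate it to keep the rows that do NOT match
keep_regex <- !grepl("^chrUn.*", chr)

chr[keep_literal]  # still contains the chrUn entry
chr[keep_regex]    # chrUn entry removed
```

The same logical vector can be used as the row index of a data frame, which is what the loop below does.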

names(list_all) <- c("heep", "oe", "st20_n", "st20_t", "all")

filtered <- list()
for (i in names(list_all)){
    df <- list_all[[i]]
    df.1 <- df[!grepl(filter1, df$chr), ]
    df.2 <- df[!grepl(filter2, df$chr), ]
    #Write the result filtered table in a csv file
    #Change output directory if needed
    write.csv(df.2, file=paste0("/home/tama/Desktop/", i, "_filtered.csv"))
    filtered[[paste0(i, "_filtered", 1)]] <- df.1
    filtered[[paste0(i, "_filtered", 2)]] <- df.2
}

The result is a list called filtered that contains the filtered data-frames.
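As a rough sketch of how such a named list can be used afterwards: the toy list below stands in for filtered (its contents are made up), and the list2env() step is only an optional extra for when separate objects such as heep_filtered1 are really wanted.

```r
# Toy stand-in for the `filtered` list (made-up data, for illustration only)
filtered <- list(
  heep_filtered1 = data.frame(chr = c("chr1", "chrUn"), start = c(10400, 713884)),
  heep_filtered2 = data.frame(chr = "chr1", start = 10400)
)

# The generated names
names(filtered)

# Access one data frame by its name
nrow(filtered[["heep_filtered2"]])

# Optional: copy every element into an environment as its own object.
# A fresh environment is used here; passing globalenv() instead would
# create the objects directly in the workspace.
env <- new.env()
list2env(filtered, envir = env)
exists("heep_filtered1", envir = env)
```

Keeping the data frames in the list is usually easier to work with, since the same lapply() or loop can be applied to all of them again later.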

Upvotes: 2

Mark Peterson

Reputation: 9570

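The issue is that i is only treated as the loop variable when it stands alone. In i_filtered1 it is just part of a longer variable name, and inside "i_filtered.csv" it is a literal character in a string, so neither gets replaced by the name of the data frame.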

I would suggest naming the list, then using lapply instead of a for loop (note that I also changed the filter to occur in one step, since right now it is unclear if you are trying to take both things out or not -- this also makes it easier to add more filters).

filters <- c(".*random", "chrUn.*")
list_all <- list(heep = heep
                 , oe = oe
                 , st20_n = st20_n
                 , st20_t = st20_t
                 , all = all)
toLoop <- names(list_all)
names(toLoop) <- toLoop # renames them in the output list


# Combine the filters into one regex; %in% would only match exact strings
pattern <- paste(filters, collapse = "|")

filtered <- lapply(toLoop, function(thisSet) {
  thisData <- list_all[[thisSet]]
  tempFiltered <- thisData[!grepl(pattern, thisData$chr), ]
  #Write the result filtered table in a csv file
  #Change output directory if needed
  write.csv(tempFiltered, file=paste0("/home/tama/Desktop/",thisSet,"_filtered.csv"))

  # Return the part you care about
  return(tempFiltered)
})
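A self-contained sketch of this pattern on made-up data (the data frames a and b are hypothetical stand-ins for heep, oe and so on, and the CSV step is left out). It joins the filters into a single regular expression with paste(..., collapse = "|") and applies it with grepl(), since %in% only matches exact strings:

```r
# Made-up data frames standing in for the real ones
list_all <- list(
  a = data.frame(chr = c("chr1", "chr4_random", "chrUn_x"), start = 1:3,
                 stringsAsFactors = FALSE),
  b = data.frame(chr = c("chrX", "chr2"), start = 4:5,
                 stringsAsFactors = FALSE)
)

filters <- c(".*random", "chrUn.*")
pattern <- paste(filters, collapse = "|")  # one regex matching either filter

# Looping over a named character vector makes lapply() return a named list
toLoop <- names(list_all)
names(toLoop) <- toLoop

filtered <- lapply(toLoop, function(thisSet) {
  d <- list_all[[thisSet]]
  d[!grepl(pattern, d$chr), ]  # keep rows whose chr matches neither filter
})

names(filtered)
filtered$a$chr
```

Adding another filter then only means appending one more pattern to filters.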

Upvotes: 1
