Creating dataframe in R loop and naming it

Question

I am working with 5 data frames that I want to filter (eliminating some rows if they match a regex). Because all the data frames are similar, with the same variable names, I stored them in a list and I'm iterating over it. However, when I try to save the filtered data for each of the original data frames, the loop creates an object literally named i_filtered (instead of dfName_filtered), so it gets overwritten on every iteration. Here's what I have in the loop:

for (i in list_all){
  i_filtered1 <- i[i$chr != filter1,]
  i_filtered2 <- i[i$chr != filter2,]
  #Write the result filtered table in a csv file
  #Change output directory if needed
  write.csv(i_filtered2, file="/home/tama/Desktop/i_filtered.csv")
}

As I said, filter1 and filter2 are just regex that I'm using to filter the data in the chr column. What's the correct way to assign the original name + "_filtered" to the new dataframe?

Thanks in advance

Edited to add info: Each dataframe has these variables (but values can change)

chr     start   end    length
chr1    10400   10669   270
chr10   237646  237836  191
chrX    713884  714414  531
chrUn   713884  714414  531
chr1    762664  763174  511
chr4    805008  805571  564

And I have stored all them in a list:

list_all <- list(heep, oe, st20_n, st20_t, all)
list_all <- lapply(list_all, na.omit)

The filters:

#Get rid of random chromosomes
filter1=".*random"
#Get rid of undefined chromosomes
filter2 = "^chrUn.*"

The output I'm looking for is:

heep_filtered1
heep_filtered2
oe_filtered1
oe_filtered2
etc

Ernest A · Accepted Answer

One possibility is to iterate over a sequence of indices (or names), rather than over the list of data-frames itself, and access the data-frames using the indices.

Another problem is that the != operator doesn't support regular expressions. It only does exact literal matches. You need to use grepl() instead.
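To see the difference, here is a minimal sketch on a small character vector. The chromosome labels are made up for illustration and are not taken from the asker's data; filter2 is assumed to be "^chrUn.*".

```r
# Hypothetical chromosome labels, invented for this illustration
chr <- c("chr1", "chrUn_gl000220", "chr4_random", "chrX")

# Assumed pattern: labels that start with "chrUn"
filter2 <- "^chrUn.*"

# != compares each element with the literal string "^chrUn.*",
# so every element counts as "different" and nothing is dropped
keep_literal <- chr != filter2

# grepl() tests each element against the regular expression;
# negating it keeps only the elements that do NOT match
keep_regex <- !grepl(filter2, chr)

chr[keep_literal]  # all four labels remain
chr[keep_regex]    # "chrUn_gl000220" is removed
```

With grepl() in place, the loop over the names of the list looks like this: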

names(list_all) <- c("heep", "oe", "st20_n", "st20_t", "all")

# Start from an empty list that will collect the filtered data-frames
filtered <- list()
for (i in names(list_all)){
    df <- list_all[[i]]
    df.1 <- df[!grepl(filter1, df$chr), ]
    df.2 <- df[!grepl(filter2, df$chr), ]
    #Write the result filtered table in a csv file
    #Change output directory if needed
    write.csv(df.2, file=paste0("/home/tama/Desktop/", i, "_filtered.csv"))
    filtered[[paste0(i, "_filtered", 1)]] <- df.1
    filtered[[paste0(i, "_filtered", 2)]] <- df.2
}

The result is a list called filtered that contains the filtered data-frames.
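
The same result can probably be obtained without an explicit loop. The following is only a sketch under some assumptions: the two toy data-frames stand in for the real ones, filter2 is assumed to be "^chrUn.*", and drop_matching is a hypothetical helper name, not something from the question.

```r
# Toy stand-ins for the real data-frames (values invented for illustration)
heep <- data.frame(chr = c("chr1", "chrUn_x", "chr2_random"), start = c(1, 2, 3))
oe   <- data.frame(chr = c("chrX", "chr3_random"), start = c(4, 5))
list_all <- list(heep = heep, oe = oe)

filter1 <- ".*random"
filter2 <- "^chrUn.*"   # assumed pattern

# Hypothetical helper: drop the rows whose chr column matches the pattern
drop_matching <- function(df, pattern) {
  df[!grepl(pattern, df$chr), , drop = FALSE]
}

# Apply each filter to every data-frame in the list
filtered1 <- lapply(list_all, drop_matching, pattern = filter1)
filtered2 <- lapply(list_all, drop_matching, pattern = filter2)

# Build the names the question asks for: original name + "_filtered" + number
names(filtered1) <- paste0(names(list_all), "_filtered1")
names(filtered2) <- paste0(names(list_all), "_filtered2")
filtered <- c(filtered1, filtered2)

names(filtered)
filtered[["heep_filtered1"]]
```

Each filtered data-frame is then available by name, e.g. filtered[["oe_filtered2"]]. If separate variables are really needed, list2env(filtered, envir = .GlobalEnv) would create them, although keeping everything in one list is usually easier to work with.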
