Reputation: 178
I am working with 5 data frames that I want to filter (eliminating rows that match a regex). Because all the data frames are similar, with the same variable names, I stored them in a list and I'm iterating over it. However, when I try to save the filtered data for each of the original data frames, the loop creates i_filtered (instead of dfName_filtered), so it gets overwritten on every iteration. Here's what I have in the loop:
for (i in list_all){
  i_filtered1 <- i[i$chr != filter1,]
  i_filtered2 <- i[i$chr != filter2,]
  #Write the result filtered table in a csv file
  #Change output directory if needed
  write.csv(i_filtered2, file="/home/tama/Desktop/i_filtered.csv")
}
As I said, filter1 and filter2 are just regexes that I'm using to filter the data in the chr column. What's the correct way to assign the original name + "_filtered" to the new data frame?
Thanks in advance
Edited to add info: Each dataframe has these variables (but values can change)
chr start end length
chr1 10400 10669 270
chr10 237646 237836 191
chrX 713884 714414 531
chrUn 713884 714414 531
chr1 762664 763174 511
chr4 805008 805571 564
And I have stored them all in a list:
list_all <- list(heep, oe, st20_n, st20_t,all)
list_all <- lapply(list_all, na.omit)
The filters:
#Get rid of random chromosomes
filter1=".*random"
#Get rid of undefined chromosomes
filter2 = "^chrUn.*"
The output I'm looking for is:
heep_filtered1
heep_filtered2
oe_filtered1
oe_filtered2
etc
Upvotes: 1
Views: 3904
Reputation: 7839
One possibility is to iterate over a sequence of indices (or names), rather than over the list of data-frames itself, and access the data-frames using the indices.
Another problem is that the != operator doesn't support regular expressions. It only does exact literal matches. You need to use grepl() instead.
names(list_all) <- c("heep", "oe", "st20_n", "st20_t", "all")
# Start from an empty list so that [[<- always stores whole data frames
filtered <- list()
for (i in names(list_all)){
  df <- list_all[[i]]
  # Drop the rows whose chr value matches each regex
  df.1 <- df[!grepl(filter1, df$chr), ]
  df.2 <- df[!grepl(filter2, df$chr), ]
  #Write the result filtered table in a csv file
  #Change output directory if needed
  write.csv(df.2, file=paste0("/home/tama/Desktop/", i, "_filtered.csv"))
  # Store both results under names such as heep_filtered1 and heep_filtered2
  filtered[[paste0(i, "_filtered", 1)]] <- df.1
  filtered[[paste0(i, "_filtered", 2)]] <- df.2
}
The result is a list called filtered that contains the filtered data frames.
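To see the difference between != and grepl() in isolation, here is a minimal sketch on a made-up data frame. The name toy and its values are assumptions for illustration, not the asker's data.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  chr   = c("chr1", "chr10", "chrUn", "chr4_random"),
  start = c(10400, 237646, 713884, 805008),
  stringsAsFactors = FALSE
)
filter1 <- ".*random"

# != compares each value with the literal string ".*random",
# so no row is removed
kept_neq <- toy[toy$chr != filter1, ]

# grepl() treats filter1 as a regular expression,
# so the chr4_random row is removed
kept_regex <- toy[!grepl(filter1, toy$chr), ]

nrow(kept_neq)    # 4
nrow(kept_regex)  # 3
```

With the real data, the individual results can then be pulled out of the list by name, for example filtered[["heep_filtered1"]].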
Upvotes: 2
Reputation: 9570
The issue is that i is only treated as the loop variable when it stands alone. Inside another name such as i_filtered1, or inside a string such as "i_filtered.csv", it is just the literal character i, so nothing is substituted.
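A small sketch of that difference, assuming two hypothetical data set names (set_names is made up for this example): the string "i_filtered.csv" never changes, while paste0() builds the file name from the value of the loop variable.

```r
set_names <- c("heep", "oe")  # hypothetical data set names

# "i" inside a string is just the letter i; the variable is not substituted
literal <- vapply(set_names, function(i) "i_filtered.csv", character(1))

# paste0() builds the name from the value of the loop variable
built <- vapply(set_names, function(i) paste0(i, "_filtered.csv"), character(1))

unname(literal)  # "i_filtered.csv" "i_filtered.csv"
unname(built)    # "heep_filtered.csv" "oe_filtered.csv"
```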
I would suggest naming the list, then using lapply instead of a for loop (note that I also changed the filter to occur in one step, since right now it is unclear if you are trying to take both things out or not; this also makes it easier to add more filters).
filters <- c(".*random", "chrUn.*")
# Combine the patterns into one regex so a row is dropped if it matches any of them
filterPattern <- paste(filters, collapse = "|")
list_all <- list(heep = heep
                 , oe = oe
                 , st20_n = st20_n
                 , st20_t = st20_t
                 , all = all)
toLoop <- names(list_all)
names(toLoop) <- toLoop # renames them in the output list
filtered <- lapply(toLoop, function(thisSet){
  tempFiltered <- list_all[[thisSet]][!grepl(filterPattern, list_all[[thisSet]]$chr),]
  #Write the result filtered table in a csv file
  #Change output directory if needed
  write.csv(tempFiltered, file=paste0("/home/tama/Desktop/",thisSet,"_filtered.csv"))
  # Return the part you care about
  return(tempFiltered)
})
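For anyone who wants to try this without the original data, here is a self-contained sketch of the same idea. The toy data frames, the names toy_list, pattern and out_dir, and the use of tempdir() instead of the Desktop path are all assumptions made for the example. It also uses setNames(nm = ...) as a shorter way to get a named vector to loop over.

```r
# Toy inputs, invented for illustration; the real list would hold heep, oe, ...
toy_list <- list(
  heep = data.frame(chr = c("chr1", "chrUn_gl000211", "chrX"),
                    start = c(10400, 713884, 713884),
                    stringsAsFactors = FALSE),
  oe   = data.frame(chr = c("chr4", "chr1_gl000191_random"),
                    start = c(805008, 762664),
                    stringsAsFactors = FALSE)
)

filters <- c(".*random", "^chrUn.*")
pattern <- paste(filters, collapse = "|")  # one regex matching either filter

out_dir <- tempdir()  # assumption: a temporary directory, not the Desktop

# setNames(nm = x) names the vector by its own values, so the names
# carry over to the list that lapply() returns
filtered <- lapply(setNames(nm = names(toy_list)), function(nm) {
  df   <- toy_list[[nm]]
  kept <- df[!grepl(pattern, df$chr), ]
  write.csv(kept, file = file.path(out_dir, paste0(nm, "_filtered.csv")),
            row.names = FALSE)
  kept
})

names(filtered)    # "heep" "oe"
filtered$heep$chr  # "chr1" "chrX"
```

Adding another filter then only means adding another pattern to filters.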
Upvotes: 1