Reputation: 1145
I have over 100 data frames (df1, df2, df3, ...), each containing the same variables. I want to loop through all of them and remove duplicates by id. For df1, I can do:
df1 <- df1[!duplicated(df1$id), ]
How can I do this in an efficient way?
Upvotes: 0
Views: 997
Reputation: 160617
If you're dealing with 100 similarly-structured data.frames, I suggest instead of naming them uniquely, you put them in a list.
Assuming they are all named df followed by a number, you can easily assign them to a list with something like:
df_varnames <- ls()[ grep("^df[0-9]+$", ls()) ]
or, as @MatteoCastagna suggested in a comment:
df_varnames <- ls(pattern = "^df[0-9]+$")
(which is both faster and cleaner). Then:
dflist <- sapply(df_varnames, get, simplify = FALSE)
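As a side note, base R's `mget` fetches several variables by name in one call, so the `get` step can be collapsed into a one-liner. A minimal sketch, using two made-up data frames:

```r
# Hypothetical stand-ins for the real df1 ... df100
df1 <- data.frame(id = 1:3)
df2 <- data.frame(id = 4:6)

# mget() looks up every matching name and returns a named list
dflist <- mget(ls(pattern = "^df[0-9]+$"))
names(dflist)  # "df1" "df2"
```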
And from here, your question is simply:
dflist2 <- lapply(dflist, function(z) z[!duplicated(z$id),])
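To see the whole pipeline end to end, here is a small self-contained illustration with two toy data frames (the data and column `x` are made up for the example):

```r
# Toy data frames, each with one duplicated id
df1 <- data.frame(id = c(1, 1, 2), x = c("a", "b", "c"))
df2 <- data.frame(id = c(3, 4, 4), x = c("d", "e", "f"))

# Collect them into a named list
df_varnames <- ls(pattern = "^df[0-9]+$")
dflist <- sapply(df_varnames, get, simplify = FALSE)

# Keep only the first row for each id within each data frame
dflist2 <- lapply(dflist, function(z) z[!duplicated(z$id), ])

nrow(dflist2$df1)  # 2 (ids 1 and 2 remain)
nrow(dflist2$df2)  # 2 (ids 3 and 4 remain)
```

From here on you work with `dflist2` (e.g. `dflist2$df1`) instead of the individual variables.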
If you must deal with them as individual data.frames (again, discouraged; it almost always slows down processing while not adding any functionality), you can try a hack like this (using df_varnames from above):
for (dfname in df_varnames) {
df <- get(dfname)
assign(dfname, df[! duplicated(df$id), ])
}
I cringe when I consider using this, but I admit I may not understand your workflow.
Upvotes: 4