Mohit

Reputation: 23

Using a for loop in R to loop through the name of dataframes

I have data on mergers for 20 years for various firms. I have used a "for" loop in R to separate data for each year which gives me 20 data frames in the global environment. Each data frame is identified by its year: Merger2000 to Merger2019 for 20 years. Now I want to write another for loop to find the unique companies in each data frame (that is, unique firms in each year). Each company is identified by a unique company code (co_code). I know how to do this for each year separately. For example, for the year 2000, I would do something like:

uniquemerger2000 <- Merger2000 %>% distinct(co_code, .keep_all = TRUE)

How do I run a for loop to enable this operation for all years (that is from 2000-2019)? There is some indexing required in the code but I am not sure how to operationalise this in a loop.

Any help would be appreciated. Thanks!

Upvotes: 0

Views: 39

Answers (1)

Ronak Shah

Reputation: 389265

It is usually better to keep the data in one data frame or in a list rather than as many separate objects in the global environment.
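As a minimal sketch of the single-data-frame option, assuming the yearly data frames all share the same columns, they could be stacked with an added year column and de-duplicated on the (year, co_code) pair. The toy data and the names all_mergers and unique_by_year are made up for illustration and are not from the question.

```r
# Toy stand-ins for the real Merger2000, Merger2001, ... (hypothetical values)
Merger2000 <- data.frame(co_code = c(1, 1, 2), deal = c("a", "b", "c"))
Merger2001 <- data.frame(co_code = c(2, 3, 3), deal = c("d", "e", "f"))

years <- 2000:2001
frames <- mget(paste0("Merger", years))

# Add a year column to each data frame, then stack them into one
all_mergers <- do.call(
  rbind,
  Map(function(df, yr) cbind(year = yr, df), frames, years)
)
rownames(all_mergers) <- NULL

# Keep the first row for each (year, co_code) combination
unique_by_year <- all_mergers[!duplicated(all_mergers[c("year", "co_code")]), ]
```

With everything in one data frame, a single year can still be pulled out with ordinary subsetting, for example unique_by_year[unique_by_year$year == 2000, ].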

You can collect all the data frames into one list (list_data) with mget, then use lapply or purrr's map to keep the first row for each co_code in every data frame.

library(dplyr)
library(purrr)

list_data <- mget(paste0('Merger', 2000:2019))
result <- map(list_data, ~.x %>% distinct(co_code, .keep_all = TRUE))

Or in base R:

result <- lapply(list_data, function(x) x[!duplicated(x$co_code), ])
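If an explicit for loop is preferred, as the question asks, the same idea can be written by looping over the list by index. This is only a sketch on toy data: the two small data frames below stand in for the real ones, and naming the results uniquemerger2000 and so on is an assumption taken from the question's example.

```r
# Toy stand-ins for the real Merger2000, Merger2001, ... (hypothetical values)
Merger2000 <- data.frame(co_code = c(1, 1, 2), deal = c("a", "b", "c"))
Merger2001 <- data.frame(co_code = c(2, 3, 3), deal = c("d", "e", "f"))

# Collect the yearly data frames into one named list
list_data <- mget(paste0("Merger", 2000:2001))

# Pre-allocate the result list and name its elements uniquemerger<year>
result <- vector("list", length(list_data))
names(result) <- sub("^Merger", "uniquemerger", names(list_data))

for (i in seq_along(list_data)) {
  x <- list_data[[i]]
  # Keep the first row seen for each co_code
  result[[i]] <- x[!duplicated(x$co_code), ]
}
```

Each year's result is then available as result$uniquemerger2000, result$uniquemerger2001, and so on, without creating 20 more objects in the global environment.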

Upvotes: 1
