Reputation: 365
I have two dataframes that both contain a column entitled "new_id" There are 100's, sometimes even 1000's of rows for each unique value of "new_id" in both data frames
I want to merge the two data frames based on this column but if I take a brute force approach using:
df3 <- merge(df1, df2, by = "new_id")
I run into memory issues.
I created a subset of df2 consisting of just one new_id value and merged it with df1 with success. So, I think I've come up with a solution using a for loop. However, I am not sure how to construct it since I am at a beginner level with R.
What I think the loop should do is first subset df2 for each unique value of new_id and then merge that subsetted data frame with df1, storing the result in a list. With each iteration of the loop, the list would append each new dataframes for each new_id. At the end, I would rbind the list into its own "master" dataframe.
I don't know if this is correct but I've come up with this loop as the basic structure and potential a starting point:
datalist = list()
for(i in unique(Procs_OP$new_id)){
df <- subset(Procs_OP, new_id %in% Procs_OP$new_id))
df2 <- merge(step2, df, by="new_id", all.step2=TRUE)
datalist[[i]] <- df2
}
df3 = do.call(rbind, datalist)
Upvotes: 1
Views: 573
Reputation: 886948
The subset
should be on i
as the i
here is unique
element looped
un1 <- unique(Procs_OP$new_id)
datalist <- vector('list', length(un1))
names(datalist) <-
for(i in un1){
df <- subset(Procs_OP, new_id %in% i))
df2 <- merge(step2, df, by="new_id", all.step2=TRUE)
datalist[[i]] <- df2
}
Upvotes: 1