Reputation: 452
I have a list of 4 dataframes each containing only 1 column ('CustomerID'). I would like to merge (inner join) them within a loop.
This is what I've tried so far:
for i in all_df:
    merged = all_df[0].merge(all_df[1], on='CustomerID')
    del all_df[0]
What I'm trying to do here is merge the first dataframe (index 0) with the second (index 1), then delete the first dataframe so that the dataframe at index 1 becomes the dataframe at index 0, and then iterate.
I know this doesn't work, because from the second iteration onwards I should be merging the new variable "merged" with the dataframe at index 1.
The 4 dataframes are snapshots of a client database at different times (March 2019, April 2019, May 2019, etc.). The point is to analyse the client lifetime (how long did they stay a client? after how many days did they leave? etc.)
Could you please help me with that?
Upvotes: 2
Views: 473
Reputation: 25239
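If you want to merge multiple dataframes, you can use functools.reduce as follows:
from functools import reduce
import pandas as pd
# Merge the dataframes pairwise, left to right; pd.merge defaults to an inner join
df_merge = reduce(lambda df_x, df_y: pd.merge(df_x, df_y, on='CustomerID'), all_df)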
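To try the reduce approach on something concrete, here is a minimal sketch. The four small dataframes and their CustomerID values are made up for illustration; only the 'CustomerID' column name and the all_df list come from the question.

```python
from functools import reduce

import pandas as pd

# Hypothetical monthly snapshots (values invented for this example)
all_df = [
    pd.DataFrame({"CustomerID": [1, 2, 3, 4]}),
    pd.DataFrame({"CustomerID": [2, 3, 4, 5]}),
    pd.DataFrame({"CustomerID": [3, 4, 5, 6]}),
    pd.DataFrame({"CustomerID": [4, 3, 7]}),
]

# Fold the list into one dataframe: each step inner-joins the running
# result with the next snapshot on 'CustomerID'
df_merge = reduce(
    lambda left, right: pd.merge(left, right, on="CustomerID"),
    all_df,
)

# Customers present in every snapshot
common_ids = sorted(df_merge["CustomerID"].tolist())
print(common_ids)
```

Since the join is an inner join, only customers that appear in every snapshot survive, so the result says who stayed for the whole period but not when the others left.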
Upvotes: 2
Reputation: 6485
Following your approach, this should accomplish what you are trying to do:
# Initialize the final dataframe with the first one in the list
result_df = all_df[0]
# Cycle over the list, from the second dataframe onwards
for df in all_df[1:]:
    result_df = result_df.merge(df, on='CustomerID')
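The same loop can be wrapped in a small helper so that it is reusable and does not modify the original list. This is only a sketch: the merge_all name, the empty-list check, and the sample data are assumptions added for illustration, not part of the question.

```python
import pandas as pd


def merge_all(frames, key="CustomerID"):
    """Inner-join every dataframe in `frames` on `key`, left to right."""
    if not frames:
        raise ValueError("frames must contain at least one dataframe")
    result = frames[0]
    for frame in frames[1:]:
        # Each pass keeps only the keys present in both sides
        result = result.merge(frame, on=key, how="inner")
    return result


# Hypothetical snapshots (values invented for this example)
snapshots = [
    pd.DataFrame({"CustomerID": [10, 11, 12]}),
    pd.DataFrame({"CustomerID": [11, 12, 13]}),
    pd.DataFrame({"CustomerID": [12, 11, 14]}),
]

stayed = merge_all(snapshots)
stayed_ids = sorted(stayed["CustomerID"].tolist())
print(stayed_ids)
```

Unlike the version in the question, nothing is deleted from the list during iteration, so the snapshots remain available afterwards for further analysis.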
Upvotes: 0