Reputation: 33
I am trying to merge two weelly DateFrames, which are made-up of one column each, but with different lengths.
Could I please know how to merge them, maintaining the 'Week' indexing?
[df1]
Week Coeff1
1 -0.456662
1 -0.533774
1 -0.432871
1 -0.144993
1 -0.553376
... ...
53 -0.501221
53 -0.025225
53 1.529864
53 0.044380
53 -0.501221
[16713 rows x 1 columns]
[df2]
Week Coeff
1 0.571707
1 0.086152
1 0.824832
1 -0.037042
1 1.167451
... ...
53 -0.379374
53 1.076622
53 -0.547435
53 -0.638206
53 0.067848
[63265 rows x 1 columns]
I've tried this code:
df3 = pd.merge(df1, df2, how='inner', on='Week')
df3 = df3.drop_duplicates()
df3
But it gave me a new df (df3) with 13386431 rows × 2 columns
Desired outcome: A new df which has 3 columns (week, coeff1, coeff2), as df2 is longer, I expect to have some NaNs in coeff1 to fill the gaps.
Upvotes: 2
Views: 3786
Reputation: 458
I assume your output should look somewhat like this:
Week | Coeff1 | Coeff2 |
---|---|---|
1 | -0.456662 | 0.571707 |
1 | -0.533774 | 0.086152 |
1 | -0.432871 | 0.824832 |
2 | 3 | 3 |
2 | NaN | 3 |
Don't mind the actual numbers though. The problem is you won't achieve that with a join on Week, neither left nor inner and that is due to the fact that the Week-Index is not unique. So, on a left join, pandas is going to join all the Coeff2-Values where df2.Week == 1 on every single row in df1 where df1.Week == 1. And that is why you get these millions of rows.
I will try and give you a workaround later, but maybe this helps you to think about this problem from another perspective!
Now is later:
What you actually want to do is to concatenate the Dataframes "per week". You achieve that by iterating over every week, creating a df_subset[week] concatenating df1[week] and df2[week] by axis=1 and then concatenating all these subsets on axis=0 afterwards:
weekly_dfs=[]
for week in df1.Week.unique():
sub_df1 = df1.loc[df1.Week == week, "Coeff1"].reset_index(drop=True)
sub_df2 = df2.loc[df2.Week == week, "Coeff2"].reset_index(drop=True)
concat_df = pd.concat([sub_df1, sub_df2], axis=1)
concat_df["Week"] = week
weekly_dfs.append(concat_df)
df3 = pd.concat(weekly_dfs).reset_index(drop=True)
The last reset of the index is optional but I recommend it anyways!
Upvotes: 1
Reputation: 1190
According to pandas' merge documentation, you can use merge in a way like that:
What you are looking for is a left join. However, the default option is an inner join. You can change this by passing a different how argument:
df2.merge(df1,how='left', left_on='Week', right_on='Week')
note that this would keep these rows in the bigger df and assign NaN to them when merging with the shorter df.
Upvotes: 0
Reputation: 2642
Based on your last comment on the question, you may want to concatenate instead of merging the two data frames:
df3 = pd.concat([df1,df2], ignore_index=True, axis=1)
The resulting DataFrame
should have 63265
rows and will need some work to get it to the required format (remove the added index columns, rename the remaining columns, etc.), but pd.concat
should be a good start.
Upvotes: 0