Reputation: 561
So, basically I have two dataframes one is df1
and df2
so each size is [9000000 rows X 11 columns] and [2700000 rows X 18 columns].
Following df1
looks like this,
| Index | Frame | Chunk_Ids |...|P_filter|
|:------|:-----:|:---------:|:-:|:------:|
| 1 | 1 | 20001 |...| 0.142|
| 2 | 1 | 20001 |...| 0.146|
| 3 | 1 | 20001 |...| 0.147|
| . | . | . |...| .|
| 900000| 260000 | 200020 |0.245| 0.142|
Following `df2` looks like this,
| Index | Frame | Chunk_Ids |...| P_filter|
|:------|:------:|:---------:|:-:|:-------:|
| 3 | 1 | 19999 |...| 0.142|
| 6 | 1 | 20001 |...| 0.146|
| 7 | 1 | 20001 |...| 0.147|
| . | . | . |...| .|
| 270000| 260000 | 200019 |...| 0.142|
I have two questions, how to merge two dataframes with below condition,
I tried and burnt my head out from last two days with many methods but I can't satisify both conditions. Any leads will be appreciated.
Upvotes: 0
Views: 849
Reputation: 571
I didn't test because you did not provide usable examples - but something like this should work
df_merged = df1.merge(df2, on='Chunk_Ids', suffixes=['_df1', '_df2']
df_merged['index_missing'] = df_merged[['Index_df1', 'Index_df2']].apply(lambda x: sum(pd.isnull(x))) > 0
df2_columns = [str(x) + '_df2' for x in df2.columns.tolist()]
df_merged.loc[df_merged.index_missing, df2_columns] = -1
I'm not entirely sure what you meant by not missing any data but you might prefer to use outer join .merge(how='outer')
. By default inner join is performed, meaning rows without pair will be lost.
Upvotes: 2