When merging, the data frame becomes much larger

Question

I have two dataframes that I would like to merge. When I merge them, I get a MemoryError because the resulting dataframe is many times larger than either input. I have looked at the article "pandas merge(how="inner") result is bigger than both dataframes", but I still can't merge them because I run out of memory.

Is there an option to merge only the first matching element and ignore the other duplicates?

For example, my dataframe df_1 is structured like this (see below). Is there an option so that df_2 only writes its values in, and any duplicates are simply ignored?

Dataframe df_1

id_x  B   C  id_y new_column
1     4   9  1    a
2     5   8  2    b
3     6   7  3    c
3     6   7  3    z # should be ignored

df_merged = pd.merge(df_1, df_2,
                     how='inner',
                     left_on=['id_x'], right_on=['id_y'],
                     suffixes=['', '_right'])
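
One possible way to get the "first match only" behaviour is to drop the duplicate keys from df_2 before merging, so each id_x can match at most one row. The following is a minimal sketch, not a definitive solution. The sample frames are made up from the table above (the real data is not shown in the question), and df_2_unique is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the table in the question.
df_1 = pd.DataFrame({"id_x": [1, 2, 3], "B": [4, 5, 6], "C": [9, 8, 7]})
df_2 = pd.DataFrame({"id_y": [1, 2, 3, 3], "new_column": ["a", "b", "c", "z"]})

# Keep only the first row for each key on the right-hand side,
# so the later duplicate (id_y == 3, "z") is ignored.
df_2_unique = df_2.drop_duplicates(subset="id_y", keep="first")

df_merged = pd.merge(
    df_1,
    df_2_unique,
    how="inner",
    left_on="id_x",
    right_on="id_y",
    suffixes=("", "_right"),
    validate="many_to_one",  # raises MergeError if right-hand keys are not unique
)

print(df_merged)
```

If df_1 also contains repeated id_x values, each of those rows still gets its own match, so the result has as many rows as df_1 has matching rows, but it no longer multiplies by the number of duplicates in df_2.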


Answers (1)
