Tanai Goncalves
Tanai Goncalves

Reputation: 69

Join Lists of Dataframe by key column

I have two dataframes in python (here only the df.head(3)):

df1

Column1 Column2 Column3 Column4 Column5
1a 123 RJ 12.1 test1
2a 432 MT 23.2 test3
3a 234 DR 34.3 test5

df2

Column1 Column3 Column6
1a RJ 695
2a MT 568
3a DR 232

And I have created a list grouping each dataframe by Column1 having one table(lots of rows) for each value of Column1:

list_df1 =[]
grouped = df1.groupby('Column1')

for name, group in grouped:
    if name == 0:
        group = group.merge(df1,how='left')
    elif name == 1:
        group = group.merge(df1,how='left')
    list_df1.append(group)
pd.concat(list_df1)

Made the same for df2:

list_df2 =[]
grouped = df2.groupby('Column1')

for name, group in grouped:
    if name == 0:
        group = group.merge(df2,how='left')
    elif name == 1:
        group = group.merge(df2,how='left')
    list_df2.append(group)
pd.concat(list_df2)

And what I need is to join these two lists by Column1

The result of this join should be something like this:

          final_list = ([list_df1.Column1(1a) , list_df2.Column1(1a)] ,

                       [list_df1.Column1(2a) , list_df2.Column1(2a)] , 

                       [list_df1.Column1(3a) , list_df2.Column1(3a)] )

The end of this is that for each value of Column1 I'll have one CSV file (separated by pipe "|") that appends the grouped dataframes list_df1 and list_df2.

CSV 1a

 1a|123|RJ|12.1|test1

 1a|XXX|YY|000|testx

...

 1a|RJ|695

 1a|XX|000

...

CSV 2a

 2a|432|MT|23.2|test3

 2a|XXX|YY|000|testx

....

 2a|MT|568

 2a|XX|000

...

CSV 3a

 3a|234|DR|34.3|test5

 3a|XXX|YY|000|testx

...

 3a|DR|232

 3a|XX|000

...

Any Ideas? Thanks in advance.

Upvotes: 0

Views: 658

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35626

For DF1, I see you used a variant of this answer, however, you're currently merging all of your groups with the same DataFrame, so you don't need the additional selection structure to conditionally merge.

For DF2, you can just filter your DF2. Merging it with itself is redundant, in this case. So you can just select the values in df2 that you want to get be groups in Column 1 from df1.

You can use comprehension to build your final_lst out of your groupbys of df1. Merge each group with df2, and let the other entry be just the rows where the value in column1 matches the name of the group.

Something like:

import pandas as pd

df1 = pd.DataFrame({'Column1': {0: '1a', 1: '2a', 2: '3a'},
                    'Column2': {0: 123, 1: 432, 2: 234},
                    'Column3': {0: 'RJ', 1: 'MT', 2: 'DR'},
                    'Column4': {0: 12.1, 1: 23.2, 2: 34.3},
                    'Column5': {0: 'test1', 1: 'test3', 2: 'test5'}})
df2 = pd.DataFrame({'Column1': {0: '1a', 1: '2a', 2: '3a'},
                    'Column3': {0: 'RJ', 1: 'MT', 2: 'DR'},
                    'Column6': {0: 695, 1: 568, 2: 232}})

final_list = ([group.merge(df2, how='left'),
               df2[df2['Column1'] == name].reset_index(drop=True)]
              for name, group in df1.groupby('Column1'))

# For Display
for merged, filtered_df2 in final_list:
    print(f"csv {merged['Column1'].iloc[0]}")
    print(merged.to_string(index=False, header=False))
    print('-----')
    print(filtered_df2.to_string(index=False, header=False))
    print()

Output:

csv 1a
1a 123 RJ 12.1 test1 695
-----
1a RJ 695

csv 2a
2a 432 MT 23.2 test3 568
-----
2a MT 568

csv 3a
3a 234 DR 34.3 test5 232
-----
3a DR 232

See:

  1. Convert a pandas groupby object to list of dataframes
  2. How to select rows from a DataFrame based on column values

Upvotes: 1

Related Questions