Reputation: 69
I have two dataframes in python (here only the df.head(3)):
df1
Column1 | Column2 | Column3 | Column4 | Column5 |
---|---|---|---|---|
1a | 123 | RJ | 12.1 | test1 |
2a | 432 | MT | 23.2 | test3 |
3a | 234 | DR | 34.3 | test5 |
df2
Column1 | Column3 | Column6 |
---|---|---|
1a | RJ | 695 |
2a | MT | 568 |
3a | DR | 232 |
And I have created a list grouping each dataframe by Column1 having one table(lots of rows) for each value of Column1:
list_df1 =[]
grouped = df1.groupby('Column1')
for name, group in grouped:
if name == 0:
group = group.merge(df1,how='left')
elif name == 1:
group = group.merge(df1,how='left')
list_df1.append(group)
pd.concat(list_df1)
Made the same for df2:
list_df2 =[]
grouped = df2.groupby('Column1')
for name, group in grouped:
if name == 0:
group = group.merge(df2,how='left')
elif name == 1:
group = group.merge(df2,how='left')
list_df2.append(group)
pd.concat(list_df2)
And what I need is to join these two lists by Column1
The result of this join should be something like this:
final_list = ([list_df1.Column1(1a) , list_df2.Column1(1a)] ,
[list_df1.Column1(2a) , list_df2.Column1(2a)] ,
[list_df1.Column1(3a) , list_df2.Column1(3a)] )
The end of this is that for each value of Column1 I'll have one CSV file (separated by pipe "|") that appends the grouped dataframes list_df1 and list_df2.
CSV 1a
1a|123|RJ|12.1|test1
1a|XXX|YY|000|testx
...
1a|RJ|695
1a|XX|000
...
CSV 2a
2a|432|MT|23.2|test3
2a|XXX|YY|000|testx
....
2a|MT|568
2a|XX|000
...
CSV 3a
3a|234|DR|34.3|test5
3a|XXX|YY|000|testx
...
3a|DR|232
3a|XX|000
...
Any Ideas? Thanks in advance.
Upvotes: 0
Views: 658
Reputation: 35626
For DF1, I see you used a variant of this answer, however, you're currently merging all of your groups with the same DataFrame, so you don't need the additional selection structure to conditionally merge.
For DF2, you can just filter your DF2. Merging it with itself is redundant, in this case. So you can just select the values in df2 that you want to get be groups in Column 1 from df1.
You can use comprehension to build your final_lst out of your groupbys of df1. Merge each group with df2, and let the other entry be just the rows where the value in column1 matches the name of the group.
Something like:
import pandas as pd
df1 = pd.DataFrame({'Column1': {0: '1a', 1: '2a', 2: '3a'},
'Column2': {0: 123, 1: 432, 2: 234},
'Column3': {0: 'RJ', 1: 'MT', 2: 'DR'},
'Column4': {0: 12.1, 1: 23.2, 2: 34.3},
'Column5': {0: 'test1', 1: 'test3', 2: 'test5'}})
df2 = pd.DataFrame({'Column1': {0: '1a', 1: '2a', 2: '3a'},
'Column3': {0: 'RJ', 1: 'MT', 2: 'DR'},
'Column6': {0: 695, 1: 568, 2: 232}})
final_list = ([group.merge(df2, how='left'),
df2[df2['Column1'] == name].reset_index(drop=True)]
for name, group in df1.groupby('Column1'))
# For Display
for merged, filtered_df2 in final_list:
print(f"csv {merged['Column1'].iloc[0]}")
print(merged.to_string(index=False, header=False))
print('-----')
print(filtered_df2.to_string(index=False, header=False))
print()
Output:
csv 1a
1a 123 RJ 12.1 test1 695
-----
1a RJ 695
csv 2a
2a 432 MT 23.2 test3 568
-----
2a MT 568
csv 3a
3a 234 DR 34.3 test5 232
-----
3a DR 232
See:
Upvotes: 1