Reputation: 113
My dataframe contains three array columns having same length
| col_1 | col_2 | col_3 |
|-------------------------|-------------------------|------------------------------|
|['a', 'b', 'c', 'd', 'e']|['n', 'n', 'n', 'd', 'd']|['cl', 'cl', 'cl', 'en', 'en']|
|['d', 'e', 'f', 'g', 'h']|['d', 'd', 'd', 'n', 'n']|['de', 'de', 'de', 'me', 'me']|
|['a', 'd', 'e', 'b', 'c']|['p', 'p', 'c', 'c', 'c']|['it', 'it', 'cs', 'cs', 'cs']|
I want to join the strings in col_1 using group by in col_3. The corresponding string should be in col_2. My output dataframe should like this.
| col_1 | col_2 | col_3 |
|-------------|----------|------------|
|['abc', 'de']|['n', 'd']|['cl', 'en']|
|['def', 'gh']|['d', 'n']|['de', 'me']|
|['ad', 'ebc']|['p', 'c']|['it', 'cs']|
Final dataframe columns should have same length
Upvotes: 1
Views: 80
Reputation: 261860
You can explode
and use a double groupby.agg
:
out = (df
.explode(list(df)) # explode all columns, use a specific list if needed
.reset_index()
.groupby(['index', 'col_3'])
.agg(''.join)
.reset_index('col_3')
.groupby(level=0).agg(list)
[df.columns]
)
output:
col_1 col_2 col_3
index
0 [abc, de] [nnn, dd] [cl, en]
1 [def, gh] [ddd, nn] [de, me]
2 [ebc, ad] [ccc, pp] [cs, it]
Or maybe you want to group by both col_2/col_3:
out = (df
.explode(list(df))
.reset_index()
.groupby(['index', 'col_2', 'col_3'])
.agg(''.join)
.reset_index(['col_2', 'col_3'])
.groupby(level=0).agg(list)
[df.columns]
)
output:
col_1 col_2 col_3
index
0 [de, abc] [d, n] [en, cl]
1 [def, gh] [d, n] [de, me]
2 [ebc, ad] [c, p] [cs, it]
Used input:
df = pd.DataFrame({'col_1': [['a', 'b', 'c', 'd', 'e'],
['d', 'e', 'f', 'g', 'h'],
['a', 'd', 'e', 'b', 'c']],
'col_2': [['n', 'n', 'n', 'd', 'd'],
['d', 'd', 'd', 'n', 'n'],
['p', 'p', 'c', 'c', 'c']],
'col_3': [['cl', 'cl', 'cl', 'en', 'en'],
['de', 'de', 'de', 'me', 'me'],
['it', 'it', 'cs', 'cs', 'cs']]})
Upvotes: 1