Reputation: 65
i have some data in the following format, at the moment this is in a Pandas Dataframe.
Row Uid Lender
1 1 HSBC
2 1 Lloyds
3 1 Barclays
4 2 Lloyds
5 2 Barclays
6 2 Santander
7 2 RBS
8 2 HSBC
What i require is all of the possible combinations of the Lenders columns for each Uid so the output would be something like this
Row Uid LenderCombo
1 1 Barclays
2 1 Lloyds
3 1 HSBC
4 1 Barclays, HSBC
5 1 Barclays, Lloyds
6 1 HSBC, Lloyds
7 1 Barclays, HSBC, Lloyds
And the same for Uid 2 and so on, apologies if this has been answered before i'm just unsure of how to approach this.
Thanks,
Upvotes: 2
Views: 111
Reputation: 863291
Use GroupBy.apply
with custom function and join tuples by join
:
from itertools import chain, combinations
#https://stackoverflow.com/a/5898031
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(1, len(ss)+1)))
df = (df.groupby('Uid')['Lender']
.apply(lambda x: pd.Series([', '.join(y) for y in all_subsets(x)]))
.reset_index()
.rename(columns={'level_1':'Row'}))
print (df)
Uid Row Lender
0 1 0 HSBC
1 1 1 Lloyds
2 1 2 Barclays
3 1 3 HSBC, Lloyds
4 1 4 HSBC, Barclays
5 1 5 Lloyds, Barclays
6 1 6 HSBC, Lloyds, Barclays
7 2 0 Lloyds
8 2 1 Barclays
9 2 2 Santander
10 2 3 RBS
11 2 4 HSBC
12 2 5 Lloyds, Barclays
13 2 6 Lloyds, Santander
14 2 7 Lloyds, RBS
15 2 8 Lloyds, HSBC
16 2 9 Barclays, Santander
17 2 10 Barclays, RBS
18 2 11 Barclays, HSBC
19 2 12 Santander, RBS
20 2 13 Santander, HSBC
21 2 14 RBS, HSBC
22 2 15 Lloyds, Barclays, Santander
23 2 16 Lloyds, Barclays, RBS
24 2 17 Lloyds, Barclays, HSBC
25 2 18 Lloyds, Santander, RBS
26 2 19 Lloyds, Santander, HSBC
27 2 20 Lloyds, RBS, HSBC
28 2 21 Barclays, Santander, RBS
29 2 22 Barclays, Santander, HSBC
30 2 23 Barclays, RBS, HSBC
31 2 24 Santander, RBS, HSBC
32 2 25 Lloyds, Barclays, Santander, RBS
33 2 26 Lloyds, Barclays, Santander, HSBC
34 2 27 Lloyds, Barclays, RBS, HSBC
35 2 28 Lloyds, Santander, RBS, HSBC
36 2 29 Barclays, Santander, RBS, HSBC
37 2 30 Lloyds, Barclays, Santander, RBS, HSBC
Upvotes: 5