python pandas Transfer the format of the dataframe

Question

I have a dataframe named df like this: (there's no duplicate rows of df)

a_id           b_id

111111         18
111111         17
222222         18
333333         14
444444         13 
555555         18
555555         24
222222         13
222222         17
333333         17

And I want to invert it to a dataframe df_2 like this:

a_one     a_two      b_list   number_of_b  
222222    444444     13       1
111111    222222     17,18    2
111111    333333     17       1
111111    222222     17       1
222222    333333     17       1
111111    555555     18       1
222222    555555     18       1

If a_id share the same b_id, they become a pair on df_2;

the b_list of df_2 is the correspondingly b_id;

the number_of_b is the length of b_list

Allen Qin · Accepted Answer

Use a chained operation with a series of groupby and transformations:

from itertools import combinations
df2 = (
       df.groupby('b_id')['a_id']
       .apply(lambda x: list(combinations(x.values,2)))
       .apply(pd.Series).stack()
       .reset_index(0)
       .groupby(0).apply(lambda x: [len(x), x.b_id.astype(str).tolist()])
       .apply(pd.Series)
       .rename(columns={0:'b_list', 1:'number_of_b'})
       .pipe(lambda x: x.reset_index(drop=True)
             .join(x.reset_index()[0].apply(pd.Series)))
       .rename(columns={0:'a_one', 1:'a_two'})
       .assign(number_of_b=lambda x: x.number_of_b.str.join(','))
    )[['a_one','a_two','b_list','number_of_b']]



df2
Out[123]: 
    a_one   a_two  b_list number_of_b
0  111111  222222       2       17,18
1  111111  333333       1          17
2  111111  555555       1          18
3  222222  333333       1          17
4  222222  555555       1          18
5  444444  222222       1          13

python pandas Transfer the format of the dataframe

Answers (2)

Related Questions