kkjoe
kkjoe

Reputation: 795

python pandas Transfer the format of the dataframe

I have a dataframe named df like this: (there's no duplicate rows of df)

a_id           b_id

111111         18
111111         17
222222         18
333333         14
444444         13 
555555         18
555555         24
222222         13
222222         17
333333         17

And I want to invert it to a dataframe df_2 like this:

a_one     a_two      b_list   number_of_b  
222222    444444     13       1
111111    222222     17,18    2
111111    333333     17       1
111111    222222     17       1
222222    333333     17       1
111111    555555     18       1
222222    555555     18       1   

If a_id share the same b_id, they become a pair on df_2;

the b_list of df_2 is the correspondingly b_id;

the number_of_b is the length of b_list

Upvotes: 1

Views: 102

Answers (2)

Allen Qin
Allen Qin

Reputation: 19947

Use a chained operation with a series of groupby and transformations:

from itertools import combinations
df2 = (
       df.groupby('b_id')['a_id']
       .apply(lambda x: list(combinations(x.values,2)))
       .apply(pd.Series).stack()
       .reset_index(0)
       .groupby(0).apply(lambda x: [len(x), x.b_id.astype(str).tolist()])
       .apply(pd.Series)
       .rename(columns={0:'b_list', 1:'number_of_b'})
       .pipe(lambda x: x.reset_index(drop=True)
             .join(x.reset_index()[0].apply(pd.Series)))
       .rename(columns={0:'a_one', 1:'a_two'})
       .assign(number_of_b=lambda x: x.number_of_b.str.join(','))
    )[['a_one','a_two','b_list','number_of_b']]



df2
Out[123]: 
    a_one   a_two  b_list number_of_b
0  111111  222222       2       17,18
1  111111  333333       1          17
2  111111  555555       1          18
3  222222  333333       1          17
4  222222  555555       1          18
5  444444  222222       1          13

Upvotes: 1

Huang
Huang

Reputation: 609

I have a solution: First, make the combinations of a_id that have the same b_id:

from itertools import combinations
df = df.groupby("b_id").apply(lambda x: list(combinations(x["a_id"], 2))).apply(pd.Series).stack()

df now is:

 b_id
13    0    (444444, 222222)
17    0    (111111, 222222)
      1    (111111, 333333)
      2    (222222, 333333)
18    0    (111111, 222222)
      1    (111111, 555555)
      2    (222222, 555555)

Then split the Series, reset the index and concat the appearance of b_id:

df = df.apply(pd.Series).reset_index().groupby([0,1])["b_id"].apply(lambda x:x.values).reset_index()

Now we get:

        0       1      b_id
0  111111  222222  [17, 18]
1  111111  333333      [17]
2  111111  555555      [18]
3  222222  333333      [17]
4  222222  555555      [18]
5  444444  222222      [13]

This is almost what you need. And for the exact results:

df.columns = ["a_one", "a_two", "b_list"]
df["number_of_b"] = df.b_list.apply(len)

The final results:

    a_one   a_two    b_list  number_of_b
0  111111  222222  [17, 18]            2
1  111111  333333      [17]            1
2  111111  555555      [18]            1
3  222222  333333      [17]            1
4  222222  555555      [18]            1
5  444444  222222      [13]            1

The whole code for clarity:

from itertools import combinations
df = df.groupby("b_id").apply(lambda x: list(combinations(x["a_id"], 2))).apply(pd.Series).stack()
df = df.apply(pd.Series).reset_index().groupby([0,1])["b_id"].apply(lambda x:x.values).reset_index()
df.columns = ["a_one", "a_two", "b_list"]
df["number_of_b"] = df.b_list.apply(len)

This is not that fancy. Look forward to better solutions!

Upvotes: 1

Related Questions