imxitiz

Reputation: 3987

How to create a dataframe on the basis of duplicate values

I have a DataFrame, something like this:

import pandas as pd

df = pd.DataFrame({"a":[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],"b":[5,4,8,9,2,3,4,1,9,5,6,7,8,6,1,8]})
print(df)
'''
    a  b
0   1  5
1   1  4
2   1  8
3   1  9
4   2  2
5   2  3
6   2  4
7   2  1
8   3  9
9   3  5
10  3  6
11  3  7
12  4  8
13  4  6
14  4  1
15  4  8
'''

Now I want to create a new DataFrame in which each distinct value of "a" becomes a column holding the corresponding values of "b", like this:

   1  2  3  4
0  5  2  9  8
1  4  3  5  6
2  8  4  6  1
3  9  1  7  8 

I tried a direct approach:

pd.DataFrame(df.groupby(["a"]))

But it gives me only two columns, the group key and the whole sub-DataFrame for that key:

    0   1
0   1   a b 0 1 5 1 1 4 2 1 8 3 1 9
1   2   a b 4 2 2 5 2 3 6 2 4 7 2 1
2   3   a b 8 3 9 9 3 5 10 3 6 11 3 7
3   4   a b 12 4 8 13 4 6 14 4 1 15 4 8

Upvotes: 1

Views: 168

Answers (1)

Dejene T.

Reputation: 989

Number the rows within each group of a using cumcount(), then reshape to wide form using pivot():

(df.assign(idx=df.groupby('a').cumcount())
   .pivot(index='idx', columns='a', values='b')
).reset_index(drop=True)

The output looks like this:

a   1   2   3   4
0   5   2   9   8
1   4   3   5   6
2   8   4   6   1
3   9   1   7   8

If it is unclear how this works, you can split it into separate steps:

df1 = df.assign(idx=df.groupby("a").cumcount())
df2 = df1.pivot(index='idx', columns='a', values='b').reset_index(drop=True)

This returns the same result.
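
To see why the reshape lines up, it may help to look at the intermediate position column on its own. The following is a minimal sketch using the sample data from the question; the variable names `idx` and `wide` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    "b": [5, 4, 8, 9, 2, 3, 4, 1, 9, 5, 6, 7, 8, 6, 1, 8],
})

# Step 1: number the rows inside each group of "a".
# Every group restarts at 0, so each group gets positions 0, 1, 2, 3.
idx = df.groupby("a").cumcount()

# Step 2: use that position as the row label and "a" as the column label,
# then drop the helper index so the rows are plainly numbered.
wide = (
    df.assign(idx=idx)
      .pivot(index="idx", columns="a", values="b")
      .reset_index(drop=True)
)

print(idx.tolist())
print(wide)
```

The helper column is needed because pivot() requires each (index, columns) pair to be unique. Without it there is no row label that tells pandas where the first, second, third and fourth value of each group should go.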

Upvotes: 1
