Reputation: 57
My initial pandas dataframe looks as follows:
df = pd.DataFrame(np.array([[999,888,1,0], [988,899,2,0], [981,821,3,0],[721,789,4,1],[723,745,5,1],[444,432,6,2],[423,412,7,2],[423,455,8,2],[478,432,9,2]]),
columns=['a', 'b', 'c', 'id'])
>>>df
a b c id
999 888 1 0
988 899 2 0
981 821 3 0
721 789 4 1
723 745 5 1
444 432 6 2
423 412 7 2
423 455 8 2
478 432 9 2
Now I want to create a new dataframe with the following structure:
0 1 2 3 4 5
999 988 888 899 1 2
721 723 789 745 4 5
444 423 432 412 6 7
So in fact I just want to get the first two values of each id and each column (a,b,c). Example: first row -> col a: 999, 988 + col b: 888, 899 + col c: 1, 2
Edit: Thanks to Correlian I use the following code:
size = 3
data = df.groupby('id').head(2).melt('id').sort_values(by=['id', 'variable'])['value']
out = pd.DataFrame(data.values.reshape((size, -1)))
Upvotes: 1
Views: 103
Reputation:
This could be achived by the following:
pd.DataFrame(df.groupby("id").head(2).set_index('id').values.flatten().reshape((3,4)))
Edit
The new example would be:
pd.DataFrame(df.groupby("id").head(2).set_index('id').values.flatten().reshape((3,6)))[[0,1,3,4,2,5]]
With this method you should reorder the columns to get the output you want.
Upvotes: 1
Reputation: 120509
Compute value count of id
to get the size (number of unique id, here 3) and the smallest count (here 2 for id=1). Group by id
and keep the first count
row for each group using head
.
Then, use melt
to flat your dataframe and sort it by id
. Finally, reshape your dataframe according the size
variable and create a new dataframe:
size, count = df.value_counts('id').agg(['size', 'min'])
data = df.groupby('id').head(count).melt('id').sort_values('id')['value']
out = pd.DataFrame(data.values.reshape((size, -1)))
print(out)
# Output:
0 1 2 3
0 999 988 888 899
1 721 723 789 745
2 444 423 432 412
Update
I have changed my initial dataframe to 3 columns (a,b,c) see my initial question. How do I have to change your code?
size, count = df.value_counts('id').agg(['size', 'min'])
m = df.groupby('id').head(count).iloc[:, :-1].values
df = pd.DataFrame(np.hstack(m.T.reshape(-1, size, count)))
print(df)
# Output:
0 1 2 3 4 5
0 999 988 888 899 1 2
1 721 723 789 745 4 5
2 444 423 432 412 6 7
Upvotes: 2