Stack0ver
Stack0ver

Reputation: 57

change dataframe to another structure according id

My initial pandas dataframe looks as follows:

df = pd.DataFrame(np.array([[999,888,1,0], [988,899,2,0], [981,821,3,0],[721,789,4,1],[723,745,5,1],[444,432,6,2],[423,412,7,2],[423,455,8,2],[478,432,9,2]]),
                   columns=['a', 'b', 'c', 'id'])

>>>df
a    b   c id
999  888 1 0
988  899 2 0
981  821 3 0
721  789 4 1
723  745 5 1
444  432 6 2
423  412 7 2
423  455 8 2
478  432 9 2 

Now I want to create a new dataframe with the following structure:

0    1    2    3   4  5
999  988  888  899 1  2
721  723  789  745 4  5
444  423  432  412 6  7

So in fact I just want to get the first two values of each id and each column (a,b,c). Example: first row -> col a: 999, 988 + col b: 888, 899 + col c: 1, 2

Edit: Thanks to Correlian I use the following code:

size = 3


data = df.groupby('id').head(2).melt('id').sort_values(by=['id', 'variable'])['value']
out = pd.DataFrame(data.values.reshape((size, -1)))

Upvotes: 1

Views: 103

Answers (2)

user17626292
user17626292

Reputation:

This could be achived by the following:

pd.DataFrame(df.groupby("id").head(2).set_index('id').values.flatten().reshape((3,4)))

Edit

The new example would be:

pd.DataFrame(df.groupby("id").head(2).set_index('id').values.flatten().reshape((3,6)))[[0,1,3,4,2,5]]

With this method you should reorder the columns to get the output you want.

Upvotes: 1

Corralien
Corralien

Reputation: 120509

Compute value count of id to get the size (number of unique id, here 3) and the smallest count (here 2 for id=1). Group by id and keep the first count row for each group using head.

Then, use melt to flat your dataframe and sort it by id. Finally, reshape your dataframe according the size variable and create a new dataframe:

size, count = df.value_counts('id').agg(['size', 'min'])
data = df.groupby('id').head(count).melt('id').sort_values('id')['value']
out = pd.DataFrame(data.values.reshape((size, -1)))
print(out)

# Output:
     0    1    2    3
0  999  988  888  899
1  721  723  789  745
2  444  423  432  412

Update

I have changed my initial dataframe to 3 columns (a,b,c) see my initial question. How do I have to change your code?

size, count = df.value_counts('id').agg(['size', 'min'])
m = df.groupby('id').head(count).iloc[:, :-1].values
df = pd.DataFrame(np.hstack(m.T.reshape(-1, size, count)))
print(df)

# Output:
     0    1    2    3  4  5
0  999  988  888  899  1  2
1  721  723  789  745  4  5
2  444  423  432  412  6  7

Upvotes: 2

Related Questions