Serg

Reputation: 128

Pandas groupby issue after melt: bug?

Python version 3.8.12
Pandas 1.4.1

Given the following dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': [1000] * 4,
    'date': ['2022-01-01'] * 4,
    'ts': pd.date_range('2022-01-01', freq='5min', periods=4),
    'A': np.random.randint(1, 6, size=4),
    'B': np.random.rand(4)
})

That looks like this:

     id        date                  ts  A          B
0  1000  2022-01-01 2022-01-01 00:00:00  4  0.98019
1  1000  2022-01-01 2022-01-01 00:05:00  3  0.82021
2  1000  2022-01-01 2022-01-01 00:10:00  4  0.549684
3  1000  2022-01-01 2022-01-01 00:15:00  5  0.0818311

I unpivoted columns A and B into long format with pandas melt:

melted = df.melt(
    id_vars=['id', 'date', 'ts'],
    value_vars=['A', 'B'],
    var_name='label',
    value_name='value',
    ignore_index=True
)

That looks like this:

     id        date                  ts label  value
0  1000  2022-01-01 2022-01-01 00:00:00     A  4
1  1000  2022-01-01 2022-01-01 00:05:00     A  3
2  1000  2022-01-01 2022-01-01 00:10:00     A  4
3  1000  2022-01-01 2022-01-01 00:15:00     A  5
4  1000  2022-01-01 2022-01-01 00:00:00     B  0.98019
5  1000  2022-01-01 2022-01-01 00:05:00     B  0.82021
6  1000  2022-01-01 2022-01-01 00:10:00     B  0.549684
7  1000  2022-01-01 2022-01-01 00:15:00     B  0.0818311

Then I group by id and date and try to select the first group:

melted.groupby(['id', 'date']).first()

That gives me this:

                        ts label  value
id   date                              
1000 2022-01-01 2022-01-01     A    4.0

But I would expect this output instead:

                                 ts  A         B
id   date                                       
1000 2022-01-01 2022-01-01 00:00:00  4  0.980190
     2022-01-01 2022-01-01 00:05:00  3  0.820210
     2022-01-01 2022-01-01 00:10:00  4  0.549684
     2022-01-01 2022-01-01 00:15:00  5  0.081831

What am I missing? Or is this a bug? Also, why is the ts column converted to a date?

Upvotes: 1

Views: 104

Answers (1)

Serg

Reputation: 128

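I thought first() would return the first group, but it returns the first element of each group instead, as stated in the pandas documentation for the groupby aggregation functions. Since every row here shares the same id and date, there is only one group, so the result is a single row.

To select the first group, I needed to use the get_group method.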
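
For anyone hitting the same confusion, here is a minimal sketch of the difference, reusing the data from the question. The values of A and B are hard-coded from the printed frame rather than random, the '5min' frequency is assumed from the 5-minute steps in that output, and the final pivot step is my own guess at how to get back to the shape the question expected, not something the docs prescribe.

```python
import pandas as pd

# Same frame as in the question, with fixed values so the result is reproducible.
df = pd.DataFrame({
    'id': [1000] * 4,
    'date': ['2022-01-01'] * 4,
    'ts': pd.date_range('2022-01-01', freq='5min', periods=4),
    'A': [4, 3, 4, 5],
    'B': [0.98019, 0.82021, 0.549684, 0.0818311],
})

melted = df.melt(
    id_vars=['id', 'date', 'ts'],
    value_vars=['A', 'B'],
    var_name='label',
    value_name='value',
)

grouped = melted.groupby(['id', 'date'])

# first() aggregates: one row per group, holding the first value of each column.
first_rows = grouped.first()

# get_group() selects: all rows that belong to the group with the given key.
first_group = grouped.get_group((1000, '2022-01-01'))

# Assumption: to go back to one column per label, pivot the selected rows.
wide = first_group.pivot(index=['id', 'date', 'ts'], columns='label', values='value')

print(first_rows)
print(first_group)
print(wide)
```

As for the ts column: it is not really converted. After first() only one timestamp is left and it falls exactly on midnight, and pandas leaves out the time part when printing datetimes whose time component is all zeros.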

Upvotes: 1
