Two identical code samples give different results

Question

This is a sample data frame df:

TIME  VP_1  VP_2  VP_3   EVAL
20    3242  3244  3245   0
24    3244  3244  3242   0
30    3456  3244  3456   1
33    3456  3245  3242   0
45    3242  3456  3245   1

I am calculating the average TIME for each value in the VP_* columns, separately for EVAL equal to 0 and 1.

This is a sample output for VP equal to 3242.

VP     EVAL   AVG_TIME
3242   0      25.67
3242   1      45

The problem is that I get different results when I apply the following two code samples, which look equivalent to me, to my real dataset. I cannot understand why this happens or which of the two approaches is correct.

Code #1

grouped = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
 .drop('variable', axis=1).drop_duplicates()
 .groupby(['EVAL', 'VP']).agg({'TIME' : 'mean'})
 .reset_index())

Code #2

cols = ['VP', 'TIME', 'EVAL']
grouped = pd.melt(
    df, ['TIME', 'EVAL'],
    ['VP_1', 'VP_2', 'VP_3'],
    value_name='VP')[cols]
ab = grouped.groupby(['EVAL','VP']).agg({'TIME' : 'mean'}).reset_index()

jezrael · Accepted Answer

The difference comes from drop_duplicates:

drop('variable', axis=1) is the same as [cols]: both remove the column variable.

.drop_duplicates() is called only in Code #1.

So rows 6 and 12 are removed there because they duplicate rows 1 and 2:

grouped = (pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP')
             .drop('variable', axis=1).drop_duplicates())
print (grouped)
    EVAL  TIME    VP
0      0    20  3242
1      0    24  3244
2      1    30  3456
3      0    33  3456
4      1    45  3242
5      0    20  3244
7      1    30  3244
8      0    33  3245
9      1    45  3456
10     0    20  3245
11     0    24  3242
13     0    33  3242
14     1    45  3245

cols = ['VP', 'TIME', 'EVAL']
grouped = pd.melt(df, ['TIME', 'EVAL'], ['VP_1', 'VP_2', 'VP_3'], value_name='VP')[cols]
print (grouped)
      VP  TIME  EVAL
0   3242    20     0
1   3244    24     0
2   3456    30     1
3   3456    33     0
4   3242    45     1
5   3244    20     0
6   3244    24     0
7   3244    30     1
8   3245    33     0
9   3456    45     1
10  3245    20     0
11  3242    24     0
12  3456    30     1
13  3242    33     0
14  3245    45     1
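
The effect on the averages can be checked directly. Below is a minimal sketch on the sample frame, assuming the frame is rebuilt by hand from the table in the question; the names long_df, dups, with_dups and no_dups are illustrative and not part of the original code.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    'TIME': [20, 24, 30, 33, 45],
    'VP_1': [3242, 3244, 3456, 3456, 3242],
    'VP_2': [3244, 3244, 3244, 3245, 3456],
    'VP_3': [3245, 3242, 3456, 3242, 3245],
    'EVAL': [0, 0, 1, 0, 1],
})

# Long format shared by both code samples (column 'variable' dropped).
long_df = pd.melt(df, id_vars=['EVAL', 'TIME'], value_name='VP').drop('variable', axis=1)

# Rows that drop_duplicates() would remove: repeats of an earlier (EVAL, TIME, VP) row.
dups = long_df[long_df.duplicated()]
print(dups.index.tolist())  # [6, 12]

# Mean TIME per (EVAL, VP) with and without the duplicated rows.
with_dups = long_df.groupby(['EVAL', 'VP'])['TIME'].mean()
no_dups = long_df.drop_duplicates().groupby(['EVAL', 'VP'])['TIME'].mean()

print(with_dups.loc[(1, 3456)])  # 35.0  (mean of 30, 45, 30)
print(no_dups.loc[(1, 3456)])    # 37.5  (mean of 30, 45)
```

Which result is correct depends on whether a VP that appears in several VP_* columns of the same row should count once or several times toward the mean. That is a decision about the data rather than about pandas. Groups without repeated rows, such as VP 3242, give the same mean either way.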
