Jack Aidley

Reputation: 20107

Why am I getting an empty row in my dataframe after using pandas apply?

I'm fairly new to Python and Pandas and trying to figure out how to do a simple split-apply-combine. The problem is that I get a blank row at the top of every dataframe returned by Pandas' apply function, and I'm not sure why. Can anyone explain?

The following is a minimal example that demonstrates the problem, not my actual code:

import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

# Summarise one column of a group: number of values and their mean
def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

sorbet_grouped = sorbet.groupby('flavour')
sorbet_vals = sorbet_grouped.apply(calc_vals, target='niceosity')

If I then do print(sorbet_vals), I get this output:

         mean  total
flavour                 <--- Why are there spaces here?
lemon     7.5      2
orange    4.5      2

[2 rows x 2 columns]

Compare this with print(sorbet):

  flavour  niceosity     <--- Note how column names line up
0  orange          4
1  orange          5
2   lemon          7
3   lemon          8

[4 rows x 2 columns]

What is causing this discrepancy and how can I fix it?

Upvotes: 5

Views: 3752

Answers (1)

unutbu

Reputation: 879501

The groupby/apply operation returns a new DataFrame with a named index. The name is the name of the column by which the original DataFrame was grouped.
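One way to confirm this is to inspect the index of the result directly. The following is a minimal sketch that reuses the question's example data; the variable name index_name is only illustrative, and some recent pandas releases may print a deprecation warning about apply and grouping columns.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

sorbet_vals = sorbet.groupby('flavour').apply(calc_vals, target='niceosity')

# The grouping column's name is carried over as the name of the index
index_name = sorbet_vals.index.name
print(index_name)

# The group keys themselves are the index labels
print(list(sorbet_vals.index))
```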

The name shows up above the index. If you reset it to None, then that row disappears:

In [155]: sorbet_vals.index.name = None

In [156]: sorbet_vals
Out[156]: 
        mean  total
lemon    7.5      2
orange   4.5      2

[2 rows x 2 columns]

Note that the name is useful -- I don't really recommend removing it. The name allows you to refer to that index by name rather than merely by number.
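As a rough illustration of why the name is handy, a named index can be used in DataFrame.query just like a column. This sketch builds the result frame by hand so that it stands alone; lemon_row is a hypothetical variable name.

```python
import pandas as pd

# Hand-built stand-in for the groupby/apply result, with a named index
sorbet_vals = pd.DataFrame(
    {'mean': [7.5, 4.5], 'total': [2, 2]},
    index=pd.Index(['lemon', 'orange'], name='flavour'))

# Because the index is named 'flavour', query can filter on it by name
lemon_row = sorbet_vals.query("flavour == 'lemon'")
print(lemon_row)
```

Without the name, the same selection would have to go through sorbet_vals.index or .loc instead.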


If you wish the index to be a column, use reset_index:

In [209]: sorbet_vals.reset_index(inplace=True); sorbet_vals
Out[209]: 
  flavour  mean  total
0   lemon   7.5      2
1  orange   4.5      2

[2 rows x 3 columns]
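If you would rather not modify sorbet_vals in place, reset_index without inplace=True returns a new DataFrame and leaves the original untouched. A small sketch, again with a hand-built frame and a hypothetical variable name flat:

```python
import pandas as pd

# Hand-built stand-in for the groupby/apply result, with a named index
sorbet_vals = pd.DataFrame(
    {'mean': [7.5, 4.5], 'total': [2, 2]},
    index=pd.Index(['lemon', 'orange'], name='flavour'))

# Returns a copy in which the named index has become an ordinary column
flat = sorbet_vals.reset_index()
print(flat)

# The original still has 'flavour' as its index name
print(sorbet_vals.index.name)
```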

Upvotes: 12
