RDJ

Reputation: 4122

pandas 'as_index' parameter doesn't work as expected

This is a minimal reproducible example of my original DataFrame, called 'calls':

       phone_number    call_outcome   agent  call_number
0      83473306392   NOT INTERESTED  orange            0
1     762850680150  CALL BACK LATER  orange            1
2     476309275079   NOT INTERESTED  orange            2
3     899921761538  CALL BACK LATER     red            3
4     906739234066  CALL BACK LATER  orange            4

Writing this pandas command...

most_calls = calls.groupby('agent') \
.count().sort('call_number', ascending=False)

Returns this...

           phone_number  call_outcome  call_number
agent                                          
orange          2234          2234         2234
red             1478          1478         1478
black            750           750          750
green            339           339          339
blue             199           199          199

This is correct, except that I want 'agent' to be a regular column rather than the index.

I've used the as_index=False parameter on numerous occasions and am familiar with specifying axis=1. In this instance, however, it doesn't matter where or how I incorporate these parameters: every permutation returns an error.

These are some examples I've tried and the corresponding errors:

most_calls = calls.groupby('agent', as_index=False) \
.count().sort('call_number', ascending=False)

ValueError: invalid literal for long() with base 10: 'black'

And

most_calls = calls.groupby('agent', as_index=False, axis=1) \
.count().sort('call_number', ascending=False)

ValueError: as_index=False only valid for axis=0

Upvotes: 5

Views: 6326

Answers (2)

Ami Tavory

Reputation: 76297

I believe that, irrespective of the groupby operation you've done, you just need to call reset_index, which turns the index back into a regular column.

Starting with a mockup of your data:

import pandas as pd
calls = pd.DataFrame({
    'agent': ['orange', 'red'],
    'phone_number': [2234, 1478],
    'call_outcome': [2234, 1478],
})
>> calls
    agent  call_outcome  phone_number
0  orange          2234          2234
1     red          1478          1478

Here is the operation you did, with reset_index() appended:

>> calls.groupby('agent').count().sort('phone_number', ascending=False).reset_index()
    agent  call_outcome  phone_number
0  orange             1             1
1     red             1             1
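
For anyone on a current pandas release, where DataFrame.sort has been removed in favour of sort_values, the same idea can be sketched roughly as below. This is only a minimal sketch: the frame is rebuilt from the five sample rows shown in the question, so the counts are those of the sample and not of the full dataset.

```python
import pandas as pd

# The five sample rows from the question (not the full dataset)
calls = pd.DataFrame({
    'phone_number': [83473306392, 762850680150, 476309275079,
                     899921761538, 906739234066],
    'call_outcome': ['NOT INTERESTED', 'CALL BACK LATER', 'NOT INTERESTED',
                     'CALL BACK LATER', 'CALL BACK LATER'],
    'agent': ['orange', 'orange', 'orange', 'red', 'orange'],
    'call_number': [0, 1, 2, 3, 4],
})

most_calls = (
    calls.groupby('agent')                             # 'agent' becomes the index
         .count()                                      # non-null count per column
         .sort_values('call_number', ascending=False)  # replaces the old .sort()
         .reset_index()                                # 'agent' back to a column
)
print(most_calls)
```

After reset_index(), 'agent' is an ordinary column and the frame has a default integer index.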

Upvotes: 4

Jianxun Li

Reputation: 24742

Use reset_index to move the index into a regular column.

calls.groupby('agent').count().sort('call_number', ascending=False).reset_index()

Out[117]: 
      agent  phone_number  call_outcome  call_number
0    orange             4             4            4
1       red             1             1            1
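
As a side note, on recent pandas versions the as_index=False form from the question should, as far as I can tell, also work with count(), which suggests the ValueError was specific to the older release in use. A minimal sketch, again assuming only the five sample rows from the question and using sort_values in place of the removed sort:

```python
import pandas as pd

# The five sample rows from the question (not the full dataset)
calls = pd.DataFrame({
    'phone_number': [83473306392, 762850680150, 476309275079,
                     899921761538, 906739234066],
    'call_outcome': ['NOT INTERESTED', 'CALL BACK LATER', 'NOT INTERESTED',
                     'CALL BACK LATER', 'CALL BACK LATER'],
    'agent': ['orange', 'orange', 'orange', 'red', 'orange'],
    'call_number': [0, 1, 2, 3, 4],
})

# as_index=False keeps the grouping key as a regular column,
# so no reset_index() call is needed afterwards.
most_calls = (
    calls.groupby('agent', as_index=False)
         .count()
         .sort_values('call_number', ascending=False)
)
print(most_calls)
```

Whether this or reset_index() reads better is mostly a matter of taste; reset_index() has the advantage of working regardless of how the index came about.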

Upvotes: 1
