jovicbg

Reputation: 1553

Index columns disappeared after lambda function in Pandas

I want to calculate what percentage of one hour ('time') some object takes up, so I wrote a lambda function. I think it does the job, but the index columns disappeared, that is, the columns the dataframe is grouped by.

df = df.groupby(['id', 'name', 'time', 'object', 'type'], as_index=True, sort=False)['col1', 'col2', 'col3', 'col4', 'col5'].apply(lambda x: x * 100 / 3600).reset_index()

After running that code I printed df.columns and got this:

Index([u'index', u'col1', u'col2', u'col3',
       u'col4', u'col5'],
      dtype='object')

If needed, I can add a table with sample values for each column. Thanks in advance.

Upvotes: 3

Views: 1404

Answers (4)

BENY

Reputation: 323306

Using the sample data from jpp's answer:

df[['col1','col2']]*=100/3600
df
Out[110]: 
       col1      col2  id name
0  0.138889  0.250000   1    A
1  0.166667  0.111111   2    B
2  0.222222  0.138889   1    A

Upvotes: 1

Ami Tavory

Reputation: 76307

Moving the loop outward, so that it runs over columns, will make the code run significantly faster:

for c in ['col1', 'col2', 'col3', 'col4', 'col5']:
    df[c] *= 100. / 3600

This is because each iteration's calculation is done in a vectorized way over the whole column.

This also won't modify the index in any way.
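
As a quick check of that last point, here is a minimal sketch. The sample frame and the MultiIndex on id and name are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical sample data; 'id' and 'name' stand in for the grouping columns.
df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]}).set_index(['id', 'name'])

# Keep a copy of the index so it can be compared afterwards.
index_before = df.index.copy()

# Scale each value column in turn; every step is one vectorized operation.
for c in ['col1', 'col2']:
    df[c] *= 100. / 3600

# The index levels are untouched by the column-wise arithmetic.
print(df.index.equals(index_before))
print(df.index.names)
```

Only the value columns are rewritten, so there is nothing to restore with reset_index afterwards.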

Upvotes: 3

JE_Muc

Reputation: 5774

You call .reset_index(), which resets the index. Take a look at the pandas documentation and you'll see that .reset_index() moves the index into the columns and replaces it with a default integer index.
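
A small sketch of that behaviour, using made-up sample data (the frame below is an assumption, not the asker's). It also shows that resetting an unnamed default index produces a column called 'index', which is consistent with the output in the question if the result of the apply call carried only such an index.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]})

# Named index levels come back as ordinary columns.
indexed = df.set_index(['id', 'name'])
restored = indexed.reset_index()
print(list(restored.columns))   # id, name, col1, col2

# With drop=True the index levels are discarded instead.
dropped = indexed.reset_index(drop=True)
print(list(dropped.columns))    # col1, col2

# An unnamed default index becomes a column named 'index'.
plain = df[['col1', 'col2']].reset_index()
print(list(plain.columns))      # index, col1, col2
```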

Upvotes: 1

jpp

Reputation: 164693

pd.DataFrame.groupby is used to aggregate data, not to apply a function to multiple columns.

For simple functions, you should look for a vectorised solution. For example:

import pandas as pd

# set up a simple dataframe
df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]})

# apply logic in a vectorised way on multiple columns
df[['col1', 'col2']] = df[['col1', 'col2']].values * 100 / 3600

If you wish to use multiple columns as your index, and are keen to use pd.DataFrame.apply, this is possible as two separate steps. For example:

df = df.set_index(['id', 'name'])
df[['col1', 'col2']] = df[['col1', 'col2']].apply(lambda x: x * 100 / 3600)
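
To illustrate the first point, here is a sketch of what groupby looks like when an aggregation per group really is intended. Summing within each group is an assumption made for the example; the question does not say that is what the asker wants.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]})

# Aggregate per (id, name) group; the group keys become the index.
totals = df.groupby(['id', 'name'], sort=False)[['col1', 'col2']].sum()

# Scale the aggregated values, then move the group keys back into columns.
percent = (totals * 100 / 3600).reset_index()
print(list(percent.columns))    # id, name, col1, col2
```

Here reset_index has named index levels to work with, so id and name reappear as columns rather than a generic 'index' column.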

Upvotes: 2
