BarendVO

How to add a row to a pandas DataFrame with interval indices

How can you add a row to a pandas DataFrame that has interval indices on both the columns and the index?

I am trying to add the total counts to a cross table in pandas, which is defined as:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(500,2))
bins = np.linspace(0, 1, 11)
df['bins_0'] = pd.cut(df[0], bins=bins)
df['bins_1'] = pd.cut(df[1], bins=bins)
# Cross table of counts per pair of bins
ct = pd.crosstab(df['bins_0'], df['bins_1'])
# Column sums, i.e. the intended total row
sumrow = ct.sum(axis=0)

My question is: how can you add sumrow as an extra row to such a table?

What I tried so far:

Using '.loc' to add a row does not work:

ct.loc['total'] = sumrow

raises TypeError: cannot append a non-category item to a CategoricalIndex

Resetting the index did not work either: 'ct.reset_index()' raised TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category (reset_index has to insert the old index as a new column, and the columns are a CategoricalIndex as well).

Eventually I would also like to add a column with the sum of each row, but I suppose that requires the same procedure as adding a row.

Answers (1)

jezrael

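The first idea is to use the margins parameter of crosstab, which appends both a total row and a total column, labelled with margins_name: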

ct = pd.crosstab(df['bins_0'], df['bins_1'], margins=True, margins_name='Total')
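
As a minimal sketch of what margins produces (assuming a recent pandas version, and using a seeded NumPy generator in place of np.random.rand purely for reproducibility), the bottom-right 'Total' cell should equal the number of rows that fell into a bin:

```python
import numpy as np
import pandas as pd

# Same setup as in the question; the seeded generator is an assumption for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 2)))
bins = np.linspace(0, 1, 11)
df['bins_0'] = pd.cut(df[0], bins=bins)
df['bins_1'] = pd.cut(df[1], bins=bins)

# margins=True appends a total row and a total column, both named by margins_name
ct = pd.crosstab(df['bins_0'], df['bins_1'], margins=True, margins_name='Total')

# Bottom-right cell: grand total over all counted rows
print(ct.loc['Total', 'Total'])
```

Because margins adds labels that are not intervals, the resulting index and columns are generally no longer categorical, which is usually fine for display.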

To fix your own approach: if you need a new Total column, first add 'Total' to the categories of the columns, then sum with axis=1:

ct = pd.crosstab(df['bins_0'], df['bins_1'])
ct.columns = ct.columns.add_categories('Total')
ct['Total'] = ct.sum(axis=1)

Or, if you need a new Total row, add 'Total' to the categories of the index and then use DataFrame.loc, summing with axis=0:

ct = pd.crosstab(df['bins_0'], df['bins_1'])
ct.index = ct.index.add_categories('Total')
ct.loc['Total'] = ct.sum(axis=0)
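
Since the question also asks for a column with the row sums, the two add_categories steps can be combined. This is a minimal sketch: add_totals is a hypothetical helper name, and the isinstance checks are an added assumption so that it also works on axes that are not categorical. The row is added after the column, so its last cell becomes the grand total.

```python
import numpy as np
import pandas as pd


def add_totals(ct: pd.DataFrame, name: str = 'Total') -> pd.DataFrame:
    """Hypothetical helper: return a copy of ct with a total column and a total row."""
    out = ct.copy()
    # Register the new label as a category on each categorical axis first;
    # this is the step that avoids the TypeError from the question
    if isinstance(out.columns, pd.CategoricalIndex):
        out.columns = out.columns.add_categories(name)
    if isinstance(out.index, pd.CategoricalIndex):
        out.index = out.index.add_categories(name)
    # Total column: sum of each row (computed before the column exists)
    out[name] = out.sum(axis=1)
    # Total row: sum of each column, including the new total column,
    # so the bottom-right cell is the grand total
    out.loc[name] = out.sum(axis=0)
    return out


# Usage with the setup from the question (seeded here for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 2)))
bins = np.linspace(0, 1, 11)
df['bins_0'] = pd.cut(df[0], bins=bins)
df['bins_1'] = pd.cut(df[1], bins=bins)

ct = pd.crosstab(df['bins_0'], df['bins_1'])
ct_tot = add_totals(ct)
print(ct_tot.loc['Total', 'Total'])
```

Unlike margins, this keeps the index and columns as CategoricalIndex objects, with 'Total' as the last category on each axis.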
