Reputation: 28
How can you add a row to a dataframe in pandas with interval indices in both the column and index?
I am trying to add the total counts to a crosstab in pandas, which is built as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(500,2))
bins = np.linspace(0, 1, 11)
df['bins_0'] = pd.cut(df[0], bins=bins)
df['bins_1'] = pd.cut(df[1], bins=bins)
# Define crosstable
ct = pd.crosstab(df['bins_0'], df['bins_1'])
# define sumrow
sumrow = ct.sum(axis=0)
My question is: how can I append this row of sums to the table?
What I tried so far:
Using .loc to add a row does not work:
ct.loc['total'] = sumrow
results in
TypeError: cannot append a non-category item to a CategoricalIndex
Resetting the index did not work either, since ct.reset_index()
resulted in TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
In the end I would also like to add a column with the sums per row, but I suppose that would require the same procedure as adding a row.
Upvotes: 1
Views: 459
Reputation: 862431
The first idea is to use the margins parameter of crosstab:
ct = pd.crosstab(df['bins_0'], df['bins_1'], margins=True, margins_name='Total')
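As a quick sanity check of the margins approach, here is a minimal self-contained sketch. The fixed seed and the names rng, n_binned, and ct_margins are assumptions added for reproducibility and are not part of the question. It only checks that the bottom-right cell holds the number of rows that fell into a bin on both axes.

```python
import numpy as np
import pandas as pd

# Fixed seed is an assumption, used only to make the sketch reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 2)))
bins = np.linspace(0, 1, 11)
df['bins_0'] = pd.cut(df[0], bins=bins)
df['bins_1'] = pd.cut(df[1], bins=bins)

# margins=True appends a totals row and a totals column named 'Total'.
ct_margins = pd.crosstab(df['bins_0'], df['bins_1'],
                         margins=True, margins_name='Total')

# Rows that landed in a bin on both axes; these are the rows crosstab counts.
n_binned = len(df.dropna(subset=['bins_0', 'bins_1']))
grand_total = int(ct_margins.iloc[-1, -1])
```

Here grand_total should equal n_binned, since the last row and last column are the margins.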
For your approach: if you need a new Total column, first add the category to the columns, then sum with axis=1:
ct = pd.crosstab(df['bins_0'], df['bins_1'])
ct.columns = ct.columns.add_categories('Total')
ct['Total'] = ct.sum(axis=1)
Or, if you need a new Total row, add the category to the index and then assign with DataFrame.loc:
ct = pd.crosstab(df['bins_0'], df['bins_1'])
ct.index = ct.index.add_categories('Total')
ct.loc['Total'] = ct.sum(axis=0)
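If both a Total column and a Total row are wanted, the two snippets can probably be combined as in the sketch below. The fixed seed and the names rng, n_rows, n_cols, and grand_total are assumptions for illustration. Adding the column first means the later row sum also covers it, so the bottom-right cell ends up as the grand total.

```python
import numpy as np
import pandas as pd

# Fixed seed is an assumption, used only to make the sketch reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 2)))
bins = np.linspace(0, 1, 11)
df['bins_0'] = pd.cut(df[0], bins=bins)
df['bins_1'] = pd.cut(df[1], bins=bins)

ct = pd.crosstab(df['bins_0'], df['bins_1'])
n_rows, n_cols = ct.shape  # shape before any totals are added

# Register 'Total' as a category on the columns, then add the row sums.
ct.columns = ct.columns.add_categories('Total')
ct['Total'] = ct.sum(axis=1)

# Do the same for the index, then add the column sums (including 'Total').
ct.index = ct.index.add_categories('Total')
ct.loc['Total'] = ct.sum(axis=0)

grand_total = int(ct.loc['Total', 'Total'])
```

The key step in both cases is add_categories: a CategoricalIndex only accepts labels that are already among its categories, which is what the TypeError in the question is reporting.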
Upvotes: 1