JackoM
JackoM

Reputation: 724

Pandas Pivot_Table grouped values

I have a large data set on a CSV (Dataset). I want to create a pd.pivot_table to summarize the data by zip code, however, my data has lines that share the same zip code.

df = pd.read_csv('15zpallagi.csv')
df['A00100'] = df['A00100'].map('{:,.2f}'.format)
df.pivot_table(values='A00100', index='zipcode', aggfunc='sum')

When I run the code above to create a pivot_table, the value column includes several values, like if it was stacking on the sum with multiple values. output of pivot table

However if run the following code, I get the same values, but in an understandable format. results of pivot with columns

df.pivot_table(values='A00100', index='zipcode',columns='agi_stub', aggfunc='sum')

How can I create a pivot table that just adds the column A00100 and gives me a total by zip code?

Upvotes: 0

Views: 198

Answers (1)

Erik Z
Erik Z

Reputation: 412

You are likely seeing these inconsistencies because this line df['A00100'] = df['A00100'].map('{:,.2f}'.format) is converting your A00100 column to a string type instead of a float.

Comment out that second line and try again to see if that fixes the issue.

If you need to format the number to only show 2 decimals, do that after all of your transformations.

If you are rounding for some other reason (significant figures, etc.), use the Dataframe.round function instead of string formatting.

Upvotes: 2

Related Questions