Reputation: 724
I have a large data set on a CSV (Dataset). I want to create a pd.pivot_table to summarize the data by zip code, however, my data has lines that share the same zip code.
df = pd.read_csv('15zpallagi.csv')
df['A00100'] = df['A00100'].map('{:,.2f}'.format)
df.pivot_table(values='A00100', index='zipcode', aggfunc='sum')
When I run the code above to create a pivot_table, the value column includes several values, like if it was stacking on the sum with multiple values.
However if run the following code, I get the same values, but in an understandable format.
df.pivot_table(values='A00100', index='zipcode',columns='agi_stub', aggfunc='sum')
How can I create a pivot table that just adds the column A00100 and gives me a total by zip code?
Upvotes: 0
Views: 198
Reputation: 412
You are likely seeing these inconsistencies because this line df['A00100'] = df['A00100'].map('{:,.2f}'.format)
is converting your A00100 column to a string type instead of a float.
Comment out that second line and try again to see if that fixes the issue.
If you need to format the number to only show 2 decimals, do that after all of your transformations.
If you are rounding for some other reason (significant figures, etc.), use the Dataframe.round function instead of string formatting.
Upvotes: 2