Balthazar

Reputation: 81

Pandas, why does division done to other rows lead to additional trailing zeroes on final row?

I have a table that shows participation in Hong Kong demonstrations by gender for different dates in 2019 (obtained from this source). The first three rows originally showed the percentages for males, females and unknown/unanswered. The final row shows the sample size. All data was initially of type string (the percentages included the % sign).

My DataFrame is named gender_table.

To be able to do some analysis, I first removed the percentage sign and changed the data to float type.

gender_table = gender_table.astype("float64")
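For readers who want to reproduce this step, here is a minimal sketch of stripping the % sign and converting to floats. The column labels, row labels and numbers are made-up stand-ins, not the real survey data, and the `raw` name is only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the original all-string table
raw = pd.DataFrame(
    {
        "2019-06-09": ["50.5%", "49.0%", "0.5%", "285"],
        "2019-06-12": ["54.3%", "45.2%", "0.5%", "175"],
    },
    index=["Male", "Female", "Unknown", "Sample size"],
)

# Strip a trailing "%" from every cell, column by column, then convert to float
gender_table = raw.apply(lambda col: col.str.rstrip("%")).astype("float64")

print(gender_table.dtypes)
```

The sample-size row has no % sign, so `rstrip("%")` leaves it as it is before the conversion.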

This gives me the following:

Image of part of table after converting to floats

To change the percentage values into ratios, I thought I'd just divide all the data (except the final row with sample size) by 100.

gender_table[:-1] = gender_table[:-1]/100

gender_table now looks like this:

Image of part of table after dividing top rows by 100

My question is this: Why has this operation added additional trailing zeroes to the sample size row?

Pastebin with data (after removal of % signs) available here (can be saved as .csv and read into a Pandas df ("index_col=0")).

Upvotes: 1

Views: 48

Answers (1)

Stef

Reputation: 30609

All rows of a column are formatted uniformly. The default format for a float variable x is f'{x:.6g}' (for details about format specifiers see here).
So when you divide the first rows, which had 1 decimal place, by 100, they get 3 decimal places, and because all rows in a column share the same format, 285.0 is displayed as 285.000.
This only changes the string representation of the values in the last row; the float values themselves remain unchanged.
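A quick way to check this yourself is sketched below. The numbers are invented stand-ins for one column of the table, and `.iloc` is used only to make the positional slicing explicit; it performs the same division as in the question:

```python
import pandas as pd

# Hypothetical numbers standing in for one column of the table
gender_table = pd.DataFrame(
    {"2019-06-09": [50.5, 49.0, 0.5, 285.0]},
    index=["Male", "Female", "Unknown", "Sample size"],
)

# Keep a copy of the last row so it can be compared afterwards
sample_size_before = gender_table.iloc[-1].copy()

# Divide every row except the last by 100
gender_table.iloc[:-1] = gender_table.iloc[:-1] / 100

# The stored floats in the last row are identical to what they were before
print(gender_table.iloc[-1].equals(sample_size_before))

# Printing the frame shows the shared per-column display format
print(gender_table)
```

If the comparison prints True, the extra zeroes are purely a matter of how the column is rendered.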

Upvotes: 1
