Pandas, why does division done to other rows lead to additional trailing zeroes on final row?

Question

I have a table that shows participation in Hong Kong demonstrations by gender for different dates in 2019 (obtained from this source). The first three rows originally showed the percentages for males, females and unknown/unanswered. The final row shows the sample size. All values were initially strings (the percentages included the % sign).

My DataFrame is named gender_table.

To be able to do some analysis, I first removed the percentage sign and changed the data to float type.

gender_table = gender_table.astype("float64")

This gives me the following:

To change the percentage values into ratios, I thought I'd just divide all the data (except the final row with sample size) by 100.

gender_table[:-1] = gender_table[:-1]/100

gender_table now looks like this:

My question is this: why has this operation added trailing zeroes to the sample size row?

Pastebin with data (after removal of % signs) available here (can be saved as .csv and read into a Pandas df ("index_col=0")).

Stef · Accepted Answer

All rows of a column are formatted uniformly. The default format for a float variable x is f'{x:.6g}' (for details about format specifiers see here).
So when you divide the first rows, which had 1 decimal place, by 100, they end up with 3 decimal places, and because all rows in a column share the same format, 285.0 is displayed as 285.000.
This, of course, only changes the string representation of the values in the last row; the float values themselves remain unchanged.
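To check this yourself, here is a minimal sketch. It uses a made-up single-column frame as a stand-in for gender_table (the column label and the percentages are assumptions, not the real data) and compares the sample size value before and after the division.

```python
import pandas as pd

# Hypothetical stand-in for gender_table; label and numbers are invented.
df = pd.DataFrame(
    {"2019-06-09": [50.5, 48.9, 0.6, 285.0]},
    index=["Male", "Female", "Unknown", "Sample size"],
)

# Stored value of the last row before touching anything.
before = df.loc["Sample size", "2019-06-09"]

# Divide every row except the last one by 100, as in the question.
df.iloc[:-1] = df.iloc[:-1] / 100

# Stored value of the last row afterwards.
after = df.loc["Sample size", "2019-06-09"]

print(df)               # the whole column is printed with 3 decimal places
print(before == after)  # the underlying float is the same
```

If the extra zeroes are a nuisance, one option is to keep the sample sizes in a separate Series or row-wise table so that they do not share a column with the ratios.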
