Reputation: 69
I have run into a very strange (and frustrating) issue with pandas. I want to divide each cell in a dataframe by the sum of its column. I googled it and used the suggested answer, but it doesn't work: within each row, every cell ends up with the SAME VALUE.
import numpy as np
import pandas as pd
dfs = pd.DataFrame(np.random.randint(0,10,size=(3,3)), columns=['A','B','C'])
# Now here is the copied solution from google
dfs = dfs.div(dfs.sum(axis=0),axis=1)
For simple examples like the one above it works very well. But the moment I tried it on my own dataframe, which has 1080 columns, every row had the same value in all of its columns.
I have made sure to drop all NaN, inf, and anything else that is not a number, and the dtype of every column is float64. I am not sure why this is happening. Could anyone give me some ideas about what is wrong? I have a feeling it is because of the size of the dataframe, but surely 1080 columns and 8 rows shouldn't be too much for pandas to handle?
Thanks in advance
Edit: Yes, run this code to get the first 2 columns of my dataframe.
dfs = pd.DataFrame({
    '7006091': [2.219749271, 2.15577658, 1.857604216, 1.588101736,
                0.925926932, 1.413871811, 1.528702513, 1.313778722],
    '7007772': [2.21238513, 2.148624672, 1.851441511, 1.582833121,
                0.922855119, 1.409181214, 1.523630958, 1.309420189],
})
I just tried dfs.update as suggested and it didn't work either. This is what was returned with:
dfs.update(dfs.div(dfs.sum(axis=0),axis=1))
Upvotes: 0
Views: 2049
Reputation: 323316
You get the same output because your columns have the same distribution: each column is just a constant multiple of the other. Check the element-wise ratio of the two columns:
dfs['7006091']/dfs['7007772']
0 1.003329
1 1.003329
2 1.003329
3 1.003329
4 1.003329
5 1.003329
6 1.003329
7 1.003329
dtype: float64
Since the ratio is constant, the two columns become the same values once each is standardized by its column sum.
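To make this concrete, here is a minimal sketch using the two columns posted in the question. It applies the same `div` call and checks that the normalized columns agree to within floating-point tolerance. The small `other` frame and the names `normalized`, `columns_match`, and `other_norm` are made up for illustration; the tolerance passed to `np.allclose` is an assumption, chosen only to absorb rounding in the posted values.

```python
import numpy as np
import pandas as pd

# First two columns from the question
dfs = pd.DataFrame({
    '7006091': [2.219749271, 2.15577658, 1.857604216, 1.588101736,
                0.925926932, 1.413871811, 1.528702513, 1.313778722],
    '7007772': [2.21238513, 2.148624672, 1.851441511, 1.582833121,
                0.922855119, 1.409181214, 1.523630958, 1.309420189],
})

# Divide every cell by the sum of its column
normalized = dfs.div(dfs.sum(axis=0), axis=1)

# The columns are proportional, so the scale factor cancels out
# and both columns normalize to (almost exactly) the same values
columns_match = np.allclose(normalized['7006091'],
                            normalized['7007772'], atol=1e-6)

# Hypothetical contrast: columns that are NOT proportional stay different
other = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [3.0, 2.0, 1.0]})
other_norm = other.div(other.sum(axis=0), axis=1)
other_match = np.allclose(other_norm['A'], other_norm['B'])

print(columns_match, other_match)
```

If the rest of the 1080 columns are likewise scaled copies of one another, the division is behaving correctly and the identical rows are a property of the data, not of the dataframe's size.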
Upvotes: 0