Reputation: 25
I have a dataframe called teams. Each column is a team in the NFL, each row is how much a given fan would pay to attend a team's game. Looks like:
team1 | team2 | team3 |
---|---|---|
40 | NaN | 50 |
NaN | NaN | 80 |
75 | 30 | NaN |
I want to compare the standard deviations of each column, so obviously I need to remove the NaNs. I want to do this column-wise though, so that I don't just remove all rows where one value is NaN because I'll lose a lot of data. What's the best way to do this? I have a lot of columns, otherwise I would just make a numpy array representing each column.
Upvotes: 0
Views: 90
Reputation: 28620
Using pandas' .describe(), it should already account for any NaNs:
import pandas as pd
import numpy as np
columns = ['team1', 'team2', 'team3']
data = [
[40, np.nan, 50],
[np.nan, np.nan, 80],
[75, 30, np.nan]]
df = pd.DataFrame(data=data, columns=columns)
std = df.describe().loc['std']
Output:
print(std)
team1 24.748737
team2 NaN
team3 21.213203
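As a side note, calling `.std()` on the dataframe should give the same numbers directly, since pandas skips NaNs column by column by default (`skipna=True`), so no rows need to be dropped. The NaN for team2 is not a leftover missing value: that column has only one observation, and the sample standard deviation (`ddof=1`) is undefined for a single point. A minimal sketch reusing the sample data from the question; the names `col_std`, `col_count` and `pop_std` are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[40, np.nan, 50], [np.nan, np.nan, 80], [75, 30, np.nan]],
    columns=["team1", "team2", "team3"],
)

# Sample standard deviation per column; NaNs are ignored within each column
col_std = df.std()

# Number of non-NaN values behind each estimate (team2 has only one)
col_count = df.count()

# Population standard deviation (ddof=0) is defined even for a single value
pop_std = df.std(ddof=0)

print(col_std)
print(col_count)
print(pop_std)
```

Checking `col_count` alongside the standard deviations is a cheap way to see which columns have too few responses for the comparison to mean much.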
Upvotes: 0