Anthony Petruzzio

Reputation: 25

pandas question: Remove missing values by column

I have a dataframe called teams. Each column is an NFL team, and each row is how much a given fan would pay to attend that team's game. It looks like:

team1 team2 team3
40 NaN 50
NaN NaN 80
75 30 NaN

I want to compare the standard deviations of each column, so obviously I need to remove the NaNs. I want to do this column-wise though, so that I don't just remove all rows where one value is NaN because I'll lose a lot of data. What's the best way to do this? I have a lot of columns, otherwise I would just make a numpy array representing each column.

Upvotes: 0

Views: 90

Answers (2)

chitown88

Reputation: 28620

pandas' .describe() should already account for any NaNs:

import pandas as pd
import numpy as np

columns = ['team1', 'team2', 'team3']
data = [
    [40, np.nan, 50],
    [np.nan, np.nan, 80],
    [75, 30, np.nan],
]

df = pd.DataFrame(data=data, columns=columns)
std = df.describe().loc['std']

Output:

print(std)
team1    24.748737
team2          NaN
team3    21.213203
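
For anyone who wants to check this, here is a minimal sketch (not part of the original answer) that drops the NaNs one column at a time and compares the result with the std row from .describe(). The names manual_std and described_std are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {
        "team1": [40, np.nan, 75],
        "team2": [np.nan, np.nan, 30],
        "team3": [50, 80, np.nan],
    }
)

# Drop NaNs one column at a time, then take the sample standard deviation.
manual_std = pd.Series({col: df[col].dropna().std() for col in df.columns})

# describe() skips NaNs within each column, so its 'std' row should agree.
described_std = df.describe().loc["std"]

print(manual_std)
print(described_std)
```

Both Series should match, so no rows have to be dropped from the dataframe first.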

Upvotes: 0

mozway

Reputation: 261015

Your assumption is incorrect.

"I want to compare the standard deviations of each column, so obviously I need to remove the NaNs"

By default, std ignores NaNs (skipna=True), so just use:

df.std()

Output:

team1    24.748737
team2          NaN
team3    21.213203
dtype: float64
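
The NaN for team2 in the output above is not a leftover missing value: that column has only one non-NaN entry, and the sample standard deviation (ddof=1, the pandas default) is undefined for a single observation. A small sketch to illustrate, assuming the same sample data as the question; the variable names counts, sample_std and population_std are made up here:

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {
        "team1": [40, np.nan, 75],
        "team2": [np.nan, np.nan, 30],
        "team3": [50, 80, np.nan],
    }
)

# Number of non-NaN observations in each column.
counts = df.count()

# Sample standard deviation (ddof=1); NaNs are skipped column by column.
sample_std = df.std()

# Population standard deviation (ddof=0) is defined even for one value.
population_std = df.std(ddof=0)

print(counts)
print(sample_std)
print(population_std)
```

Checking df.count() before comparing columns is a cheap way to see which standard deviations rest on very few observations.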

Upvotes: 1
