df.quantile(axis = 1) returns NaN

Question

I have a dataframe df with 70 columns. I am trying to calculate row-wise quantiles using the df.quantile() function along axis = 1. Here are the details of the dataframe.

> print(df.head(4))

                      WS_653         WS_654        WS_655       WS_658  \
ts                                                                            
2020-11-01 01:00:00       12.3708       11.7133       12.2125       12.3325   
2020-11-01 01:10:00       12.6442       12.1883       12.5625       12.3233   
2020-11-01 01:20:00       12.8042       11.7109       11.8765       12.1134   
2020-11-01 01:30:00       12.3176       10.6824       11.8361       11.5672   

                          WS_656         WS_657       WS_664        WS_659  \
ts                                                                            
2020-11-01 01:00:00       12.0217       11.6233       12.6108       12.2458   
2020-11-01 01:10:00       13.0342       12.5917       12.5225       11.7658   
2020-11-01 01:20:00       11.6042       10.6496       11.8874       12.3613   
2020-11-01 01:30:00       11.3118       9.98403          10.6       10.5992   

                          WS_663         WS_666  ...       WS_715  \
ts                                               ...                 
2020-11-01 01:00:00       15.3058       15.1433  ...       12.9008   
2020-11-01 01:10:00       15.3283       15.0625  ...       12.6042   
2020-11-01 01:20:00       15.3765        15.058  ...       11.7462   
2020-11-01 01:30:00       14.7689       14.4992  ...       11.0294   

[4 rows x 70 columns]

> q10 = df.quantile(0.1, axis = 1)
> print(q10)

ts
2020-11-01 01:00:00   NaN
2020-11-01 01:10:00   NaN
2020-11-01 01:20:00   NaN
2020-11-01 01:30:00   NaN
2020-11-01 01:40:00   NaN
                       ..
2020-12-01 00:00:00   NaN
2020-12-01 00:10:00   NaN
2020-12-01 00:20:00   NaN
2020-12-01 00:30:00   NaN
2020-12-01 00:40:00   NaN
Name: 0.1, Length: 4319, dtype: float64

However, if I loop through as:

> q10 = list()

> for k in range(len(df)):
       q10.append(df.iloc[k,:].quantile(0.1))

> print(q10)

It prints a list of size len(df) with the correct quantile value for each row. I want to understand why this works when I operate row by row on the same df, but does not work on the entire dataframe.

David Erickson · Accepted Answer

You have columns that are not float data types.
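One way to confirm this is to inspect the column dtypes. The snippet below is a minimal sketch on a hypothetical toy frame (the column names WS_a, WS_b, WS_c and their values are made up for illustration, not taken from your data); it assumes the problem columns are stored with dtype object.

```python
import pandas as pd

# Hypothetical toy frame: WS_b holds numbers stored as generic Python objects
df = pd.DataFrame(
    {
        "WS_a": [12.3, 12.6, 12.8],
        "WS_b": pd.Series([11.7, 12.1, 11.7], dtype=object),
        "WS_c": [12.2, 12.5, 11.8],
    }
)

# Count how many columns there are of each dtype
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)

# List the columns that are not numeric at all
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)
```

On your own dataframe, running the last two statements should show which of the 70 columns need attention.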

You can select only the columns whose data type is 'float64':

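# keep only the columns that are already stored as float64
cols = [col for col in df.columns if df[col].dtype == 'float64']
# these columns are already floats, so no astype(float) is needed
df[cols].quantile(0.1, axis = 1)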

Sample output (for the second set of 4 rows in your question):

ts
2020-11-01 01:00:00    11.74282
2020-11-01 01:10:00    11.99281
2020-11-01 01:20:00    10.93598
2020-11-01 01:30:00   10.168581
Name: 0.1, dtype: float64
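The same float-only selection can probably be written more compactly with select_dtypes(). This is a sketch on a hypothetical toy frame (the names WS_a, WS_b, WS_c and label are assumptions for illustration), not your actual data:

```python
import pandas as pd

# Hypothetical toy frame with three float columns and one text column
df = pd.DataFrame(
    {
        "WS_a": [1.0, 2.0],
        "WS_b": [3.0, 4.0],
        "WS_c": [5.0, 6.0],
        "label": ["x", "y"],
    }
)

# Keep only float64 columns, then take the 10% quantile across each row
float_df = df.select_dtypes(include="float64")
q10 = float_df.quantile(0.1, axis=1)
print(q10)
```

With the default linear interpolation, the first row (1.0, 3.0, 5.0) gives 1.4 and the second row (2.0, 4.0, 6.0) gives 2.4. Note that, as with the list comprehension, any non-float column is left out of the calculation entirely.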

Alternatively, you can convert object columns (with dtype 'O') to floats with pd.to_numeric(). This will lead to different results, because you are forcing all columns to floats, and errors='coerce' turns any value that cannot be parsed as a number into NaN:

import pandas as pd

# convert every object-dtype column to numeric; unparseable values become NaN
cols = [col for col in df.columns if df[col].dtype == 'O']
for col in cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.quantile(0.1, axis = 1)
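To see what the coercion does, here is a small self-contained sketch. The frame, the column names and the string "bad" are hypothetical stand-ins for whatever non-numeric entries your object columns actually contain:

```python
import pandas as pd

# Hypothetical toy frame: WS_b is an object column holding numeric strings
# plus one entry that cannot be parsed as a number
df = pd.DataFrame(
    {
        "WS_a": [12.0, 13.0],
        "WS_b": pd.Series(["11.0", "bad"], dtype=object),
        "WS_c": [10.0, 14.0],
    }
)

# Convert each object column; unparseable entries become NaN
obj_cols = [col for col in df.columns if df[col].dtype == "O"]
for col in obj_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)              # every column is now float64
print(df["WS_b"].isna().sum())  # number of entries that were coerced to NaN

q10 = df.quantile(0.1, axis=1)
print(q10)
```

Counting the NaN values per column after the conversion is a quick way to judge how much data the coercion discarded before you trust the resulting quantiles.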
