I have a dataframe df with 70 columns. I am trying to calculate quantiles using the df.quantile() function along axis = 1. Here are the details of the dataframe:
> print(df.head(4))
WS_653 WS_654 WS_655 WS_658 \
ts
2020-11-01 01:00:00 12.3708 11.7133 12.2125 12.3325
2020-11-01 01:10:00 12.6442 12.1883 12.5625 12.3233
2020-11-01 01:20:00 12.8042 11.7109 11.8765 12.1134
2020-11-01 01:30:00 12.3176 10.6824 11.8361 11.5672
WS_656 WS_657 WS_664 WS_659 \
ts
2020-11-01 01:00:00 12.0217 11.6233 12.6108 12.2458
2020-11-01 01:10:00 13.0342 12.5917 12.5225 11.7658
2020-11-01 01:20:00 11.6042 10.6496 11.8874 12.3613
2020-11-01 01:30:00 11.3118 9.98403 10.6 10.5992
WS_663 WS_666 ... WS_715 \
ts ...
2020-11-01 01:00:00 15.3058 15.1433 ... 12.9008
2020-11-01 01:10:00 15.3283 15.0625 ... 12.6042
2020-11-01 01:20:00 15.3765 15.058 ... 11.7462
2020-11-01 01:30:00 14.7689 14.4992 ... 11.0294
[4 rows x 70 columns]
> q10 = df.quantile(0.1, axis = 1)
> print(q10)
ts
2020-11-01 01:00:00 NaN
2020-11-01 01:10:00 NaN
2020-11-01 01:20:00 NaN
2020-11-01 01:30:00 NaN
2020-11-01 01:40:00 NaN
..
2020-12-01 00:00:00 NaN
2020-12-01 00:10:00 NaN
2020-12-01 00:20:00 NaN
2020-12-01 00:30:00 NaN
2020-12-01 00:40:00 NaN
Name: 0.1, Length: 4319, dtype: float64
However, if I loop over the rows like this:
> q10 = list()
> for k in range(len(df)):
      q10.append(df.iloc[k, :].quantile(0.1))
> print(q10)
It prints a list of length len(df) with the correct quantile value for each row. So I want to understand why this works when I operate row-wise on the same df, but does not work on the entire dataframe.
You have columns that are not of a float data type.
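One way to confirm this is to look at the column dtypes. The sketch below uses a hypothetical two-row stand-in for `df` (column names borrowed from the question, values invented), in which `WS_655` is deliberately stored as strings so that pandas gives it the generic `object` dtype:

```python
import pandas as pd

# Hypothetical stand-in for the real data: WS_655 holds numbers stored as text,
# so pandas assigns that column the 'object' dtype instead of float64.
df = pd.DataFrame(
    {
        "WS_653": [12.3708, 12.6442],
        "WS_654": [11.7133, 12.1883],
        "WS_655": ["12.2125", "12.5625"],
    },
    index=pd.to_datetime(["2020-11-01 01:00:00", "2020-11-01 01:10:00"]),
)
df.index.name = "ts"

# Show every column's dtype, then collect the ones that are not float64
print(df.dtypes)
non_float = [col for col in df.columns if df[col].dtype != "float64"]
print(non_float)  # ['WS_655']
```

On the real 70-column frame, `df.dtypes.value_counts()` gives a compact summary of how many columns have each dtype.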
You can select only the columns of data type 'float64':
# Keep only the columns pandas already stores as float64
cols = [col for col in df.columns if df[col].dtype == 'float64']
df[cols].astype(float).quantile(0.1, axis = 1)
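The same selection can also be written with `DataFrame.select_dtypes`. A minimal sketch on a made-up frame, where column `c` stands in for an object column; since the selected columns are already float64, no `astype(float)` is needed:

```python
import pandas as pd

# Made-up frame: two float64 columns plus one object column
df = pd.DataFrame(
    {
        "a": [1.0, 3.0],
        "b": [2.0, 4.0],
        "c": ["10.0", "n/a"],  # numbers stored as text -> object dtype
    }
)

# Keep only float64 columns; the object column 'c' is left out entirely
q10 = df.select_dtypes(include="float64").quantile(0.1, axis=1)
print(q10)  # about 1.1 for row 0 and 3.1 for row 1
```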
Sample output, using only the second block of 4 columns shown in your question (WS_656, WS_657, WS_664, WS_659):
ts
2020-11-01 01:00:00 11.74282
2020-11-01 01:10:00 11.99281
2020-11-01 01:20:00 10.93598
2020-11-01 01:30:00 10.168581
Name: 0.1, dtype: float64
Alternatively, you can convert the object columns (dtype 'O') to floats with pd.to_numeric(). This can lead to different results, because the object columns are now included in the calculation, and any value that cannot be parsed as a number is set to NaN:
import pandas as pd

# Convert each object-dtype column to numbers; unparseable entries become NaN
cols = [col for col in df.columns if df[col].dtype == 'O']
for col in cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.quantile(0.1, axis = 1)
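With many columns, the loop can be replaced by a single `DataFrame.apply` call, which forwards `errors='coerce'` to `pd.to_numeric`. A sketch on a made-up frame; note that `DataFrame.quantile` skips NaN, so each row's quantile is computed from its parseable values only:

```python
import pandas as pd

# Made-up frame: 'c' mixes a parseable string and an unparseable one
df = pd.DataFrame(
    {
        "a": [1.0, 3.0],
        "b": [2.0, 4.0],
        "c": ["3.0", "n/a"],
    }
)

# Convert every column in one call; 'n/a' becomes NaN
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)  # all float64

# NaN is ignored, so row 1 uses only [3.0, 4.0]
q10 = numeric.quantile(0.1, axis=1)
print(q10)  # about 1.2 for row 0 and 3.1 for row 1
```

Row 0 gives about 1.2 here, whereas dropping column `c` (the float-only selection) would give about 1.1 for the same row: this is the kind of difference between the two approaches.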