Reputation: 93
I need to calculate various percentiles along rows into 1 column in a Pandas dataframe. For example:
df['P90'] = df[['col1','col2','col3','col4','col5']].apply(quantile(0.9), axis=1)
df['P50'] = df[['col1','col2','col3','col4','col5']].apply(quantile(0.5), axis=1)
I have the following dataframe:
ID 2019/31 2019/32 2019/33 2019/34 2019/35 2019/36 2019/37 2019/38 2019/39 2019/40
258101 67000
258102 56750 19105 35990 41250 44425 51275 1071 8125 16375
258103 8528 6853 3291 3000 5640 11248
258104 27532 19523 12092 7933 8675 435 1045 5115 1450
258105 40000 285500 16500
I need the output in the following format:
ID 2019/31 2019/32 2019/33 2019/34 2019/35 2019/36 2019/37 2019/38 2019/39 2019/40 P_50 P_90
258101 67000 x1 x2
258102 56750 19105 35990 41250 44425 51275 1071 8125 16375 x3 x4
258103 8528 6853 3291 3000 5640 11248 x5 x6
258104 27532 19523 12092 7933 8675 435 1045 5115 1450 x7 x8
258105 40000 285500 16500 x9 x10
I've tried the following:
cols = ['2019/31', '2019/32', '2019/33', '2019/34', '2019/35', '2019/36', '2019/37', '2019/38', '2019/39', '2019/40']
df['P_50'] = df[cols].apply(np.median, axis=1)
df['P_50'] = df[cols].apply(np.quantile(0.5), axis=1)
perc99 = np.vectorize(lambda x: np.percentile(x, 50))
df['P_50'] = perc99(df[cols].values)
None of it is giving the desired output.
Upvotes: 0
Views: 227
Reputation: 1095
I think, this will work:
df['P_50'] = np.median(df.values, axis=1)
df['P_90'] = np.quantile(df.values, 0.9, axis=1)
Upvotes: 0
Reputation: 171
I scraped your df from the posting. Is this what your are trying to do?:
df['P_50'] = df.iloc[:,1:-1].median(axis=1)
df['P_50'] = df.iloc[:,1:-1].quantile(0.5, axis=1)
this excluding the ID column
Upvotes: 1