Reputation: 7490
I have the following data:
import pandas as pd
# using the data dict at the bottom of the question
df_uplift_percentile = pd.DataFrame.from_dict(data, 'index')
df_uplift_percentile.index.name = 'percentile'
# display(df_uplift_percentile)
            n_treatment  n_control  response_rate_treatment  response_rate_control     uplift  std_treatment  std_control  std_uplift
percentile
0-10                217        983                 0.041475               0.004069   0.037405       0.013535     0.002030    0.013687
10-20               145       1055                 0.013793               0.000948   0.012845       0.009686     0.000947    0.009732
20-30               149       1051                 0.000000               0.000000   0.000000       0.000000     0.000000    0.000000
30-40               383        817                 0.010444               0.009792   0.000652       0.005195     0.003445    0.006233
40-50               354        846                 0.005650               0.005910  -0.000260       0.003984     0.002635    0.004776
50-60               423        777                 0.033097               0.029601   0.003496       0.008698     0.006080    0.010612
60-70               588        611                 0.132653               0.155483  -0.022830       0.013988     0.014660    0.020263
70-80               673        526                 0.178306               0.161597   0.016709       0.014755     0.016049    0.021801
80-90               881        318                 0.155505               0.261006  -0.105501       0.012209     0.024628    0.027488
90-100              938        261                 0.152452               0.333333  -0.180881       0.011737     0.029179    0.031451
I want to plot response_rate_treatment, response_rate_control, and uplift by percentile (x-axis) as a line chart, with each line in a different color.
I am trying the code below. What mistake am I making that causes it to plot many lines instead of just 3?
plt.figure(figsize=(20,15))
percentile = df_uplift_percentile.values
response_rate_treatment = df_uplift_percentile["response_rate_treatment"].values
response_rate_control = df_uplift_percentile["response_rate_control"].values
uplift= df_uplift_percentile["uplift"].values
plt.plot(percentile,response_rate_treatment,label= "Treatment Response Rate", color = 'green' )
plt.plot(percentile,response_rate_control,label = "Control Response Rate", color = 'yellow' )
plt.plot(percentile,uplift,label = "Uplift", color = 'red' )
plt.legend()
plt.ylabel("Uplift = Treatment Response Rate- Control Response Rate")
data =\
{'0-10': {'n_treatment': 217,
'n_control': 983,
'response_rate_treatment': 0.041475,
'response_rate_control': 0.004069,
'uplift': 0.037405,
'std_treatment': 0.013535,
'std_control': 0.00203,
'std_uplift': 0.013687},
'10-20': {'n_treatment': 145,
'n_control': 1055,
'response_rate_treatment': 0.013793,
'response_rate_control': 0.000948,
'uplift': 0.012845,
'std_treatment': 0.009686,
'std_control': 0.000947,
'std_uplift': 0.009732},
'20-30': {'n_treatment': 149,
'n_control': 1051,
'response_rate_treatment': 0.0,
'response_rate_control': 0.0,
'uplift': 0.0,
'std_treatment': 0.0,
'std_control': 0.0,
'std_uplift': 0.0},
'30-40': {'n_treatment': 383,
'n_control': 817,
'response_rate_treatment': 0.010444,
'response_rate_control': 0.009792,
'uplift': 0.000652,
'std_treatment': 0.005195,
'std_control': 0.003445,
'std_uplift': 0.006233},
'40-50': {'n_treatment': 354,
'n_control': 846,
'response_rate_treatment': 0.00565,
'response_rate_control': 0.00591,
'uplift': -0.00026,
'std_treatment': 0.003984,
'std_control': 0.002635,
'std_uplift': 0.004776},
'50-60': {'n_treatment': 423,
'n_control': 777,
'response_rate_treatment': 0.033097,
'response_rate_control': 0.029601,
'uplift': 0.003496,
'std_treatment': 0.008698,
'std_control': 0.00608,
'std_uplift': 0.010612},
'60-70': {'n_treatment': 588,
'n_control': 611,
'response_rate_treatment': 0.132653,
'response_rate_control': 0.155483,
'uplift': -0.02283,
'std_treatment': 0.013988,
'std_control': 0.01466,
'std_uplift': 0.020263},
'70-80': {'n_treatment': 673,
'n_control': 526,
'response_rate_treatment': 0.178306,
'response_rate_control': 0.161597,
'uplift': 0.016709,
'std_treatment': 0.014755,
'std_control': 0.016049,
'std_uplift': 0.021801},
'80-90': {'n_treatment': 881,
'n_control': 318,
'response_rate_treatment': 0.155505,
'response_rate_control': 0.261006,
'uplift': -0.105501,
'std_treatment': 0.012209,
'std_control': 0.024628,
'std_uplift': 0.027488},
'90-100': {'n_treatment': 938,
'n_control': 261,
'response_rate_treatment': 0.152452,
'response_rate_control': 0.333333,
'uplift': -0.180881,
'std_treatment': 0.011737,
'std_control': 0.029179,
'std_uplift': 0.031451}}
Upvotes: 2
Views: 2406
Reputation: 62403
Use pandas.DataFrame.plot, which uses matplotlib as the default backend.
'percentile' is already the index, so any selected columns will be plotted with the index as the x-axis.
If 'percentile' were a column, it would be passed to .plot as x='percentile', and its position would need to be added to .iloc.
Use .iloc to select the columns by index, or use .loc to select the columns by name.
Alternatively, select the columns with the y parameter. With long column names, it's shorter to use .iloc.
Use df_uplift_percentile.plot(y=[...], ...) to plot only certain columns, or df_uplift_percentile.plot(...) to plot all columns.
Tested in python 3.8.12, pandas 1.3.3, matplotlib 3.4.3.
ax = df_uplift_percentile.iloc[:, [2, 3, 4]].plot(xticks=range(len(df_uplift_percentile)), figsize=(10, 6), color=['green', 'yellow', 'r'],
ylabel='Uplift = Treatment Response Rate- Control Response Rate')
ax.legend(['Treatment Response Rate', 'Control Response Rate', 'Uplift'])
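For comparison, here is a minimal sketch of the y parameter route, which selects the columns by name rather than by position. The three-row data dict is a shortened, illustrative subset of the question's data (the name df is also just for this sketch), and the Agg backend is only set so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# Shortened subset of the question's data, for illustration only
data = {
    '0-10': {'n_treatment': 217, 'response_rate_treatment': 0.041475,
             'response_rate_control': 0.004069, 'uplift': 0.037405},
    '10-20': {'n_treatment': 145, 'response_rate_treatment': 0.013793,
              'response_rate_control': 0.000948, 'uplift': 0.012845},
    '20-30': {'n_treatment': 149, 'response_rate_treatment': 0.0,
              'response_rate_control': 0.0, 'uplift': 0.0},
}
df = pd.DataFrame.from_dict(data, 'index')
df.index.name = 'percentile'

# Pass the column names to y; the index ('percentile') becomes the x-axis
cols = ['response_rate_treatment', 'response_rate_control', 'uplift']
ax = df.plot(y=cols, color=['green', 'yellow', 'red'], figsize=(10, 6))
ax.set_ylabel('Uplift = Treatment Response Rate - Control Response Rate')
ax.legend(['Treatment Response Rate', 'Control Response Rate', 'Uplift'])

n_lines = len(ax.get_lines())  # one line per selected column
```

Selecting by name is more verbose than .iloc here, but it keeps working if the column order changes.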
Upvotes: 3
Reputation: 744
Use:
percentile = df_uplift_percentile.index
instead of
percentile = df_uplift_percentile.values
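The reason this matters: df_uplift_percentile.values is a 2D array holding every column of the DataFrame, and plt.plot draws one line per column of a 2D x, so each plt.plot call in the question produces several lines. The index, by contrast, is 1D. A small sketch of the difference, using a shortened illustrative subset of the question's data (the variable names and the Agg backend are assumptions made for this example):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Shortened subset of the question's data: 3 rows, 4 columns
data = {
    '0-10': {'n_treatment': 217, 'n_control': 983,
             'response_rate_treatment': 0.041475, 'uplift': 0.037405},
    '10-20': {'n_treatment': 145, 'n_control': 1055,
              'response_rate_treatment': 0.013793, 'uplift': 0.012845},
    '20-30': {'n_treatment': 149, 'n_control': 1051,
              'response_rate_treatment': 0.0, 'uplift': 0.0},
}
df = pd.DataFrame.from_dict(data, 'index')

x_wrong = df.values   # 2D array, shape (rows, columns)
x_right = df.index    # 1D, one label per row
y = df['uplift'].values

fig, (ax_wrong, ax_right) = plt.subplots(1, 2)
lines_wrong = ax_wrong.plot(x_wrong, y)  # one line per column of x_wrong
lines_right = ax_right.plot(x_right, y)  # a single line

print(x_wrong.shape, len(lines_wrong), len(lines_right))
```

With the full eight-column DataFrame from the question, the same effect gives eight lines per plt.plot call instead of one.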
Upvotes: 0