Reputation: 7490

How to plot multiple lines from a dataframe

I have the following data:

import pandas as pd

# using the data dict at the bottom of the question
df_uplift_percentile = pd.DataFrame.from_dict(data, 'index')
df_uplift_percentile.index.name = 'percentile'

# display(df_uplift_percentile)
            n_treatment  n_control  response_rate_treatment  response_rate_control    uplift  std_treatment  std_control  std_uplift
percentile                                                                                                                          
0-10                217        983                 0.041475               0.004069  0.037405       0.013535     0.002030    0.013687
10-20               145       1055                 0.013793               0.000948  0.012845       0.009686     0.000947    0.009732
20-30               149       1051                 0.000000               0.000000  0.000000       0.000000     0.000000    0.000000
30-40               383        817                 0.010444               0.009792  0.000652       0.005195     0.003445    0.006233
40-50               354        846                 0.005650               0.005910 -0.000260       0.003984     0.002635    0.004776
50-60               423        777                 0.033097               0.029601  0.003496       0.008698     0.006080    0.010612
60-70               588        611                 0.132653               0.155483 -0.022830       0.013988     0.014660    0.020263
70-80               673        526                 0.178306               0.161597  0.016709       0.014755     0.016049    0.021801
80-90               881        318                 0.155505               0.261006 -0.105501       0.012209     0.024628    0.027488
90-100              938        261                 0.152452               0.333333 -0.180881       0.011737     0.029179    0.031451

I want to plot response_rate_treatment, response_rate_control, and uplift against percentile (x-axis) as a line chart, with each line in a different color.

I am trying the code below. What mistake am I making that causes it to plot so many lines instead of just 3?

import matplotlib.pyplot as plt

plt.figure(figsize=(20, 15))

percentile = df_uplift_percentile.values

response_rate_treatment = df_uplift_percentile["response_rate_treatment"].values
response_rate_control = df_uplift_percentile["response_rate_control"].values
uplift = df_uplift_percentile["uplift"].values

plt.plot(percentile, response_rate_treatment, label="Treatment Response Rate", color='green')
plt.plot(percentile, response_rate_control, label="Control Response Rate", color='yellow')
plt.plot(percentile, uplift, label="Uplift", color='red')

plt.legend()
plt.ylabel("Uplift = Treatment Response Rate - Control Response Rate")

Current Plot Result

Reproducible Data

Data dict

data =\
{'0-10': {'n_treatment': 217,
  'n_control': 983,
  'response_rate_treatment': 0.041475,
  'response_rate_control': 0.004069,
  'uplift': 0.037405,
  'std_treatment': 0.013535,
  'std_control': 0.00203,
  'std_uplift': 0.013687},
 '10-20': {'n_treatment': 145,
  'n_control': 1055,
  'response_rate_treatment': 0.013793,
  'response_rate_control': 0.000948,
  'uplift': 0.012845,
  'std_treatment': 0.009686,
  'std_control': 0.000947,
  'std_uplift': 0.009732},
 '20-30': {'n_treatment': 149,
  'n_control': 1051,
  'response_rate_treatment': 0.0,
  'response_rate_control': 0.0,
  'uplift': 0.0,
  'std_treatment': 0.0,
  'std_control': 0.0,
  'std_uplift': 0.0},
 '30-40': {'n_treatment': 383,
  'n_control': 817,
  'response_rate_treatment': 0.010444,
  'response_rate_control': 0.009792,
  'uplift': 0.000652,
  'std_treatment': 0.005195,
  'std_control': 0.003445,
  'std_uplift': 0.006233},
 '40-50': {'n_treatment': 354,
  'n_control': 846,
  'response_rate_treatment': 0.00565,
  'response_rate_control': 0.00591,
  'uplift': -0.00026,
  'std_treatment': 0.003984,
  'std_control': 0.002635,
  'std_uplift': 0.004776},
 '50-60': {'n_treatment': 423,
  'n_control': 777,
  'response_rate_treatment': 0.033097,
  'response_rate_control': 0.029601,
  'uplift': 0.003496,
  'std_treatment': 0.008698,
  'std_control': 0.00608,
  'std_uplift': 0.010612},
 '60-70': {'n_treatment': 588,
  'n_control': 611,
  'response_rate_treatment': 0.132653,
  'response_rate_control': 0.155483,
  'uplift': -0.02283,
  'std_treatment': 0.013988,
  'std_control': 0.01466,
  'std_uplift': 0.020263},
 '70-80': {'n_treatment': 673,
  'n_control': 526,
  'response_rate_treatment': 0.178306,
  'response_rate_control': 0.161597,
  'uplift': 0.016709,
  'std_treatment': 0.014755,
  'std_control': 0.016049,
  'std_uplift': 0.021801},
 '80-90': {'n_treatment': 881,
  'n_control': 318,
  'response_rate_treatment': 0.155505,
  'response_rate_control': 0.261006,
  'uplift': -0.105501,
  'std_treatment': 0.012209,
  'std_control': 0.024628,
  'std_uplift': 0.027488},
 '90-100': {'n_treatment': 938,
  'n_control': 261,
  'response_rate_treatment': 0.152452,
  'response_rate_control': 0.333333,
  'uplift': -0.180881,
  'std_treatment': 0.011737,
  'std_control': 0.029179,
  'std_uplift': 0.031451}}

Upvotes: 2

Answers (2)

Trenton McKinney

Reputation: 62403

The simplest way to plot many columns as lines is to use pandas.DataFrame.plot, which uses matplotlib as the default backend.
- This reduces your plotting code from 10 lines to 2 lines.
'percentile' is already the index, so any selected columns will be plotted with the index as the x-axis.
- If 'percentile' were a column, it would be passed to .plot as x='percentile', and its position would need to be added to .iloc.
- Use .iloc to select columns by integer position, or .loc to select them by name.
- Alternatively, pass the column names to the y parameter. With long column names, it's shorter to use .iloc.
  - df_uplift_percentile.plot(y=[...], ...) to only plot certain columns
  - df_uplift_percentile.plot(...) to plot all columns.
Changing the legend labels can be done in two ways:
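A minimal sketch of the two cases from the notes above: selecting columns by name with y= when 'percentile' is the index, and passing x='percentile' when it is an ordinary column. To keep it short, it rebuilds only the three plotted columns (values copied from the question's data dict); the names data, cols, df_indexed, and df_flat are illustrative, not from the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Only the three plotted columns, values copied from the question's data dict
data = {
    "percentile": ["0-10", "10-20", "20-30", "30-40", "40-50",
                   "50-60", "60-70", "70-80", "80-90", "90-100"],
    "response_rate_treatment": [0.041475, 0.013793, 0.0, 0.010444, 0.00565,
                                0.033097, 0.132653, 0.178306, 0.155505, 0.152452],
    "response_rate_control": [0.004069, 0.000948, 0.0, 0.009792, 0.00591,
                              0.029601, 0.155483, 0.161597, 0.261006, 0.333333],
    "uplift": [0.037405, 0.012845, 0.0, 0.000652, -0.00026,
               0.003496, -0.02283, 0.016709, -0.105501, -0.180881],
}
cols = ["response_rate_treatment", "response_rate_control", "uplift"]

# Case 1: 'percentile' is the index (as in the question), so only y is needed
df_indexed = pd.DataFrame(data).set_index("percentile")
ax1 = df_indexed.plot(y=cols, color=["green", "yellow", "red"], figsize=(10, 6))

# Case 2: 'percentile' is an ordinary column, so it is passed as x
df_flat = pd.DataFrame(data)
ax2 = df_flat.plot(x="percentile", y=cols, color=["green", "yellow", "red"], figsize=(10, 6))

plt.show()
```

Each .plot call without an ax argument opens its own figure, so the two cases produce two separate charts with three lines each.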
1. Use .rename to change the column names
2. Use .legend and pass a list for the labels (shown below)
Tested in Python 3.8.12, pandas 1.3.3, matplotlib 3.4.3

ax = df_uplift_percentile.iloc[:, [2, 3, 4]].plot(xticks=range(len(df_uplift_percentile)), figsize=(10, 6), color=['green', 'yellow', 'r'],
                                                  ylabel='Uplift = Treatment Response Rate- Control Response Rate')
ax.legend(['Treatment Response Rate', 'Control Response Rate', 'Uplift'])
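The .rename option from the list above is not shown in the code; here is a sketch of it, again assuming only the three plotted columns rebuilt from the question's data, with labels and df as illustrative names. The ylabel keyword of DataFrame.plot needs pandas 1.1.0 or later.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Only the three plotted columns, values copied from the question's data dict
data = {
    "percentile": ["0-10", "10-20", "20-30", "30-40", "40-50",
                   "50-60", "60-70", "70-80", "80-90", "90-100"],
    "response_rate_treatment": [0.041475, 0.013793, 0.0, 0.010444, 0.00565,
                                0.033097, 0.132653, 0.178306, 0.155505, 0.152452],
    "response_rate_control": [0.004069, 0.000948, 0.0, 0.009792, 0.00591,
                              0.029601, 0.155483, 0.161597, 0.261006, 0.333333],
    "uplift": [0.037405, 0.012845, 0.0, 0.000652, -0.00026,
               0.003496, -0.02283, 0.016709, -0.105501, -0.180881],
}
df = pd.DataFrame(data).set_index("percentile")

# Map each column name to the legend label wanted on the chart
labels = {
    "response_rate_treatment": "Treatment Response Rate",
    "response_rate_control": "Control Response Rate",
    "uplift": "Uplift",
}

# .rename returns a new DataFrame, so df keeps its original column names
ax = (
    df[list(labels)]
    .rename(columns=labels)
    .plot(
        xticks=range(len(df)),
        figsize=(10, 6),
        color=["green", "yellow", "red"],
        ylabel="Uplift = Treatment Response Rate - Control Response Rate",
    )
)
plt.show()
```

Because the labels travel with the columns, the legend cannot get out of step with the column order, which can happen when a separate list is passed to ax.legend.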

Upvotes: 3

the.real.gruycho

Reputation: 744

Use:

percentile = df_uplift_percentile.index

instead of

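percentile = df_uplift_percentile.values

df_uplift_percentile.values returns the whole 2D array of all eight columns rather than the percentile labels. When plt.plot receives a 2D x and a 1D y, it draws one line per column of x, so each of the three calls adds eight lines.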
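Putting that one-line fix into the question's code gives a sketch like the following. It rebuilds only the three plotted columns from the question's data; matplotlib treats the string percentile labels as categorical x values, so each plt.plot call draws a single line.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Only the three plotted columns, values copied from the question's data dict
data = {
    "percentile": ["0-10", "10-20", "20-30", "30-40", "40-50",
                   "50-60", "60-70", "70-80", "80-90", "90-100"],
    "response_rate_treatment": [0.041475, 0.013793, 0.0, 0.010444, 0.00565,
                                0.033097, 0.132653, 0.178306, 0.155505, 0.152452],
    "response_rate_control": [0.004069, 0.000948, 0.0, 0.009792, 0.00591,
                              0.029601, 0.155483, 0.161597, 0.261006, 0.333333],
    "uplift": [0.037405, 0.012845, 0.0, 0.000652, -0.00026,
               0.003496, -0.02283, 0.016709, -0.105501, -0.180881],
}
df_uplift_percentile = pd.DataFrame(data).set_index("percentile")

plt.figure(figsize=(20, 15))

# The index holds the ten percentile labels: exactly one x value per row
percentile = df_uplift_percentile.index

plt.plot(percentile, df_uplift_percentile["response_rate_treatment"],
         label="Treatment Response Rate", color="green")
plt.plot(percentile, df_uplift_percentile["response_rate_control"],
         label="Control Response Rate", color="yellow")
plt.plot(percentile, df_uplift_percentile["uplift"], label="Uplift", color="red")

plt.legend()
plt.ylabel("Uplift = Treatment Response Rate - Control Response Rate")
plt.show()
```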

Upvotes: 0
