matje59
matje59

Reputation: 55

Pandas: How to plot multiple lines against date using plotly as backend?

I have the following dataframe: <class 'pandas.core.frame.DataFrame'>

RangeIndex: 1642 entries, 0 to 1641
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Date              1642 non-null   datetime64[ns]
 1   Volgnr            1642 non-null   int64         
 2   account           1642 non-null   object        
 3   Rentedatum        1642 non-null   datetime64[ns]
 4   Bedrag            1642 non-null   float64       
 5   Balance           1642 non-null   float64       
 6   tegenrekening     906 non-null    object        
 7   Code              1642 non-null   object        
 8   Naam tegenpartij  1642 non-null   object        
 9   description       1642 non-null   object        
 10  category          1642 non-null   object        
 11  Grootboek         1578 non-null   object        
 12  Kleinboek         1578 non-null   object        
dtypes: datetime64[ns](2), float64(2), int64(1), object(8)
memory usage: 166.9+ KB

'account' has 5 different account numbers which like so: NL00ABCD0123456789

I want two different graphs but I'm already stuck with the first one i.e. I want to see the balance over time for the 5 accounts

In line with other question on this forum I tried:

pd.options.plotting.backend="plotly"
df.set_index('Date', inplace=True)
df.groupby('account')['balance'].plot(legend=True)

But got the following error:

TypeError: line() got an unexpected keyword argument 'legend'

What is going wrong here?

For later: If that is solved I want the X-axis to be weeks or months instead of the absolute date so some aggregation will be necessary

Upvotes: 2

Views: 2175

Answers (2)

vestland
vestland

Reputation: 61104

Short answer:

You're seeing this error because running df.plot() will trigger px.line() after defining pd.options.plotting.backend="plotly". And px.line() does not have a legend atribute. But you don't need it. All you need is:

px.line(df, x = 'Date', y = 'Balance', color = 'Account')

And you'll get:

enter image description here

The details:

Setting pd.options.plotting.backend="plotly" will, as you surely may know, override the default plotting backend for pandas which is matplotlib. Still, when running help(df.plot()) after that, the help info that pops up still seems to be info about matplotlib, which does in fact have a legend attribute.

But px.line() is what will be triggered by df.plot() after instantiating pd.options.plotting.backend="plotly". And this is what triggers your error, since px.line does not have legend attribute. Dut don't worry about that, since things are about to get really simple for you, because px.line() will produce a grouped legend for you. You don't even need to group your data as long as you apply df.plot() correctly.

But before we come to that, we'll have to take a look at your provided dataset. Given the wording of your question, and the look of the 'data' what you've provided, my understanding is that you've got several non-unique accounts under account associated with different values for balance spread across multiple non-unique dates. Something like this:

          Date             Account  Balance
0   01.01.2022  NL00ABCD0123456789        1
1   01.01.2022  NL00ABCD0123456790        2
2   01.01.2022  NL00ABCD0123456791        2
3   01.01.2022  NL00ABCD0123456792        3
4   01.01.2022  NL00ABCD0123456793        4
5   02.01.2022  NL00ABCD0123456789        2
6   02.01.2022  NL00ABCD0123456790        3
7   02.01.2022  NL00ABCD0123456791        3
8   02.01.2022  NL00ABCD0123456792        4
9   02.01.2022  NL00ABCD0123456793        5

If that's the case, then all you need to do is run:

px.line(df, x = 'Date', y = 'Balance', color = 'Account')

Plot:

enter image description here

Complete code:

import pandas as pd
import plotly.express as px

pd.options.plotting.backend="plotly"
df = pd.DataFrame({'Date': {0: '01.01.2022',
              1: '01.01.2022',
              2: '01.01.2022',
              3: '01.01.2022',
              4: '01.01.2022',
              5: '02.01.2022',
              6: '02.01.2022',
              7: '02.01.2022',
              8: '02.01.2022',
              9: '02.01.2022',
              10: '03.01.2022',
              11: '03.01.2022',
              12: '03.01.2022',
              13: '03.01.2022',
              14: '03.01.2022',
              15: '04.01.2022',
              16: '04.01.2022',
              17: '04.01.2022',
              18: '04.01.2022',
              19: '04.01.2022'},
             'Account': {0: 'NL00ABCD0123456789',
              1: 'NL00ABCD0123456790',
              2: 'NL00ABCD0123456791',
              3: 'NL00ABCD0123456792',
              4: 'NL00ABCD0123456793',
              5: 'NL00ABCD0123456789',
              6: 'NL00ABCD0123456790',
              7: 'NL00ABCD0123456791',
              8: 'NL00ABCD0123456792',
              9: 'NL00ABCD0123456793',
              10: 'NL00ABCD0123456789',
              11: 'NL00ABCD0123456790',
              12: 'NL00ABCD0123456791',
              13: 'NL00ABCD0123456792',
              14: 'NL00ABCD0123456793',
              15: 'NL00ABCD0123456789',
              16: 'NL00ABCD0123456790',
              17: 'NL00ABCD0123456791',
              18: 'NL00ABCD0123456792',
              19: 'NL00ABCD0123456793'},
             'Balance': {0: 1,
              1: 2,
              2: 2,
              3: 3,
              4: 4,
              5: 2,
              6: 3,
              7: 3,
              8: 4,
              9: 5,
              10: 3,
              11: 4,
              12: 4,
              13: 5,
              14: 6,
              15: 4,
              16: 5,
              17: 5,
              18: 6,
              19: 7}})

px.line(df, x = 'Date', y = 'Balance', color = 'Account')

Upvotes: 3

Irfanuddin
Irfanuddin

Reputation: 2605

Since you have not provided with a sample data, I have solution with an arbitrary time-series data.

{'Date': ['10/03/2004',
  '10/03/2004',
  '10/03/2004',
  '10/03/2004',
  '10/03/2004'],
 'Time': ['18.00.00', '19.00.00', '20.00.00', '21.00.00', '22.00.00'],
 'CO(GT)': ['2,6', '2', '2,2', '2,2', '1,6'],
 'PT08.S1(CO)': [1360.0, 1292.0, 1402.0, 1376.0, 1272.0],
 'NMHC(GT)': [150.0, 112.0, 88.0, 80.0, 51.0],
 'C6H6(GT)': ['11,9', '9,4', '9,0', '9,2', '6,5'],
 'PT08.S2(NMHC)': [1046.0, 955.0, 939.0, 948.0, 836.0],
 'NOx(GT)': [166.0, 103.0, 131.0, 172.0, 131.0],
 'PT08.S3(NOx)': [1056.0, 1174.0, 1140.0, 1092.0, 1205.0],
 'NO2(GT)': [113.0, 92.0, 114.0, 122.0, 116.0],
 'PT08.S4(NO2)': [1692.0, 1559.0, 1555.0, 1584.0, 1490.0],
 'PT08.S5(O3)': [1268.0, 972.0, 1074.0, 1203.0, 1110.0],
 'T': ['13,6', '13,3', '11,9', '11,0', '11,2'],
 'RH': ['48,9', '47,7', '54,0', '60,0', '59,6'],
 'AH': ['0,7578', '0,7255', '0,7502', '0,7867', '0,7888']
}

We need to convert date to a datetime object.


df['Date'] = pd.to_datetime(df['Date'] + " " + df['Time'], format="%d/%m/%Y %H.%M.%S")

# To plot with monthly aggregation you can use resample. 
df.set_index('Date').resample('1M').mean().plot()

enter image description here

Upvotes: 0

Related Questions