Reputation: 55
I have the following dataframe: <class 'pandas.core.frame.DataFrame'>
RangeIndex: 1642 entries, 0 to 1641
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1642 non-null datetime64[ns]
1 Volgnr 1642 non-null int64
2 account 1642 non-null object
3 Rentedatum 1642 non-null datetime64[ns]
4 Bedrag 1642 non-null float64
5 Balance 1642 non-null float64
6 tegenrekening 906 non-null object
7 Code 1642 non-null object
8 Naam tegenpartij 1642 non-null object
9 description 1642 non-null object
10 category 1642 non-null object
11 Grootboek 1578 non-null object
12 Kleinboek 1578 non-null object
dtypes: datetime64[ns](2), float64(2), int64(1), object(8)
memory usage: 166.9+ KB
'account' has 5 different account numbers which like so: NL00ABCD0123456789
I want two different graphs but I'm already stuck with the first one i.e. I want to see the balance over time for the 5 accounts
In line with other question on this forum I tried:
pd.options.plotting.backend="plotly"
df.set_index('Date', inplace=True)
df.groupby('account')['balance'].plot(legend=True)
But got the following error:
TypeError: line() got an unexpected keyword argument 'legend'
What is going wrong here?
For later: If that is solved I want the X-axis to be weeks or months instead of the absolute date so some aggregation will be necessary
Upvotes: 2
Views: 2175
Reputation: 61104
You're seeing this error because running df.plot()
will trigger px.line()
after defining pd.options.plotting.backend="plotly"
.
And px.line()
does not have a legend
atribute. But you don't need it.
All you need is:
px.line(df, x = 'Date', y = 'Balance', color = 'Account')
And you'll get:
Setting pd.options.plotting.backend="plotly"
will, as you surely may know, override the default plotting backend for pandas
which is matplotlib
. Still, when running help(df.plot())
after that, the help info that pops up still seems to be info about matplotlib
, which does in fact have a legend
attribute.
But px.line()
is what will be triggered by df.plot()
after instantiating pd.options.plotting.backend="plotly"
. And this is what triggers your error, since px.line
does not have legend
attribute. Dut don't worry about that, since things are about to get really simple for you, because px.line()
will produce a grouped legend for you. You don't even need to group your data as long as you apply df.plot()
correctly.
But before we come to that, we'll have to take a look at your provided dataset. Given the wording of your question, and the look of the 'data' what you've provided, my understanding is that you've got several non-unique accounts under account
associated with different values for balance
spread across multiple non-unique dates. Something like this:
Date Account Balance
0 01.01.2022 NL00ABCD0123456789 1
1 01.01.2022 NL00ABCD0123456790 2
2 01.01.2022 NL00ABCD0123456791 2
3 01.01.2022 NL00ABCD0123456792 3
4 01.01.2022 NL00ABCD0123456793 4
5 02.01.2022 NL00ABCD0123456789 2
6 02.01.2022 NL00ABCD0123456790 3
7 02.01.2022 NL00ABCD0123456791 3
8 02.01.2022 NL00ABCD0123456792 4
9 02.01.2022 NL00ABCD0123456793 5
If that's the case, then all you need to do is run:
px.line(df, x = 'Date', y = 'Balance', color = 'Account')
import pandas as pd
import plotly.express as px
pd.options.plotting.backend="plotly"
df = pd.DataFrame({'Date': {0: '01.01.2022',
1: '01.01.2022',
2: '01.01.2022',
3: '01.01.2022',
4: '01.01.2022',
5: '02.01.2022',
6: '02.01.2022',
7: '02.01.2022',
8: '02.01.2022',
9: '02.01.2022',
10: '03.01.2022',
11: '03.01.2022',
12: '03.01.2022',
13: '03.01.2022',
14: '03.01.2022',
15: '04.01.2022',
16: '04.01.2022',
17: '04.01.2022',
18: '04.01.2022',
19: '04.01.2022'},
'Account': {0: 'NL00ABCD0123456789',
1: 'NL00ABCD0123456790',
2: 'NL00ABCD0123456791',
3: 'NL00ABCD0123456792',
4: 'NL00ABCD0123456793',
5: 'NL00ABCD0123456789',
6: 'NL00ABCD0123456790',
7: 'NL00ABCD0123456791',
8: 'NL00ABCD0123456792',
9: 'NL00ABCD0123456793',
10: 'NL00ABCD0123456789',
11: 'NL00ABCD0123456790',
12: 'NL00ABCD0123456791',
13: 'NL00ABCD0123456792',
14: 'NL00ABCD0123456793',
15: 'NL00ABCD0123456789',
16: 'NL00ABCD0123456790',
17: 'NL00ABCD0123456791',
18: 'NL00ABCD0123456792',
19: 'NL00ABCD0123456793'},
'Balance': {0: 1,
1: 2,
2: 2,
3: 3,
4: 4,
5: 2,
6: 3,
7: 3,
8: 4,
9: 5,
10: 3,
11: 4,
12: 4,
13: 5,
14: 6,
15: 4,
16: 5,
17: 5,
18: 6,
19: 7}})
px.line(df, x = 'Date', y = 'Balance', color = 'Account')
Upvotes: 3
Reputation: 2605
Since you have not provided with a sample data, I have solution with an arbitrary time-series data.
{'Date': ['10/03/2004',
'10/03/2004',
'10/03/2004',
'10/03/2004',
'10/03/2004'],
'Time': ['18.00.00', '19.00.00', '20.00.00', '21.00.00', '22.00.00'],
'CO(GT)': ['2,6', '2', '2,2', '2,2', '1,6'],
'PT08.S1(CO)': [1360.0, 1292.0, 1402.0, 1376.0, 1272.0],
'NMHC(GT)': [150.0, 112.0, 88.0, 80.0, 51.0],
'C6H6(GT)': ['11,9', '9,4', '9,0', '9,2', '6,5'],
'PT08.S2(NMHC)': [1046.0, 955.0, 939.0, 948.0, 836.0],
'NOx(GT)': [166.0, 103.0, 131.0, 172.0, 131.0],
'PT08.S3(NOx)': [1056.0, 1174.0, 1140.0, 1092.0, 1205.0],
'NO2(GT)': [113.0, 92.0, 114.0, 122.0, 116.0],
'PT08.S4(NO2)': [1692.0, 1559.0, 1555.0, 1584.0, 1490.0],
'PT08.S5(O3)': [1268.0, 972.0, 1074.0, 1203.0, 1110.0],
'T': ['13,6', '13,3', '11,9', '11,0', '11,2'],
'RH': ['48,9', '47,7', '54,0', '60,0', '59,6'],
'AH': ['0,7578', '0,7255', '0,7502', '0,7867', '0,7888']
}
We need to convert date to a datetime
object.
df['Date'] = pd.to_datetime(df['Date'] + " " + df['Time'], format="%d/%m/%Y %H.%M.%S")
# To plot with monthly aggregation you can use resample.
df.set_index('Date').resample('1M').mean().plot()
Upvotes: 0