Reputation: 57
I have tried a lot of different methods, but I can't get reasonable xtick labeling. This is the code I wrote:
import pandas as pd
import numpy as np
import matplotlib
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
#Line of Code just for importing the .csv Data
df = pd.read_csv('path of the csv file', sep=",", comment='#', decimal='.', parse_dates=True)
xticks = df.time.unique()
table = df.pivot_table("globalpower", index="time", aggfunc=np.mean)
graph = sns.lineplot(df.time, df.globalpower, data=df)
graph.set_xticks(range(0,24))
graph.set_xticklabels(['01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','24:00' ])
I know there should be a more elegant way to list the times of the day.
The output looks like this:
I printed the head of my data; it looks like this:
   Unnamed: 0      date      time  globalpower  voltage  globintensity  submetering1  submetering2  submetering3
0     1600236  1/1/2010  00:00:00        1.790   240.65            7.4           0.0           0.0          18.0
1     1600237  1/1/2010  00:01:00        1.780   240.07            7.4           0.0           0.0          18.0
2     1600238  1/1/2010  00:02:00        1.780   240.15            7.4           0.0           0.0          19.0
3     1600239  1/1/2010  00:03:00        1.746   240.26            7.2           0.0           0.0          18.0
4     1600240  1/1/2010  00:04:00        1.686   240.12            7.0           0.0           0.0          18.0
Upvotes: 2
Views: 5551
Reputation: 8790
Only a little to add onto Andrea's answer, just to explain what I think was going on in your original code. Here's toy data with minute-precision time strings and random values:
In[0]:
import pandas as pd
import numpy as np
import seaborn as sns
# Build one 'HH:MM:00' string for every minute of the day
times = []
for h in range(24):
    for m in range(60):
        times.append(f'{h:02}:{m:02}:00')
values = np.random.rand(1440*3)  # 1440 minutes in a day, repeated 3 times
df = pd.DataFrame({'time': times*3,
                   'globalpower': values})
df
Out[0]:
time globalpower
0 00:00:00 0.564812
1 00:01:00 0.429477
2 00:02:00 0.827994
3 00:03:00 0.525569
4 00:04:00 0.113478
... ...
7195 23:55:00 0.624546
7196 23:56:00 0.981141
7197 23:57:00 0.096928
7198 23:58:00 0.170131
7199 23:59:00 0.398853
[7200 rows x 2 columns]
Note that I repeat each time 3x so that sns.lineplot has something to average for each unique time. Graphing this data with your code creates the same error you described:
graph = sns.lineplot(x='time', y='globalpower', data=df)  # keyword x/y; recent seaborn rejects positional x, y
graph.set_xticks(range(0,24))
graph.set_xticklabels(['01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','24:00'])
The basic discrepancy is that neither your plotting function nor your x-axis arguments are aware that there is any time information. When you call sns.lineplot with x=df.time and y=df.globalpower, seaborn basically does a groupby operation on the time column for each unique entry and averages the global power values. But it only sees unique strings in the time column. These strings are sorted when plotted, which just happens to match the order of times in a day because of how they are written alphanumerically.
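The averaging step described above can be sketched in plain pandas. This is only an approximation of what sns.lineplot does with its default estimator, not its actual implementation, and the seeded random values are an assumption made for reproducibility:

```python
import numpy as np
import pandas as pd

# Toy data mirroring the answer: 1440 minute-strings, each repeated 3 times.
times = [f"{h:02}:{m:02}:00" for h in range(24) for m in range(60)]
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame({"time": times * 3, "globalpower": rng.random(1440 * 3)})

# Roughly the aggregation behind the plotted line: one mean per unique x value.
mean_per_time = df.groupby("time")["globalpower"].mean()

print(len(mean_per_time))         # number of unique time strings
print(mean_per_time.index.dtype)  # a string dtype, not a datetime dtype
```

The index of the result holds 1440 plain strings, which is why nothing downstream can treat the x-axis as time.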
To see that the time formatting plays no role, consider that using an array of non-time-formatted strings instead (e.g. '0000', '0001', '0002', etc.) results in the same graph:
# Build one 'HHMM' string for every minute of the day
names = []
for h in range(24):
    for m in range(60):
        names.append(f'{h:02}{m:02}')
# names = ['0000', '0001', '0002', ...]
df2 = pd.DataFrame({'name': names*3,
                    'globalpower': values})
graph2 = sns.lineplot(x='name', y='globalpower', data=df2)
graph2.set_xticks(range(0,24))
graph2.set_xticklabels(['01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','24:00'])
So when you get to your tick arguments, saying set_xticks(range(0,24)) and set_xticklabels(['01:00','02:00','03:00'...]) means basically "set ticks at positions 0 through 23 with these 24 labels", though the plot is graphing (in this case) 1440 unique x-values, so 0-23 only spans a sliver of the values.
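To make the position mismatch concrete, here is a small sketch that computes where hourly ticks would have to go if the x-values stay as strings. It assumes the 1440 unique strings occupy integer positions 0 to 1439 in sorted order, as on a matplotlib categorical axis. The names tick_positions and tick_labels are illustrative, and this is a workaround rather than the fix recommended in this answer:

```python
# Category labels as in the toy data: one string per minute of the day.
times = [f"{h:02}:{m:02}:00" for h in range(24) for m in range(60)]

# With string x-values, each unique string sits at an integer position,
# so hourly ticks belong at every 60th position rather than at 0..23.
tick_positions = list(range(0, len(times), 60))
tick_labels = [times[p][:5] for p in tick_positions]  # 'HH:MM' part of each string

print(tick_positions[:3], tick_labels[:3])
# → [0, 60, 120] ['00:00', '01:00', '02:00']

# Hypothetical usage with the axes object from the question:
# graph.set_xticks(tick_positions)
# graph.set_xticklabels(tick_labels, rotation=90)
```

Generating the labels from the data also avoids typing the 24 strings by hand, and it shows that position 0 corresponds to '00:00', not '01:00'.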
The fix is basically what Andrea answered: get your time information into a datetime format, and then use matplotlib.dates to format the ticks. For your strings of times (without dates), you can simply do:
df['time'] = pd.to_datetime(df['time'])
And then follow their answer. This gives every time a full timestamp on a placeholder date (which date you get depends on how pandas parses the strings; with an explicit format='%H:%M:%S' it is January 1st, 1900), but the arbitrary date doesn't matter if you only care about plotting a 24-hour period averaged for each recurring time.
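As a minimal sketch of that conversion, the snippet below parses time-only strings with an explicit format. The three rows are toy values, not your real file, and passing format='%H:%M:%S' is my assumption about how your strings are written:

```python
import pandas as pd

# Toy rows standing in for the real CSV (values are illustrative only).
df = pd.DataFrame({"time": ["00:00:00", "00:01:00", "23:59:00"],
                   "globalpower": [1.790, 1.780, 1.686]})

# An explicit format avoids format guessing; time-only strings then get
# the placeholder date 1900-01-01 attached.
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")

print(df["time"].dtype)
print(df["time"].dt.strftime("%H:%M").tolist())
# → ['00:00', '00:01', '23:59']
```

Once the column has a datetime dtype, the matplotlib.dates locators and formatters from the other answer can work with it, and a '%H:%M' formatter hides the placeholder date entirely.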
Upvotes: 2
Reputation: 12496
Since I do not have access to your data, I created some fake data to work with. You can just use your df instead.
Check this code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
N = 1440
time = pd.date_range('2020-01-01', periods = N, freq = 'min')
globalpower = np.random.randn(N)
df = pd.DataFrame({'time': time,
                   'globalpower': globalpower})
# Pass x and y by keyword; recent seaborn versions reject positional x, y
graph = sns.lineplot(x = 'time', y = 'globalpower', data = df)
graph.xaxis.set_major_locator(mdates.HourLocator(interval = 1))
graph.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.xticks(rotation = 90)
plt.show()
which gives me this plot:
You can adjust the x-axis ticks and labels with:
graph.xaxis.set_major_locator(mdates.HourLocator(interval = 1)) to set a tick every hour
graph.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M')) to set the format of the x-axis labels to "hours:minutes"
plt.xticks(rotation = 90) to rotate the x-axis labels by 90 degrees and improve readability
Upvotes: 5