Slurm

Reputation: 57

I am unable to set the xticks of my lineplot in Seaborn to the values of the corresponding hour

I tried a lot of different methods but I can't get reasonable xtick labels. This is the code I wrote.

import pandas as pd
import numpy as np
import matplotlib
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

#Line of Code just for importing the .csv Data
df = pd.read_csv('path of the csv file', sep=",", comment='#', decimal='.', parse_dates=True)

xticks = df.time.unique()


table = df.pivot_table("globalpower", index="time", aggfunc=np.mean)

graph = sns.lineplot(df.time, df.globalpower, data=df)
graph.set_xticks(range(0,24))
graph.set_xticklabels(['01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','24:00' ])

I know there should be a more elegant way to list the times of the day.

The output looks like this:

This is my current output

I printed the head of my data; it looks like this:

   Unnamed: 0      date      time  globalpower  voltage  globintensity  submetering1  submetering2  submetering3
0     1600236  1/1/2010  00:00:00        1.790   240.65            7.4           0.0           0.0          18.0
1     1600237  1/1/2010  00:01:00        1.780   240.07            7.4           0.0           0.0          18.0
2     1600238  1/1/2010  00:02:00        1.780   240.15            7.4           0.0           0.0          19.0
3     1600239  1/1/2010  00:03:00        1.746   240.26            7.2           0.0           0.0          18.0
4     1600240  1/1/2010  00:04:00        1.686   240.12            7.0           0.0           0.0          18.0

Upvotes: 2

Views: 5551

Answers (2)

Tom

Reputation: 8790

Only a little to add onto Andrea's answer, just to explain what I think was going on in your original code. Here's toy data with minute-precision time strings and random values:

In[0]:

import pandas as pd
import numpy as np
import seaborn as sns

times = []
for h in range(24):
    for m in range(60):
        times.append(f'{h:02}:{m:02}:00')

values = np.random.rand(1440*3)    # 1440 minutes in a day, repeated 3x

df = pd.DataFrame({'time':times*3,
                    'globalpower':values,})

df

Out[0]:
          time  globalpower
0     00:00:00     0.564812
1     00:01:00     0.429477
2     00:02:00     0.827994
3     00:03:00     0.525569
4     00:04:00     0.113478
       ...          ...
7195  23:55:00     0.624546
7196  23:56:00     0.981141
7197  23:57:00     0.096928
7198  23:58:00     0.170131
7199  23:59:00     0.398853

[7200 rows x 2 columns]

Note that I repeat each time 3x so that sns.lineplot has something to average for each unique time. Graphing this data with your code creates the same error you described:

graph = sns.lineplot(x='time', y='globalpower', data=df)
graph.set_xticks(range(0,24))
graph.set_xticklabels(['01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','24:00'])


The basic discrepancy is that neither your plotting function nor your x-axis arguments are aware that there is any time information. When you call sns.lineplot with x=df.time and y=df.globalpower, seaborn essentially does a groupby on the time column, averaging the globalpower values for each unique entry. But it only sees unique strings in the time column. These strings are sorted when plotted, and that order just happens to match the order of times in a day because zero-padded times sort chronologically as text.

To see this, note that an array of non-time-formatted strings (e.g. '0000', '0001', '0002', etc.) results in the same graph:
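As a rough illustration of both points (the averaging and the string sorting), here is a small sketch with made-up values. The three-row repeat pattern and the unpadded strings are my own assumptions for demonstration, not taken from your data:

```python
import pandas as pd

# Tiny stand-in for the question's data: each time string appears three times
df = pd.DataFrame({
    "time": ["00:00:00", "00:01:00", "10:00:00"] * 3,
    "globalpower": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

# Roughly the aggregation that lineplot's default mean estimator performs
means = df.groupby("time")["globalpower"].mean()
print(means.tolist())  # [4.0, 5.0, 6.0]

# Zero-padded strings sort chronologically; unpadded ones would not
padded = ["10:00:00", "02:00:00", "00:01:00"]
unpadded = ["10:00:00", "2:00:00", "0:01:00"]
print(sorted(padded))    # ['00:01:00', '02:00:00', '10:00:00']
print(sorted(unpadded))  # ['0:01:00', '10:00:00', '2:00:00']
```

So the plot only looks time-ordered because of the zero padding; nothing in the data is actually a time.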

names = []
for h in range(24):
    for m in range(60):
        names.append(f'{h:02}{m:02}')
#names = ['0000','0001','0002',...]

df2 = pd.DataFrame({'name':names*3,
                   'globalpower':values,})

graph2 = sns.lineplot(x='name', y='globalpower', data=df2)
graph2.set_xticks(range(0,24))
graph2.set_xticklabels(['01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','24:00'])

So when you call set_xticks(range(0,24)) and set_xticklabels(['01:00','02:00','03:00'...]), you are saying "put ticks at positions 0 through 23 and give them these 24 labels". But the plot contains (in this case) 1440 unique x-values, one per position, so positions 0-23 cover only a sliver at the far left of the axis.
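If you wanted to keep the string-based axis anyway, the ticks would have to sit at every 60th position instead. A minimal sketch, assuming the strings land on integer positions 0 to 1439 in sorted order (which is how matplotlib treats string categories); the names positions and labels are just for illustration:

```python
# One string per minute of the day, in the same format as the toy data
times = [f"{h:02}:{m:02}:00" for h in range(24) for m in range(60)]

# One tick per hour means every 60th position, not positions 0..23
positions = list(range(0, len(times), 60))
labels = [times[p][:5] for p in positions]  # keep only 'HH:MM'

print(positions[:3])  # [0, 60, 120]
print(labels[:3])     # ['00:00', '01:00', '02:00']
print(labels[-1])     # 23:00

# These would then be passed to the axes from the plot, e.g.:
# graph.set_xticks(positions)
# graph.set_xticklabels(labels)
```

This works, but it hard-codes the one-row-per-minute layout, which is why converting to real datetimes is the more robust route.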

The fix is basically what Andrea answered: get your time information into a datetime format, and then use matplotlib.dates to format the ticks. For your strings of times (without dates), you can simply do:

df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S')

And then follow their answer. With an explicit format, pandas fills in a placeholder date, 1900-01-01, for every timestamp (without a format the placeholder may be a different date, such as the current day); the odd date doesn't matter if you only care about plotting a 24-hour period averaged for each recurring time.
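Putting the two answers together, a sketch of the whole flow on toy data might look like the following. To keep it self-contained I average with a pandas groupby and plot with plain matplotlib rather than sns.lineplot (so there is no confidence band); the explicit format string, the seeded random values, and the output filename are my own assumptions:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data shaped like the question's: zero-padded time strings, three "days" of readings
times = [f"{h:02}:{m:02}:00" for h in range(24) for m in range(60)]
rng = np.random.default_rng(0)
df = pd.DataFrame({"time": times * 3, "globalpower": rng.random(len(times) * 3)})

# Parse the strings; with an explicit format the date part defaults to 1900-01-01
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")

# Average per time of day (lineplot's default estimator is also the mean)
mean_power = df.groupby("time")["globalpower"].mean()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(mean_power.index, mean_power.to_numpy())
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))   # one tick per hour
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # label as hours:minutes
ax.tick_params(axis="x", rotation=90)
fig.savefig("globalpower_by_time.png")  # hypothetical output file name
```

With your real df you would skip the toy-data lines and start from the pd.to_datetime call.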

Upvotes: 2

Zephyr

Reputation: 12496

Since I do not have access to your data, I created some fake data to work with. You can just use your own df.
Check this code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

N = 1440
time = pd.date_range('2020-01-01', periods = N, freq = 'min')
globalpower = np.random.randn(N)
df = pd.DataFrame({'time': time,
                   'globalpower': globalpower})

graph = sns.lineplot(x = 'time', y = 'globalpower', data = df)
graph.xaxis.set_major_locator(mdates.HourLocator(interval = 1))
graph.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.xticks(rotation = 90)

plt.show()

This gives a plot with one x tick per hour, labeled in HH:MM format and rotated by 90 degrees.

You can adjust the x axis ticks and labels with:

  • graph.xaxis.set_major_locator(mdates.HourLocator(interval = 1)) to place a tick every hour
  • graph.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M')) to set the format of the x axis labels to "hours:minutes"
  • plt.xticks(rotation = 90) to rotate the x axis labels by 90 degrees so they do not overlap
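To get a feel for what the formatter produces before wiring it into a plot, you can call it directly on a single date. A small sketch; the example timestamp, the '%Hh' format, and the 3-hour interval are arbitrary choices of mine:

```python
from datetime import datetime

import matplotlib.dates as mdates

# Matplotlib stores dates as floats; date2num does that conversion
moment = mdates.date2num(datetime(2020, 1, 1, 13, 30))

hours_minutes = mdates.DateFormatter('%H:%M')
hours_only = mdates.DateFormatter('%Hh')

print(hours_minutes(moment))  # 13:30
print(hours_only(moment))     # 13h

# A sparser alternative to interval = 1: one tick every 3 hours
every_three_hours = mdates.HourLocator(interval=3)
# graph.xaxis.set_major_locator(every_three_hours)
```

Any strftime-style format string works in DateFormatter, so you can drop the minutes if the axis gets crowded.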

Upvotes: 5
