I use the current version of the Air Quality dataset (http://archive.ics.uci.edu/ml/datasets/Air+quality). I want to plot monthly aggregates of several features, one feature per subplot, with the months on the x-axis in chronological order.
Creating YearMonth for the x-axis
INPUT:
import pandas as pd
df['DateTime'] = df['Date'].astype(str) + ' ' + df['Time'].astype(str)
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%m/%d/%Y %H:%M:%S')
print(df['DateTime'].iloc[:2])
OUTPUT:
0 2004-11-23 19:00:00
1 2004-11-23 20:00:00
Name: DateTime, dtype: datetime64[ns]
INPUT:
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%m/%d/%Y')
df['Year'] = df['DateTime'].map(lambda x: x.year)
print(df['Year'].iloc[:2])
OUTPUT:
0 2004
1 2004
Name: Year, dtype: int64
INPUT:
df['YearMonth'] = pd.to_datetime(df.DateTime).dt.to_period('m')
print(df['YearMonth'].iloc[:2])
OUTPUT:
0 2004-11
1 2004-11
Name: YearMonth, dtype: period[M]
The project I am trying to reproduce produces the same results in the same format.
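A minimal sketch of the same conversion, assuming Date and Time are strings in the `%m/%d/%Y` and `%H:%M:%S` formats used above (the sample rows are made up for illustration). Because `DateTime` is already `datetime64` after the first step, the `.dt` accessor can be used directly instead of calling `pd.to_datetime` a second time or mapping a lambda over the rows.

```python
import pandas as pd

# Hypothetical sample rows in the same string formats as the question
df = pd.DataFrame({
    "Date": ["11/23/2004", "11/23/2004", "03/10/2004"],
    "Time": ["19:00:00", "20:00:00", "18:00:00"],
})

# Combine and parse once; the result is a datetime64 column
df["DateTime"] = pd.to_datetime(df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S")

# Derive the helper columns straight from the datetime column
df["Year"] = df["DateTime"].dt.year
df["YearMonth"] = df["DateTime"].dt.to_period("M")

print(df[["DateTime", "Year", "YearMonth"]])
```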
My plotting code:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(30,60))
#fig, axes = plt.subplots(1, 1, figsize=(30, 60), dpi=100)
gasList = ['CO_GT', 'C6H6_GT', 'Nox_GT', 'NO2_GT']
for i, col in enumerate(gasList, start=1):
    plt.subplot(len(gasList), 1, i)
    sns.pointplot(x='YearMonth', y=col, hue='Year', data=df)
    plt.title(col, y=0.5, loc='right')
#axes.set_xticks(year_month_day)
plt.show()
Ideal plot
I am trying to achieve the same plot as this project.
What I tried to solve the problem
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9357 entries, 0 to 9356
Data columns (total 17 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Date          9357 non-null   datetime64[ns]
 1   Time          9357 non-null   object
 2   CO_GT         9357 non-null   float64
 3   PT08_S1_CO    9357 non-null   float64
 4   C6H6_GT       9357 non-null   float64
 5   PT08_S2_NMHC  9357 non-null   float64
 6   Nox_GT        9357 non-null   float64
 7   PT08_S3_Nox   9357 non-null   float64
 8   NO2_GT        9357 non-null   float64
 9   PT08_S4_NO2   9357 non-null   float64
 10  PT08_S5_O3    9357 non-null   float64
 11  T             9357 non-null   float64
 12  RH            9357 non-null   float64
 13  AH            9357 non-null   float64
 14  DateTime      9357 non-null   datetime64[ns]
 15  Year          9357 non-null   int64
 16  YearMonth     9357 non-null   period[M]
dtypes: datetime64[ns](2), float64(12), int64(1), object(1), period[M](1)
memory usage: 1.3+ MB
col_one_list = df['YearMonth'].tolist()
plt.figure(figsize=(30,60))
gasList = ['CO_GT', 'C6H6_GT', 'Nox_GT', 'NO2_GT']
for i, col in enumerate(gasList, start=1):
    plt.subplot(len(gasList), 1, i)
    sns.pointplot(x='YearMonth', y=col, hue='Year', data=df, order=col_one_list)
    plt.title(col, y=0.5, loc='right')
plt.show()
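`col_one_list` holds one entry per row, so it contains many duplicates and keeps the (unsorted) row order. A hedged sketch of an `order` list with each month exactly once, in chronological order, on hypothetical data:

```python
import pandas as pd

# Hypothetical YearMonth column with repeated, unsorted months
df = pd.DataFrame({
    "YearMonth": pd.PeriodIndex(["2004-11", "2004-03", "2004-11", "2005-01"], freq="M"),
})

# col_one_list-style list: one entry per row, duplicates included
per_row = df["YearMonth"].tolist()

# One entry per month, sorted chronologically (Period objects compare by date)
month_order = sorted(df["YearMonth"].unique())

print(len(per_row), len(month_order))
print([str(p) for p in month_order])
```

A list built this way could then be passed as `order=month_order` to `sns.pointplot` in place of `col_one_list`; since its entries are the same `Period` values that appear in the column, they should match the x categories.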
plt.figure(figsize=(30,60))
col_two_list = ['2004-03', '2004-04', '2004-05', '2004-06', '2004-07', '2004-08', '2004-09', '2004-10', '2004-11', '2004-12', '2005-01', '2005-02', '2005-03', '2005-04']
gasList = ['CO_GT', 'C6H6_GT', 'Nox_GT', 'NO2_GT']
for i, col in enumerate(gasList, start=1):
    plt.subplot(len(gasList), 1, i)
    sns.pointplot(x='YearMonth', y=col, hue='Year', data=df, order=col_two_list)
    plt.title(col, y=0.5, loc='right')
plt.show()
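One thing worth checking here is that the entries in `order` have the same type as the values in the `x` column: `col_two_list` holds strings, while `YearMonth` holds `Period` values. A hedged sketch of two ways to make them agree (the small DataFrame and the `YearMonthStr` column name are hypothetical):

```python
import pandas as pd

# Option 1: an order made of Period values, March 2004 through April 2005
month_order = list(pd.period_range("2004-03", "2005-04", freq="M"))

# Option 2: keep string labels, and plot a string copy of the column instead
df = pd.DataFrame({"YearMonth": pd.PeriodIndex(["2004-04", "2004-03"], freq="M")})
df["YearMonthStr"] = df["YearMonth"].astype(str)
col_two_list = [str(p) for p in month_order]  # the same 14 strings as in the question

print(len(month_order), col_two_list[0], col_two_list[-1])
```

With option 2, the plot would use `x='YearMonthStr'` together with `order=col_two_list`.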
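When you generate your pointplot, pass a DataFrame sorted by YearMonth (sorting by DateTime gives the same month order), and the months on the x-axis will come out in chronological order, just as you want.
Without this sort you get the picture you presented, with the months out of order.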
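A minimal sketch of why the sort matters, on made-up rows and using only pandas: as far as I know, for non-numeric `x` values such as `Period` objects, seaborn lays the categories out in the order they first appear in the data when no `order` is given. Sorting by `DateTime` makes that first-appearance order chronological.

```python
import pandas as pd

# Hypothetical rows stored out of chronological order
df = pd.DataFrame({
    "DateTime": pd.to_datetime(["2004-11-01", "2004-03-01", "2005-01-01", "2004-11-15"]),
    "CO_GT": [2.7, 2.2, 2.0, 2.5],
})
df["YearMonth"] = df["DateTime"].dt.to_period("M")

# Order of first appearance before and after sorting by DateTime
before = list(df["YearMonth"].astype(str).unique())
after = list(df.sort_values("DateTime")["YearMonth"].astype(str).unique())

print(before)  # ['2004-11', '2004-03', '2005-01']
print(after)   # ['2004-03', '2004-11', '2005-01']
```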
I prepared a test input file with just two of the gas columns:
DateTime    CO_GT  C6H6_GT
2004-11-01  2.7    12.4
2004-12-01  2.6    10.6
2004-10-01  3.0    13.8
2005-01-01  2.0    9.0
2005-02-01  2.2    8.0
2004-03-01  2.2    10.0
2004-09-01  2.2    12.0
2005-03-01  2.0    8.6
2004-04-01  2.1    10.2
2004-05-01  1.95   10.5
2004-06-01  1.85   10.4
2004-07-01  1.7    10.5
2005-04-01  1.3    4.5
2004-08-01  1.4    6.8
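To try this without creating `Input.csv`, the same fixed-width text can be read from an in-memory string. This is a sketch using only the first three sample rows and assuming the column widths 12, 7 and 7 that this answer passes to `read_fwf`.

```python
import io

import pandas as pd

# First rows of the sample input, laid out in fixed-width columns (12, 7, 7)
text = (
    "DateTime    CO_GT  C6H6_GT\n"
    "2004-11-01  2.7    12.4\n"
    "2004-12-01  2.6    10.6\n"
    "2004-10-01  3.0    13.8\n"
)

# parse_dates=[0] converts the first column to datetime while reading
df = pd.read_fwf(io.StringIO(text), widths=[12, 7, 7], parse_dates=[0])
print(df.dtypes)
```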
Then I read it, converting the DateTime column to datetime type as early as possible, i.e. while reading:
import pandas as pd
df = pd.read_fwf('Input.csv', widths=[12, 7, 7], parse_dates=[0])
The first step is to create "auxiliary" columns:
df['Year'] = df.DateTime.dt.year
df['YearMonth'] = df.DateTime.dt.to_period('m')
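If sorting the DataFrame before every plot is inconvenient, another option (not part of the original answer) is an ordered categorical column: seaborn uses a categorical column's category order for the axis, so the row order should no longer matter. `YearMonthCat` is a hypothetical column name and the rows are made up.

```python
import pandas as pd

df = pd.DataFrame({"DateTime": pd.to_datetime(["2004-11-01", "2005-02-01", "2004-03-01"])})
df["YearMonth"] = df["DateTime"].dt.to_period("M")

# Ordered categorical whose categories are the month labels in sorted order;
# zero-padded 'YYYY-MM' strings sort chronologically as plain text
labels = df["YearMonth"].astype(str)
df["YearMonthCat"] = pd.Categorical(labels, categories=sorted(labels.unique()), ordered=True)

print(list(df["YearMonthCat"].cat.categories))
```

The plot would then use `x='YearMonthCat'` on the unsorted DataFrame.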
And to generate the picture, I ran:
import matplotlib.pyplot as plt
import seaborn as sns
gasList = ['CO_GT', 'C6H6_GT']
plt.figure(figsize=(14, 8))
for i, col in enumerate(gasList, start=1):
    plt.subplot(len(gasList), 1, i)
    sns.pointplot(x='YearMonth', y=col, hue='Year', data=df.sort_values('DateTime'))
    plt.title(col, y=0.5, loc='right')
plt.show()
The result is a plot with the months on the x-axis in chronological order.
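If the monthly aggregates themselves are needed as numbers, a `groupby` on `YearMonth` gives them directly. `pointplot` draws the mean by default, so these should match the plotted points (without the confidence intervals). A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows, two of them in the same month
df = pd.DataFrame({
    "DateTime": pd.to_datetime(["2004-11-01", "2004-11-15", "2004-03-01", "2005-01-01"]),
    "CO_GT": [2.0, 3.0, 2.2, 2.0],
    "C6H6_GT": [12.0, 13.0, 10.0, 9.0],
})
df["YearMonth"] = df["DateTime"].dt.to_period("M")

# groupby sorts its keys by default, so the months come out in chronological order
monthly = df.groupby("YearMonth")[["CO_GT", "C6H6_GT"]].mean()
print(monthly)
```

A frame like `monthly` could also be drawn with `monthly.plot(subplots=True)`, one subplot per gas.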