Reputation: 3900
I have a dataframe with 12 hourly data for over 10 years. All data are stored in date wise. I would like to extract columns containing the data from a specific month (note that month is not in standard 1, 2, 3, format). The rows of the 'date' column which I have looks like this:
01-May-07
02-May-07
.
.
.
31-Oct-17
How do I select only columns which contain data only for May, Jun etc.
Initially I thought that I can extract using df[df['DATE'].str.contains('May')]
. But it did not work as expected resulting in output as input.
Edit 1
DATE TIME MOONPH SPEED GUST CLOUD AMOUNT DRY WET DEW RH
01-May-07 230 NM7 6 0 4 27.4 25.4 25.4 86
01-May-07 330 NM7 4 0 4 27.4 25.4 25.4 86
01-May-07 430 NM7 3 0 4 27.4 25.4 25.4 86
01-May-07 530 NM7 2 0 4 27.4 25.4 25.4 89
01-May-07 630 NM7 3 0 5 27.4 26 25.4 85
01-May-07 700 NM7 0 0 4 27.8 26 25.4 81
01-May-07 730 NM7 0 0 4 27.8 26 25.4 81
01-May-07 800 NM7 2 0 4 27.8 26 25.4 81
01-May-07 830 NM7 5 0 4 29.2 26 24.6 76
01-May-07 900 NM7 5 0 4 29.2 26 24.6 76
01-May-07 930 NM7 5 0 2 29.8 26 24.6 76
01-May-07 1000 NM7 5 0 4 30.8 26 24.6 76
01-May-07 1030 NM7 5 0 4 30.8 26 24.6 76
01-May-07 1100 NM7 6 0 4 31.4 26 24.6 68
.
.
.
01-May-17 1630 NM7 8 0 5 32.6 27.4 25.6 68
01-May-17 1930 NM7 8 0 5 32 27.4 25.6 69
01-May-17 430 NM7 0 0 5 27.2 25 24 83
01-May-17 30 NM7 0 0 5 29.6 27.2 26.2 82
01-May-17 530 NM7 0 0 5 26.6 24.4 23.4 83
01-May-17 130 NM7 0 0 5 28 25.6 24.6 82
01-May-17 630 NM7 0 0 5 26.8 24.4 23.3 81
01-May-17 730 NM7 0 0 5 27.2 24.4 23.4 80
01-May-17 330 NM7 0 0 5 27.2 25 24 83
01-May-17 1230 NM7 10 0 5 32.8 28.2 25.2 64
01-May-17 2330 NM7 4 0 4 30 26.4 24.9 75
01-May-17 2230 NM7 5 0 4 30 26.8 25.5 77
01-May-17 2130 NM7 4 0 4 30 26.8 25.5 77
01-May-17 830 NM7 2 0 5 27.2 24.4 23.4 78
01-May-17 930 NM7 3 0 5 31.2 27.2 25.6 72
01-May-17 1830 NM7 8 0 5 32 27.4 25.6 69
01-May-17 1130 NM7 6 0 5 32.8 28.2 25.2 64
01-May-17 2030 NM7 6 0 4 32 26.8 25.4 76
01-May-17 1330 NM7 10 0 5 33 27.6 25.4 64
01-May-17 1430 NM7 10 0 5 33 27.6 25.2 65
Upvotes: 3
Views: 22366
Reputation: 451
This is how I would do. I would merge the Date and Time column with a space between them. Then keep the column name as DATE. The column will look something like this
DATE
01-May-07 230
01-May-07 330
01-May-07 430
...... Then I would covert the column to datetime object.
df['DATE'] = pd.to_datetime(df['DATE'], formate = '%d-%b-%y %H%M')
Then set the DATE column as index
df = df.set_index('DATE')
Then when the dataframe become time series, then in many ways you can extract a portion of your data based on time. For instance if I only want data from January
jan_data = df['2007-Jan']
first week of May
may_1st_week = df['2007-May-01':'2007-May-07']
And so on.
Upvotes: 3
Reputation: 862641
I think need convert to_datetime
and then compare with month
or strftime
and %B
for month names
:
df = pd.DataFrame({'DATE': ['01-May-07', '02-May-07', '31-Oct-17']})
print (df)
DATE
0 01-May-07
1 02-May-07
2 31-Oct-17
df = df[pd.to_datetime(df['DATE']).dt.month == 5]
df = df[pd.to_datetime(df['DATE']).dt.strftime('%B') == 'May']
print (df)
DATE
0 01-May-07
1 02-May-07
If need working with datetimes later:
df['DATE'] = pd.to_datetime(df['DATE'])
df = df[df['DATE'].dt.month == 5]
#df = df[df['DATE'].dt.strftime('%B') == 'May']
print (df)
DATE
0 2007-05-01
1 2007-05-02
EDIT:
If dont need working with datetimes
, for me with your data your approach working:
df = df[df['DATE'].str.contains('May')]
Upvotes: 16