sundar_ima
sundar_ima

Reputation: 3900

Select rows for a specific month in Pandas

I have a dataframe with 12 hourly data for over 10 years. All data are stored in date wise. I would like to extract columns containing the data from a specific month (note that month is not in standard 1, 2, 3, format). The rows of the 'date' column which I have looks like this:

01-May-07
02-May-07
.
.
.
31-Oct-17

How do I select only columns which contain data only for May, Jun etc.

Initially I thought that I can extract using df[df['DATE'].str.contains('May')]. But it did not work as expected resulting in output as input.

Edit 1

DATE    TIME    MOONPH  SPEED   GUST    CLOUD AMOUNT    DRY WET DEW RH
01-May-07   230 NM7 6   0   4   27.4    25.4    25.4    86
01-May-07   330 NM7 4   0   4   27.4    25.4    25.4    86
01-May-07   430 NM7 3   0   4   27.4    25.4    25.4    86
01-May-07   530 NM7 2   0   4   27.4    25.4    25.4    89
01-May-07   630 NM7 3   0   5   27.4    26  25.4    85
01-May-07   700 NM7 0   0   4   27.8    26  25.4    81
01-May-07   730 NM7 0   0   4   27.8    26  25.4    81
01-May-07   800 NM7 2   0   4   27.8    26  25.4    81
01-May-07   830 NM7 5   0   4   29.2    26  24.6    76
01-May-07   900 NM7 5   0   4   29.2    26  24.6    76
01-May-07   930 NM7 5   0   2   29.8    26  24.6    76
01-May-07   1000    NM7 5   0   4   30.8    26  24.6    76
01-May-07   1030    NM7 5   0   4   30.8    26  24.6    76
01-May-07   1100    NM7 6   0   4   31.4    26  24.6    68
.
.
.
01-May-17   1630    NM7 8   0   5   32.6    27.4    25.6    68
01-May-17   1930    NM7 8   0   5   32  27.4    25.6    69
01-May-17   430 NM7 0   0   5   27.2    25  24  83
01-May-17   30  NM7 0   0   5   29.6    27.2    26.2    82
01-May-17   530 NM7 0   0   5   26.6    24.4    23.4    83
01-May-17   130 NM7 0   0   5   28  25.6    24.6    82
01-May-17   630 NM7 0   0   5   26.8    24.4    23.3    81
01-May-17   730 NM7 0   0   5   27.2    24.4    23.4    80
01-May-17   330 NM7 0   0   5   27.2    25  24  83
01-May-17   1230    NM7 10  0   5   32.8    28.2    25.2    64
01-May-17   2330    NM7 4   0   4   30  26.4    24.9    75
01-May-17   2230    NM7 5   0   4   30  26.8    25.5    77
01-May-17   2130    NM7 4   0   4   30  26.8    25.5    77
01-May-17   830 NM7 2   0   5   27.2    24.4    23.4    78
01-May-17   930 NM7 3   0   5   31.2    27.2    25.6    72
01-May-17   1830    NM7 8   0   5   32  27.4    25.6    69
01-May-17   1130    NM7 6   0   5   32.8    28.2    25.2    64
01-May-17   2030    NM7 6   0   4   32  26.8    25.4    76
01-May-17   1330    NM7 10  0   5   33  27.6    25.4    64
01-May-17   1430    NM7 10  0   5   33  27.6    25.2    65

Upvotes: 3

Views: 22366

Answers (2)

iAnas
iAnas

Reputation: 451

This is how I would do. I would merge the Date and Time column with a space between them. Then keep the column name as DATE. The column will look something like this

   DATE    
01-May-07 230 
01-May-07 330 
01-May-07 430

...... Then I would covert the column to datetime object.

df['DATE'] = pd.to_datetime(df['DATE'], formate = '%d-%b-%y %H%M') 

Then set the DATE column as index

df = df.set_index('DATE') 

Then when the dataframe become time series, then in many ways you can extract a portion of your data based on time. For instance if I only want data from January

jan_data = df['2007-Jan'] 

first week of May

may_1st_week = df['2007-May-01':'2007-May-07'] 

And so on.

Upvotes: 3

jezrael
jezrael

Reputation: 862641

I think need convert to_datetime and then compare with month or strftime and %B for month names:

df = pd.DataFrame({'DATE': ['01-May-07', '02-May-07', '31-Oct-17']})
print (df)
        DATE
0  01-May-07
1  02-May-07
2  31-Oct-17

df = df[pd.to_datetime(df['DATE']).dt.month == 5]
df = df[pd.to_datetime(df['DATE']).dt.strftime('%B') == 'May']

print (df)
        DATE
0  01-May-07
1  02-May-07

If need working with datetimes later:

df['DATE'] = pd.to_datetime(df['DATE'])
df = df[df['DATE'].dt.month == 5]
#df = df[df['DATE'].dt.strftime('%B') == 'May']
print (df)
        DATE
0 2007-05-01
1 2007-05-02

EDIT:

If dont need working with datetimes, for me with your data your approach working:

df = df[df['DATE'].str.contains('May')]

Upvotes: 16

Related Questions