Duna

Reputation: 735

Extracting hour, date and month from a pandas (python) DataFrame object

With a pandas DataFrame df:

         x                   y
0   29-04-2014 07:40:52     07:40:52
1   29-04-2014 07:59:25     07:15:00
2                NaN             NaN
3   29-04-2014 08:16:12     07:45:00
4   09-05-2014 08:19:14     07:30:00
5   23-04-2014 08:27:16     08:30:00
6   29-04-2014 08:37:16     07:00:00
7                NaN             NaN
8   29-04-2014 08:41:16     07:30:00
9   25-03-2014 08:42:16     07:30:00

where column x contains Date-Month-Year Hour:Minute:Second values and column y stores Hour:Minute:Second values, how can one extract only:

  1. the date [29 from row 0] of column x,
  2. the month [04 or April from row 0] of column x,
  3. the month and date [29-04 or 29-April from row 0] of column x,
  4. the hour and minute [07-40 from row 0] of columns x and y

I imported the DataFrame from a text file using

df = pd.read_table(r"C:\data.txt", sep='\t')

but it was originally from MS Excel or MS Access.

When I run df.dtypes, I get

   x    object
   y    object
   dtype: object

I am working with pandas version 0.14.1 in Python 3.4.

Example DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': ['29-04-2014 07:40:52', np.nan, '29-04-2014 08:16:16','29-04-2014 08:19:56', '29-04-2014 08:27:20'],
                   'y': ['07:40:52', '07:15:00', np.nan, '07:45:00', '07:30:00']})

Upvotes: 2

Views: 7161

Answers (1)

JD Long

Reputation: 60746

I think the pandas way is to make x your index; then you can use some simple attributes of the index to extract what you want. The non-pandas way is to use the datetime module.
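For the non-pandas route, a minimal sketch using only the standard library might look like the following. It assumes the strings follow the day-month-year layout shown in the question; the variable names are illustrative only.

```python
from datetime import datetime

# Row 0 of column x in the question
raw = '29-04-2014 07:40:52'

# Parse using the day-month-year layout from the question
ts = datetime.strptime(raw, '%d-%m-%Y %H:%M:%S')

day = ts.day                        # day of the month as an int
month = ts.month                    # month number as an int
day_month = ts.strftime('%d-%m')    # zero-padded "day-month" string
hour_minute = ts.strftime('%H-%M')  # zero-padded "hour-minute" string

print(day, month, day_month, hour_minute)
```

This handles one value at a time, so missing entries (NaN) would need to be skipped before calling strptime.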

The pandas way: for background, you can read the documentation on time series data, which is quite good.

Set up some example data:

import numpy as np
import pandas as pd

n = 10
df = pd.DataFrame(pd.date_range('1/1/2000', periods=n))
df.columns = ['x']
df['z'] = np.random.rand(n)  # random values, so your numbers will differ
df.set_index('x', inplace=True)
print(df)

                   z
x                   
2000-01-01  0.863064
2000-01-02  0.980083
2000-01-03  0.278810
2000-01-04  0.960890
2000-01-05  0.309591
2000-01-06  0.662498
2000-01-07  0.802367
2000-01-08  0.403791
2000-01-09  0.981172
2000-01-10  0.342935

Note that your data does not appear to have x set as the index, and its dtype is object (strings) rather than datetime. Converting it and making it the index is an important step.
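As a rough sketch of that step on data shaped like the question's, something along these lines should work. The format string is an assumption based on the day-month-year layout shown in the question, and dropping the rows with missing x before indexing is just one possible choice.

```python
import numpy as np
import pandas as pd

# A small frame shaped like the one in the question
df = pd.DataFrame({'x': ['29-04-2014 07:40:52', np.nan, '29-04-2014 08:16:16'],
                   'y': ['07:40:52', '07:15:00', np.nan]})

# Parse the day-first strings; missing values become NaT
df['x'] = pd.to_datetime(df['x'], format='%d-%m-%Y %H:%M:%S')

# Drop rows without a timestamp, then use x as the index
indexed = df.dropna(subset=['x']).set_index('x')

print(indexed.index.day)
print(indexed.index.month)
```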

Once your dates are timestamps in the index, you can access what you are after:

df['d'] = df.index.day
df['m'] = df.index.month
df['y'] = df.index.year
print(df)

                   z   d  m     y
x                                
2000-01-01  0.863064   1  1  2000
2000-01-02  0.980083   2  1  2000
2000-01-03  0.278810   3  1  2000
2000-01-04  0.960890   4  1  2000
2000-01-05  0.309591   5  1  2000
2000-01-06  0.662498   6  1  2000
2000-01-07  0.802367   7  1  2000
2000-01-08  0.403791   8  1  2000
2000-01-09  0.981172   9  1  2000
2000-01-10  0.342935  10  1  2000
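For the four items the question lists, a possible sketch that leaves the index alone is shown below. It relies on the .dt accessor, which was added in pandas 0.15 and is therefore newer than the 0.14.1 mentioned in the question; the new column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# A small frame shaped like the one in the question
df = pd.DataFrame({'x': ['29-04-2014 07:40:52', np.nan, '09-05-2014 08:19:14'],
                   'y': ['07:40:52', '07:15:00', np.nan]})

# Parse both columns; missing values become NaT
x = pd.to_datetime(df['x'], format='%d-%m-%Y %H:%M:%S')
y = pd.to_datetime(df['y'], format='%H:%M:%S')

df['day'] = x.dt.day                       # 1. day of the month
df['month'] = x.dt.month                   # 2. month number
df['day_month'] = x.dt.strftime('%d-%m')   # 3. day and month as text
df['x_hour_min'] = x.dt.strftime('%H-%M')  # 4. hour and minute of x
df['y_hour_min'] = y.dt.strftime('%H-%M')  #    hour and minute of y

print(df)
```

Rows where the source value is missing stay missing in the derived columns, which is why the numeric columns come back as floats rather than integers.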

Upvotes: 2
