bshah
bshah

Reputation: 167

Year range to date time format

Currently I have a series of string as a column in pandas dataframe which represents a particular year in a "yyyy-yyyy" format for example "2004-2005" is a single string value in this column.

I wanted to know if there is anyway to convert this from string to something similar to datetime format.

The purpose for this is to calculate the difference between the values of this column and other similar column in "Years". For example something similar to below:

col 1        col2        Answer(Total years)
2004-2005    2006-2007    3

Note: One of the ways I thought of doing was to make a dictionary mapping each year to a unique integer value and then calculate the difference between them.

Although I was wondering if there is any simpler way of doing it.

Upvotes: 0

Views: 767

Answers (3)

Andy Hayden
Andy Hayden

Reputation: 375535

It looks like you subtracting the last year in column 2 with the first year in column 1. In which case I'd use str.extract (and convert the result to a number):

In [11]: pd.to_numeric(df['col 1'].str.extract('(\d{4})'))
Out[11]:
0    2004
Name: col 1, dtype: int64

In [12]: pd.to_numeric(df['col2'].str.extract('-(\d{4})')) - pd.to_numeric(df['col 1'].str.extract('(\d{4})'))
Out[12]:
0    3
dtype: int64

Upvotes: 1

repzero
repzero

Reputation: 8412

If you are trying to find the difference between the lowest year and the highest year, here is a go at it

col1="2004-2005"
col2="2006-2007"
col1=col1.split("-") # make a list of the years in col1 ['2004', '2005']
col2=col2.split("-") # make a list of the years in col2 ['2006', '2007']
biglist=col1+col2 #add the two list
biglist.sort() #sort the list from lowest year to highest year
Answer=int(biglist[len(biglist)-1])-int(biglist[0]) #find the difference between lowest and highest year

Upvotes: 0

Batman
Batman

Reputation: 8927

What do you mean by "something similar to a datetime object." Datetimes aren't designed to represent date ranges.

If you want to create a pair of datetime objects you could do something like this:

[datetime.datetime.strptime(x, '%Y') for x in '2005-2006'.split('-')]

Alternatively you could try using a Pandas date_range object if that's closer to what you want.

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.date_range.html

Upvotes: 0

Related Questions