Olovram

Reputation: 53

Editing Date in Dataframe to show Year in pandas

I have two panel datasets loaded as DataFrames in pandas:

df1: [image not shown]

df2: [image not shown]

Both cover the same years and countries, but the first stores the year as a date in mm/dd/yy format while the second stores only the year. I need to merge them, which shouldn't be too hard, but I don't know how to reformat the dates in the first one so that they hold just the year. Thanks in advance.

Upvotes: 2

Views: 1706

Answers (3)

Siraj S.

Reputation: 3751

This may do what you want with the least code (the `- 100` corrects for two-digit years such as 50 being parsed as 2050 rather than 1950):

df['Year'] = pd.to_datetime(df['Year']).dt.year - 100

However, bear in mind that replacing the date with just the year also changes the column's dtype to int:

>>> df.dtypes
Year        int64
Country    object
dtype: object
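One caveat with a blanket `- 100`: it is only right if every two-digit year was parsed into the wrong century. With `%y`, values 00 to 68 become 20XX and 69 to 99 become 19XX, so a column that mixes, say, 50 and 99 would need a conditional shift. A minimal sketch, assuming a hypothetical cutoff (`latest_valid_year`) after which no observation can exist, and made-up sample rows:

```python
import pandas as pd

# Made-up sample rows; the real column layout is assumed from the question.
df = pd.DataFrame({
    "Year": ["12/31/50", "12/31/99", "12/31/05"],
    "Country": ["Argentina", "Argentina", "Argentina"],
})

# %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
parsed_year = pd.to_datetime(df["Year"], format="%m/%d/%y").dt.year

# Hypothetical cutoff: any parsed year after this is assumed to belong
# to the previous century.
latest_valid_year = 2015

# Keep years at or before the cutoff, shift the rest back 100 years.
df["Year"] = parsed_year.where(parsed_year <= latest_valid_year, parsed_year - 100)

print(df)
```

Whether a cutoff like this is appropriate depends on the range the data actually covers.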

Upvotes: 1

hd1

Reputation: 34677

educ['Year'] = educ['Year'].dt.year should get you the year, provided the column already holds datetimes. If it doesn't, convert it first with the to_datetime function, as follows (the - 100 assumes every two-digit year was parsed as 20XX):

educ['Year'] = [y - 100 for y in pd.to_datetime(educ['Year'], format="%m/%d/%y").dt.year]

Upvotes: 0

Andy Hayden

Reputation: 375685

It looks like you need to splice in the century explicitly (to avoid it being parsed as 20XX):

In [11]: df = pd.DataFrame([["12/31/50", "Argentina"], ["12/31/51", "Argentina"], ["12/31/52", "Argentina"]], columns=["Year", "Country"])

In [12]: df
Out[12]:
       Year    Country
0  12/31/50  Argentina
1  12/31/51  Argentina
2  12/31/52  Argentina

without splicing:

In [13]: pd.to_datetime(df["Year"])
Out[13]:
0   2050-12-31
1   2051-12-31
2   2052-12-31
Name: Year, dtype: datetime64[ns]

In [14]: pd.to_datetime(df["Year"], format="%m/%d/%y")
Out[14]:
0   2050-12-31
1   2051-12-31
2   2052-12-31
Name: Year, dtype: datetime64[ns]

with splicing:

In [15]: df["Year"].apply(lambda s: s[:-2] + "19" + s[-2:])
Out[15]:
0    12/31/1950
1    12/31/1951
2    12/31/1952
Name: Year, dtype: object

In [16]: pd.to_datetime(df["Year"].apply(lambda s: s[:-2] + "19" + s[-2:]), format="%m/%d/%Y")
Out[16]:
0   1950-12-31
1   1951-12-31
2   1952-12-31
Name: Year, dtype: datetime64[ns]

Note: You could try the parsing in the apply (which may be more efficient).
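For illustration, parsing inside the apply might look roughly like the sketch below. The helper name `parse_with_century` and its `century` parameter are made up for this example, and I haven't timed it against the two-step version, so treat any speed difference as something to measure.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    [["12/31/50", "Argentina"], ["12/31/51", "Argentina"], ["12/31/52", "Argentina"]],
    columns=["Year", "Country"],
)

def parse_with_century(s, century="19"):
    """Hypothetical helper: splice the century in, then parse a four-digit year."""
    return datetime.strptime(s[:-2] + century + s[-2:], "%m/%d/%Y")

# Splice and parse each string in a single pass over the column.
parsed = df["Year"].apply(parse_with_century)

print(parsed)
```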


You can pull out the year from a datetime Series with the .dt accessor:

In [21]: df["Year"] = pd.to_datetime(df["Year"].apply(lambda s: s[:-2] + "19" + s[-2:]), format="%m/%d/%Y")

In [22]: df["Year"].dt.year
Out[22]:
0    1950
1    1951
2    1952
Name: Year, dtype: int64
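Once the year is an integer, the merge from the question could go along these lines. This is only a sketch: the frames and the `value_a` / `value_b` columns are invented stand-ins, since the real columns are only visible in the question's images, and it assumes all dates fall in the 1900s.

```python
import pandas as pd

# Invented stand-ins for the two frames in the question.
df1 = pd.DataFrame({
    "Year": ["12/31/50", "12/31/51"],
    "Country": ["Argentina", "Argentina"],
    "value_a": [1.0, 2.0],
})
df2 = pd.DataFrame({
    "Year": [1950, 1951],
    "Country": ["Argentina", "Argentina"],
    "value_b": [10.0, 20.0],
})

# Splice in the century (assumed to be 19XX), parse, and keep only the year.
with_century = df1["Year"].str[:-2] + "19" + df1["Year"].str[-2:]
years = pd.to_datetime(with_century, format="%m/%d/%Y").dt.year

# Cast so the key dtype matches df2's integer Year column.
df1["Year"] = years.astype("int64")

# Merge on both keys so each country-year pair lines up.
merged = df1.merge(df2, on=["Year", "Country"], how="inner")

print(merged)
```

An inner join drops country-year pairs that appear in only one frame; `how="outer"` would keep them.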

Upvotes: 1
