Reputation: 53
I have two data panels loaded as DataFrames in pandas:
df1:
df2:
Both of them cover the same years and countries, but the first one stores the year as a date in mm/dd/yy format, while the second one stores only the year. I need to merge them, which shouldn't be too hard, but I don't know how to reformat the first one so that it holds years rather than full dates. Thanks in advance.
Upvotes: 2
Views: 1706
Reputation: 3751
Maybe this achieves what you want in the least amount of code (the `- 100` assumes every two-digit year was parsed as 20XX):
df['Year'] = pd.to_datetime(df['Year']).dt.year - 100
However, bear in mind that reducing the dates to just the year also changes the column dtype to an integer:
>>> df.dtypes
Year int64
Country object
dtype: object
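One caveat with subtracting 100 unconditionally: with `%y`, two-digit years 00-68 are read as 2000-2068 but 69-99 are read as 1969-1999, so a blanket `- 100` would push the latter into the 1800s. A minimal sketch of a conditional correction, assuming all of the data belongs to the 1900s (the sample rows here are made up for illustration):

```python
import pandas as pd

# Hypothetical sample covering both sides of the two-digit-year pivot.
df = pd.DataFrame({
    "Year": ["12/31/50", "12/31/68", "12/31/69", "12/31/99"],
    "Country": ["Argentina"] * 4,
})

# %y maps 00-68 to 20XX and 69-99 to 19XX.
years = pd.to_datetime(df["Year"], format="%m/%d/%y").dt.year

# Keep years already in the 1900s; shift only those parsed as 20XX.
# Assumption: no row genuinely belongs to the 2000s.
df["Year"] = years.where(years < 2000, years - 100)

print(df["Year"].tolist())
```

If the data does contain real 20XX dates, this rule is not enough on its own and the century has to come from somewhere else.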
Upvotes: 1
Reputation: 34677
educ['Year'] = educ['Year'].dt.year
should get you the year, provided the column already holds datetimes (a Series exposes the year through the .dt accessor, not directly). If it holds strings, convert them first with the to_datetime function, as follows:
educ['Year'] = pd.to_datetime(educ['Year'], format='%m/%d/%y').dt.year - 100  # assumes every year was parsed as 20XX
Upvotes: 0
Reputation: 375685
It looks like you need to splice in the century explicitly (to avoid it being parsed as 20XX):
In [11]: df = pd.DataFrame([["12/31/50", "Argentina"], ["12/31/51", "Argentina"], ["12/31/52", "Argentina"]], columns=["Year", "Country"])
In [12]: df
Out[12]:
Year Country
0 12/31/50 Argentina
1 12/31/51 Argentina
2 12/31/52 Argentina
Without splicing:
In [13]: pd.to_datetime(df["Year"])
Out[13]:
0 2050-12-31
1 2051-12-31
2 2052-12-31
Name: Year, dtype: datetime64[ns]
In [14]: pd.to_datetime(df["Year"], format="%m/%d/%y")
Out[14]:
0 2050-12-31
1 2051-12-31
2 2052-12-31
Name: Year, dtype: datetime64[ns]
With splicing:
In [15]: df["Year"].apply(lambda s: s[:-2] + "19" + s[-2:])
Out[15]:
0 12/31/1950
1 12/31/1951
2 12/31/1952
Name: Year, dtype: object
In [16]: pd.to_datetime(df["Year"].apply(lambda s: s[:-2] + "19" + s[-2:]), format="%m/%d/%Y")
Out[16]:
0 1950-12-31
1 1951-12-31
2 1952-12-31
Name: Year, dtype: datetime64[ns]
Note: You could also do the parsing inside the apply itself (which may be more efficient, though that is worth timing on your own data).
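As a rough sketch of what parsing inside the apply could look like, the splice and the parse can be done in one function using the standard library's `datetime.strptime`. The helper name `parse_1900s` is made up for this example, and it assumes every row belongs to the 1900s:

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    [["12/31/50", "Argentina"], ["12/31/51", "Argentina"], ["12/31/52", "Argentina"]],
    columns=["Year", "Country"],
)

def parse_1900s(s):
    """Hypothetical helper: splice in '19' before the two-digit year, then parse."""
    return datetime.strptime(s[:-2] + "19" + s[-2:], "%m/%d/%Y")

# One pass over the column: each string is spliced and parsed together.
parsed = df["Year"].apply(parse_1900s)

print([d.year for d in parsed])
```

Whether this beats splicing first and then calling `pd.to_datetime` once on the whole column depends on the data size, so it is worth measuring both.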
You can pull out the year from a datetime Series with the .dt accessor:
In [21]: df["Year"] = pd.to_datetime(df["Year"].apply(lambda s: s[:-2] + "19" + s[-2:]), format="%m/%d/%Y")
In [22]: df["Year"].dt.year
Out[22]:
0 1950
1 1951
2 1952
Name: Year, dtype: int64
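Once the first frame holds plain integer years, the merge asked about in the question becomes a join on the two shared columns. A minimal sketch, assuming both frames use the column names `Year` and `Country`; the value columns `value_a` and `value_b` and the sample rows are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for the two panels in the question.
df1 = pd.DataFrame({
    "Year": ["12/31/50", "12/31/51"],
    "Country": ["Argentina", "Argentina"],
    "value_a": [1.0, 2.0],
})
df2 = pd.DataFrame({
    "Year": [1950, 1951],
    "Country": ["Argentina", "Argentina"],
    "value_b": [10, 20],
})

# Splice in the century (assumes all rows are from the 1900s), parse, keep the year.
spliced = df1["Year"].str[:-2] + "19" + df1["Year"].str[-2:]
df1["Year"] = pd.to_datetime(spliced, format="%m/%d/%Y").dt.year.astype("int64")

# Join on the shared keys; both Year columns are now integers.
merged = df1.merge(df2, on=["Year", "Country"], how="inner")

print(merged)
```

`how="inner"` keeps only year/country pairs present in both frames; `how="outer"` would keep unmatched rows as well, which can help spot keys that failed to line up.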
Upvotes: 1