Reputation: 16488
I have data of the structure
     country  year        POP
606  Algeria  1966  12339.140
730  Algeria  1968  13146.267
793  Algeria  1969  13528.304
856  Algeria  1970  13931.846
924  Algeria  1971  14335.388
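Now I want to compute first differences per country along the year axis, i.e. the change per year. The years are irregularly spaced (1967 is missing above), so a plain row-to-row difference does not always span exactly one year. If it weren't for that interval concern, I'd do something along the lines of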
df.sort_values(['country', 'year']).set_index(['country', 'year']).diff()
Instead, I guess I have to convert year with to_datetime() first. Is there a simple way to create a datetime from a column that contains only years? And is there a different, more natural approach to computing the differences over time?
Upvotes: 0
Views: 95
Reputation: 32222
You could just do the following (it needs import datetime):
df.set_index(df.year.map(lambda x: datetime.datetime(x, 1, 1)))
That labels each year by its left edge, i.e. 1 January of that year.
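A vectorized variant of the same idea, as a minimal sketch: it assumes the year column holds integer years and rebuilds the sample frame from the question only for illustration. The names dates and indexed are placeholders, not anything from the original code.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "country": ["Algeria"] * 5,
    "year": [1966, 1968, 1969, 1970, 1971],
    "POP": [12339.140, 13146.267, 13528.304, 13931.846, 14335.388],
})

# Parse each integer year as a timestamp at 1 January of that year.
dates = pd.to_datetime(df["year"].astype(str), format="%Y")

# Use the timestamps as the index; the original columns are kept.
indexed = df.set_index(dates)
```

This avoids the Python-level lambda, which may matter on larger frames.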
Another possibility is
df.set_index(df.year.map(pd.Period))
Both give equally well-defined indexes. With the latter, you might like the output of df.diff() better, since the index actually shows a year.
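If the goal is a change per year despite the gaps, one possible approach that needs no datetime conversion is to divide the difference in POP by the difference in year within each country. This is a sketch under the assumption that year is numeric; the column names years_elapsed and POP_change_per_year are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "country": ["Algeria"] * 5,
    "year": [1966, 1968, 1969, 1970, 1971],
    "POP": [12339.140, 13146.267, 13528.304, 13931.846, 14335.388],
})

# Order rows so that diff() compares consecutive observations per country.
df = df.sort_values(["country", "year"])

# Years between consecutive observations; NaN for each country's first row.
df["years_elapsed"] = df.groupby("country")["year"].diff()

# Change in POP, normalised by the number of years it spans.
df["POP_change_per_year"] = (
    df.groupby("country")["POP"].diff() / df["years_elapsed"]
)
```

Grouping by country keeps the first row of each country as NaN instead of differencing across country boundaries.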
Upvotes: 1