Stata has several date formats: running days, running datetimes, months since the epoch (...).
pandas can automatically convert these to datetime when importing a .dta file. Stata 18 now supports loading data directly from a running Stata session into pandas with the function pystata.stata.pdataframe_from_data. This function can keep value labels when importing, but it apparently does not provide any support for date variables.
As a result, I have a pandas.DataFrame in which the date is stored as a running count since the Stata epoch (%td in Stata format, i.e. days since January 1, 1960). Is there a convenient way to convert this (and any other Stata-originated date format) to datetime?
All of Stata's date formats count from January 1, 1960: a %td value is the number of days since that date, a %tm value the number of months since January 1960, and so on. This differs from the Unix epoch (January 1, 1970) that pandas uses internally. Because pdataframe_from_data returns these variables as plain numbers, they are not converted to pandas datetime objects automatically; you have to add the 1960 offset yourself.
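As an alternative to adding the offset by hand, `pd.to_datetime` accepts a `unit` and an `origin`, so pandas can shift day 0 from 1970-01-01 to 1960-01-01 itself. A minimal sketch on a made-up `date_column` (the values are illustrative, not from the question). This only works for fixed-length units such as days; months and quarters (%tm, %tq) need calendar arithmetic instead.

```python
import pandas as pd

# Hypothetical column of Stata %td values (days since 1960-01-01); NaN is a missing date
df = pd.DataFrame({"date_column": [0, 3653, 21915, float("nan")]})

# origin moves day 0 from the Unix epoch to Stata's epoch; NaN becomes NaT
df["date_column"] = pd.to_datetime(
    df["date_column"], unit="D", origin=pd.Timestamp("1960-01-01")
)
```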
```python
import pandas as pd

# Stata's dates count from 1 January 1960: %td counts days, %tm counts months
STATA_EPOCH = pd.Timestamp('1960-01-01')

# Assuming 'date_column' holds Stata %td values (days since 1960-01-01);
# missing values (NaN) become NaT
df['date_column'] = STATA_EPOCH + pd.to_timedelta(df['date_column'], unit='D')

# Assuming 'month_column' holds Stata %tm values (months since January 1960):
# 0 is January 1960, 1 is February 1960, etc.
# Months have no fixed length, so pd.to_timedelta(..., unit='M') is not an option;
# add a calendar offset instead
df['month_column'] = df['month_column'].apply(
    lambda m: STATA_EPOCH + pd.DateOffset(months=int(m)) if pd.notna(m) else pd.NaT
)
```
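The question also asks about the other Stata formats. Below is a minimal sketch covering the common ones, assuming the values arrive as plain numbers (floats with NaN for missing) exactly as Stata stores them. The function name `stata_to_datetime` and its `fmt` argument are hypothetical, not part of pandas or pystata. Instead of a per-row `apply`, it uses NumPy's calendar-aware `datetime64` units ('M' for months, 'Y' for years), so it is vectorized. Weeks follow Stata's own definition (52 weeks per year, week 1 starting on January 1, the last week absorbing the extra days). %tC (milliseconds including leap seconds) is deliberately not handled.

```python
import numpy as np
import pandas as pd


def stata_to_datetime(values, fmt):
    """Hypothetical helper: convert Stata date values (counts since 1960) to datetime64[ns].

    fmt is one of "%tc", "%td", "%tw", "%tm", "%tq", "%th", "%ty".
    Each value maps to the first instant of its period; NaN becomes NaT.
    """
    v = pd.to_numeric(pd.Series(values), errors="coerce")
    ok = v.notna().to_numpy()
    n = v[ok].to_numpy(dtype="int64")

    if fmt == "%tc":    # milliseconds since 01jan1960 00:00:00 (no leap seconds)
        dt = np.datetime64("1960-01-01", "ms") + n
    elif fmt == "%td":  # days since 01jan1960
        dt = np.datetime64("1960-01-01", "D") + n
    elif fmt == "%tw":  # Stata weeks: 52 per year, week 1 starts on 1 January
        year_start = (np.datetime64("1960", "Y") + n // 52).astype("datetime64[D]")
        dt = year_start + 7 * (n % 52)
    elif fmt == "%tm":  # months since 1960m1
        dt = np.datetime64("1960-01", "M") + n
    elif fmt == "%tq":  # quarters since 1960q1, i.e. 3 months each
        dt = np.datetime64("1960-01", "M") + 3 * n
    elif fmt == "%th":  # half-years since 1960h1, i.e. 6 months each
        dt = np.datetime64("1960-01", "M") + 6 * n
    elif fmt == "%ty":  # the calendar year itself, not an offset
        dt = np.datetime64("1960", "Y") + (n - 1960)
    else:
        raise ValueError(f"unsupported Stata format: {fmt}")

    # Start with all NaT, then fill in the non-missing positions
    result = np.full(len(v), np.datetime64("NaT"), dtype="datetime64[ns]")
    result[ok] = dt.astype("datetime64[ns]")
    return pd.Series(result, index=v.index)
```

For example, `df['month_column'] = stata_to_datetime(df['month_column'], '%tm')`. The returned Series keeps the input's index, so the assignment lines up row by row. Mapping each period to its first day (e.g. 1960q2 to 1960-04-01) is one reasonable convention; a monthly `Period` dtype would be an alternative if you prefer not to pick a day.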