Reputation: 1071
I have the following code that loads a pandas DataFrame from a parquet file. The file has a column called the_date,
and I'm trying to create a new DataFrame filtered by that date.
import datetime
import pandas as pd
df = pd.read_parquet('path/to/file.parquet')
start_date = datetime.datetime(2010, 1, 1)
df2 = df[df['the_date'] > start_date]  # keep rows dated after start_date
The problem is that df2 contains all the records from df and is not filtered at all. Is any kind of date conversion needed after loading the DataFrame from parquet? What could be wrong?
Upvotes: 0
Views: 310
Reputation: 36
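You could use pandas' to_datetime to make sure the_date is a real datetime column before comparing. If the column was not stored as a datetime type in the parquet file, the comparison may not behave the way you expect.
Your code would then presumably be: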
import datetime
import pandas as pd
df = pd.read_parquet('path/to/file.parquet')
start_date = datetime.datetime(2010, 1, 1)
df['the_date'] = pd.to_datetime(df['the_date'])  # pass format=... or other arguments if your dates need them
df2 = df[df['the_date'] > start_date]
Another good way to check how the data was loaded is df.info(), which lists the dtype of each column, so you can see whether the_date is datetime64 or something else.
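As a rough illustration of the check-then-convert idea, here is a minimal sketch. It assumes the dates came through as strings; the sample values are made up and stand in for whatever the parquet file actually holds, so your real column may look different.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the parquet data: dates stored as strings.
df = pd.DataFrame({"the_date": ["2009-06-15", "2010-03-01", "2011-12-31"]})

# Inspect the column types; the_date is a string/object column here, not datetime64.
print(df.dtypes)

# Convert to a real datetime column; values that cannot be parsed become NaT.
df["the_date"] = pd.to_datetime(df["the_date"], errors="coerce")

start_date = datetime.datetime(2010, 1, 1)
df2 = df[df["the_date"] > start_date]  # only rows strictly after start_date
print(df2)
```

If the_date already shows up as datetime64 in df.dtypes, the conversion is not the issue, and it is worth checking whether every row really is later than start_date.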
Upvotes: 1