ps0604

Reputation: 1071

Comparing pandas dates after loading dataframe from parquet

I have the following code that loads a pandas dataframe from a parquet file. The parquet file has a column called the_date, and I'm trying to create a new dataframe filtered by that date.

import datetime
import pandas as pd

df = pd.read_parquet('path/to/file.parquet')
start_date = datetime.datetime(2010, 1, 1)
df2 = df[df['the_date'] > start_date]

The problem is that df2 contains all the records from df; nothing is filtered out. Is some kind of date conversion needed after loading the dataframe from parquet? What could be wrong?

Upvotes: 0

Views: 310

Answers (1)

Oren Bahari

Reputation: 36

pandas.to_datetime could be used to convert the column to a datetime dtype before comparing.

Your code would then presumably be:

import datetime
import pandas as pd

df = pd.read_parquet('path/to/file.parquet')
start_date = datetime.datetime(2010, 1, 1)
# Pass format=... or other parameters if the stored dates need them
df['the_date'] = pd.to_datetime(df['the_date'])
df2 = df[df['the_date'] > start_date]
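
If the dates are stored as text in a non-ISO layout, the extra parameters mentioned in the comment might look like the sketch below. The sample values and the day/month/year format are assumptions for illustration, not something known about the file in the question.

```python
import datetime
import pandas as pd

# Hypothetical sample data standing in for the_date as read from the file
raw = pd.Series(["31/12/2009", "15/06/2011", "not a date"])

# format= tells pandas the exact layout; errors="coerce" turns
# unparseable values into NaT instead of raising an exception
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

start_date = datetime.datetime(2010, 1, 1)

# NaT compares as False, so the unparseable row is dropped by the filter
kept = parsed[parsed > start_date]
print(kept)
```

Counting parsed.isna() after the conversion is a quick way to see how many values failed to parse.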

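A good way to check the column types is df.info(), which lists the dtype of each column. A date column that was read as timestamps shows a datetime64 dtype; a dtype such as object suggests the values are strings or Python date objects instead.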
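
A minimal sketch of that check, using a small hand-built dataframe in place of the parquet file. The sample rows and the value column are made up, and it assumes the dates arrived as strings, which may not be the actual cause in the question.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_parquet(...)
df = pd.DataFrame({
    "the_date": ["2009-06-15", "2010-01-01", "2011-03-20", "2015-12-31"],
    "value": [1, 2, 3, 4],
})

# Inspect the column type before comparing
df.info()
print(pd.api.types.is_datetime64_any_dtype(df["the_date"]))

# Convert to a datetime dtype, then check again
df["the_date"] = pd.to_datetime(df["the_date"])
print(pd.api.types.is_datetime64_any_dtype(df["the_date"]))

# The comparison is strict, so a row dated exactly 2010-01-01 is excluded
start_date = datetime.datetime(2010, 1, 1)
df2 = df[df["the_date"] > start_date]
print(df2)
```

If the first check already reports a datetime dtype, the conversion is not the issue and the stored values themselves are worth inspecting.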

Upvotes: 1
