Sean McCarthy
Sean McCarthy

Reputation: 5558

Filter pandas dataframe on date from another dataframe

Suppose I have the following two tables. Notice the second table extends an additional 15 days into February:

dates1 = pd.DataFrame(data=pd.date_range('2018-1-1 00:00', '2018-1-31 23:59', freq='h'), columns=['date'])
dates2 = pd.DataFrame(data=pd.date_range('2018-1-1 00:00', '2018-2-15 23:59', freq='h'), columns=['date'])

I want to filter the second dataframe based on the most recent date from the first dataframe. Here I find the most recent date from the dates1 table. The result is a timestamp:

most_recent_dates1 = dates1['date'].max()
Timestamp('2018-01-31 23:00:00')

Then I try to filter the second table as follows:

dates3 = dates2[[dates2['date'] <= most_recent_dates1]]

But I get the following error:

ValueError: Item wrong length 1 instead of 1104.

How can I filter a table using one timestamp from another table and not get this error? I know I can do this with an inner join (pandas.merge), but there are times I don't want to use an inner join.

Upvotes: 1

Views: 1389

Answers (2)

Narendra Prasath
Narendra Prasath

Reputation: 1531

you could try the same using apply function

dates2[dates2.apply(lambda x: x['date'] <= most_recent_dates1,axis=1)]

Upvotes: 1

jpp
jpp

Reputation: 164613

Pandas dataframe indexing permits a Boolean series / array as an input. Therefore, you do not need to use a list of a series. You can index directly with a Boolean series:

dates3 = dates2[dates2['date'] <= most_recent_dates1]

As the error points out, you have provided an item of length 1, whereas it expects an item with the same length as your dataframe.

See Boolean indexing from the official docs for more information.

Upvotes: 2

Related Questions