Reputation: 5558
Suppose I have the following two tables. Notice the second table extends an additional 15 days into February:
dates1 = pd.DataFrame(data=pd.date_range('2018-1-1 00:00', '2018-1-31 23:59', freq='h'), columns=['date'])
dates2 = pd.DataFrame(data=pd.date_range('2018-1-1 00:00', '2018-2-15 23:59', freq='h'), columns=['date'])
I want to filter the second dataframe based on the most recent date from the first dataframe. Here I find the most recent date from the dates1 table. The result is a timestamp:
most_recent_dates1 = dates1['date'].max()
Timestamp('2018-01-31 23:00:00')
Then I try to filter the second table as follows:
dates3 = dates2[[dates2['date'] <= most_recent_dates1]]
But I get the following error:
ValueError: Item wrong length 1 instead of 1104.
How can I filter a table using one timestamp from another table and not get this error? I know I can do this with an inner join (pandas.merge), but there are times I don't want to use an inner join.
Upvotes: 1
Views: 1389
Reputation: 1531
you could try the same using apply function
dates2[dates2.apply(lambda x: x['date'] <= most_recent_dates1,axis=1)]
Upvotes: 1
Reputation: 164613
Pandas dataframe indexing permits a Boolean series / array as an input. Therefore, you do not need to use a list of a series. You can index directly with a Boolean series:
dates3 = dates2[dates2['date'] <= most_recent_dates1]
As the error points out, you have provided an item of length 1, whereas it expects an item with the same length as your dataframe.
See Boolean indexing from the official docs for more information.
Upvotes: 2