MCM
MCM

Reputation: 1511

Compare two different pandas dataframes and drop rows Python

I am struggeling with the following problem. I have two dataframes df1 and df2 and want to compare these by the column transportation select then from the df1 the country and drom the dates defined per country as shown below in the code. If doing this I receive the following error message:

ValueError: Can only compare identically-labeled Series objects

The code looks like the following:

from pandas.tseries.holiday import (
AbstractHolidayCalendar, EasterMonday,
GoodFriday, Holiday, next_monday,
Easter, nearest_workday, Day, USMartinLutherKingJr,
USPresidentsDay, USMemorialDay, USLaborDay,
USThanksgivingDay)

class GermanHoliday(AbstractHolidayCalendar):
    rules = [
             Holiday('New Years Day', month=1, day=1, observance=next_monday),
             GoodFriday,
             EasterMonday,
             Holiday('Reformation Day', year=2017, month=10, day=31, observance=nearest_workday),
             Holiday('Labour Day', month=5, day=1, observance=nearest_workday),
             Holiday('Whit Monday', month=1, day=1, offset=[Easter(), Day(50)]),
             Holiday('Day of German Unity', month=10, day=3, observance=nearest_workday),
             Holiday('Christmas Day', month=12, day=25, observance=nearest_workday),
             Holiday('Boxing Day',month=12, day=26, observance=nearest_workday) 
    ]

class USHolidays(AbstractHolidayCalendar):
    rules = [
             Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
             USMartinLutherKingJr,
             USPresidentsDay,
             GoodFriday,
             USMemorialDay,
             Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
             USLaborDay,
             USThanksgivingDay,
             Holiday('Christmas Day', month=12, day=25, observance=nearest_workday)
             ]

calendarGermany = GermanHoliday()
calendarUS = USHolidays()

holidaysGermany = calendarGermany .holidays().to_pydatetime()
holidaysUS = calendarUS .holidays().to_pydatetime()

qry = "Transportation in @df1.ticker and Date not in @holidaysGermany "

df2 = df2.query(qry)

And the structure of the data frames df1 and df2 are below:

df1:

0    transportation  country
1    ICE             Germany
2    Lufthansa       Germany
3    SIXT            Germany
4    TGV             France
5    Air France      France
6    Alamo           France
7    National        USA
8    Amtrak          USA
9    Delta           USA

df2:

   Date         transportation price
0  2015-12-21   ICE            81.9924
1  2015-12-22   ICE            81.5173
2  2015-12-23   ICE            83.5015
3  2015-12-24   ICE            83.5015
4  2015-12-25   ICE            83.5015
5  2015-12-28   ICE            83.0357
6  2015-12-29   ICE            84.6286
7  2015-12-30   ICE            83.7250
8  2015-12-31   ICE            83.7250
9  2016-01-01   ICE            83.7250
10 2015-12-21   National       127.3900
11 2015-12-22   National       129.0000
12 2015-12-23   National       131.8800
13 2015-12-24   National       131.8800
14 2015-12-25   National       131.8800
15 2015-12-28   National       130.0300
16 2015-12-29   National       132.1700
...

The final result should look like this:

df2:

   Date         transportation price
0  2015-12-21   ICE            81.9924
1  2015-12-22   ICE            81.5173
2  2015-12-23   ICE            83.5015
3  2015-12-24   ICE            83.5015
4  2015-12-28   ICE            83.0357
5  2015-12-29   ICE            84.6286
6  2015-12-30   ICE            83.7250
7  2015-12-31   ICE            83.7250
8  2016-01-01   ICE            83.7250
9  2015-12-21   National       127.3900
10 2015-12-22   National       129.0000
11 2015-12-23   National       131.8800
12 2015-12-26   National       131.8800
13 2015-12-28   National       130.0300
14 2015-12-29   National       132.1700
...

Upvotes: 0

Views: 500

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

IIUC you can do it this way:

In [197]: qry = "transportation in @df1.transportation and \
     ...:        Date not in ['2015-12-24','2015-12-25']"

In [198]: df2.query(qry)
Out[198]:
         Date transportation     price
0  2015-12-21            ICE   81.9924
1  2015-12-22            ICE   81.5173
2  2015-12-23            ICE   83.5015
5  2015-12-28            ICE   83.0357
6  2015-12-29            ICE   84.6286
7  2015-12-30            ICE   83.7250
8  2015-12-31            ICE   83.7250
9  2016-01-01            ICE   83.7250
10 2015-12-21       National  127.3900
11 2015-12-22       National  129.0000
12 2015-12-23       National  131.8800
15 2015-12-28       National  130.0300
16 2015-12-29       National  132.1700

Upvotes: 1

Related Questions