Reputation: 1511
I am struggeling with the following problem. I have two dataframes df1 and df2 and want to compare these by the column transportation select then from the df1 the country and drom the dates defined per country as shown below in the code. If doing this I receive the following error message:
ValueError: Can only compare identically-labeled Series objects
The code looks like the following:
from pandas.tseries.holiday import (
AbstractHolidayCalendar, EasterMonday,
GoodFriday, Holiday, next_monday,
Easter, nearest_workday, Day, USMartinLutherKingJr,
USPresidentsDay, USMemorialDay, USLaborDay,
USThanksgivingDay)
class GermanHoliday(AbstractHolidayCalendar):
rules = [
Holiday('New Years Day', month=1, day=1, observance=next_monday),
GoodFriday,
EasterMonday,
Holiday('Reformation Day', year=2017, month=10, day=31, observance=nearest_workday),
Holiday('Labour Day', month=5, day=1, observance=nearest_workday),
Holiday('Whit Monday', month=1, day=1, offset=[Easter(), Day(50)]),
Holiday('Day of German Unity', month=10, day=3, observance=nearest_workday),
Holiday('Christmas Day', month=12, day=25, observance=nearest_workday),
Holiday('Boxing Day',month=12, day=26, observance=nearest_workday)
]
class USHolidays(AbstractHolidayCalendar):
rules = [
Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas Day', month=12, day=25, observance=nearest_workday)
]
calendarGermany = GermanHoliday()
calendarUS = USHolidays()
holidaysGermany = calendarGermany .holidays().to_pydatetime()
holidaysUS = calendarUS .holidays().to_pydatetime()
qry = "Transportation in @df1.ticker and Date not in @holidaysGermany "
df2 = df2.query(qry)
And the structure of the data frames df1 and df2 are below:
df1:
0 transportation country
1 ICE Germany
2 Lufthansa Germany
3 SIXT Germany
4 TGV France
5 Air France France
6 Alamo France
7 National USA
8 Amtrak USA
9 Delta USA
df2:
Date transportation price
0 2015-12-21 ICE 81.9924
1 2015-12-22 ICE 81.5173
2 2015-12-23 ICE 83.5015
3 2015-12-24 ICE 83.5015
4 2015-12-25 ICE 83.5015
5 2015-12-28 ICE 83.0357
6 2015-12-29 ICE 84.6286
7 2015-12-30 ICE 83.7250
8 2015-12-31 ICE 83.7250
9 2016-01-01 ICE 83.7250
10 2015-12-21 National 127.3900
11 2015-12-22 National 129.0000
12 2015-12-23 National 131.8800
13 2015-12-24 National 131.8800
14 2015-12-25 National 131.8800
15 2015-12-28 National 130.0300
16 2015-12-29 National 132.1700
...
The final result should look like this:
df2:
Date transportation price
0 2015-12-21 ICE 81.9924
1 2015-12-22 ICE 81.5173
2 2015-12-23 ICE 83.5015
3 2015-12-24 ICE 83.5015
4 2015-12-28 ICE 83.0357
5 2015-12-29 ICE 84.6286
6 2015-12-30 ICE 83.7250
7 2015-12-31 ICE 83.7250
8 2016-01-01 ICE 83.7250
9 2015-12-21 National 127.3900
10 2015-12-22 National 129.0000
11 2015-12-23 National 131.8800
12 2015-12-26 National 131.8800
13 2015-12-28 National 130.0300
14 2015-12-29 National 132.1700
...
Upvotes: 0
Views: 500
Reputation: 210832
IIUC you can do it this way:
In [197]: qry = "transportation in @df1.transportation and \
...: Date not in ['2015-12-24','2015-12-25']"
In [198]: df2.query(qry)
Out[198]:
Date transportation price
0 2015-12-21 ICE 81.9924
1 2015-12-22 ICE 81.5173
2 2015-12-23 ICE 83.5015
5 2015-12-28 ICE 83.0357
6 2015-12-29 ICE 84.6286
7 2015-12-30 ICE 83.7250
8 2015-12-31 ICE 83.7250
9 2016-01-01 ICE 83.7250
10 2015-12-21 National 127.3900
11 2015-12-22 National 129.0000
12 2015-12-23 National 131.8800
15 2015-12-28 National 130.0300
16 2015-12-29 National 132.1700
Upvotes: 1