Reputation: 23
I need help comparing values in a pandas DataFrame that sit at different index positions. I read the DataFrame from a CSV with the headers 'Time', 'Predicted' and 'Engine'. 'Time' is a time series in the format "DD.MM.YYYY hh:mm:ss" in 10-minute steps; 'Predicted' and 'Engine' take the values 0 or 1. It looks like this:
+-------------------+---------+------+
|Time               |Predicted|Engine|
+-------------------+---------+------+
|01.01.2019 00:00:00|        0|     0|
|01.01.2019 00:10:00|        1|     0|
|01.01.2019 00:20:00|        1|     1|
|                ...|      ...|   ...|
+-------------------+---------+------+
I want to compare the Predicted value at [i] with the Engine value at [i+1], so that the result looks like this:
+-------------------+---------+------+------+
|Time               |Predicted|Engine|Result|
+-------------------+---------+------+------+
|01.01.2019 00:00:00|        0|     0|False | <- although probably not defined?
|01.01.2019 00:10:00|        1|     0|True  |
|01.01.2019 00:20:00|        1|     1|True  |
|                ...|      ...|   ...|   ...|
+-------------------+---------+------+------+
This was my initial code (to clarify what I was aiming for), which resulted in
ValueError: Can only compare identically-labeled Series objects
Code:
res = []
for i in df['Predicted']:
    if df['Predicted'][i:i+1] == df['Engine'][i+1:i+2]:
        res.append(True)
    else:
        res.append(False)
df['Result'] = res
I now get why this isn't working but I can't find a solution to this problem on my own (yet) as I am fairly new to programming.
Upvotes: 2
Views: 50
Reputation: 4618
You can use shift: it shifts your series by a given number of rows, and you can then compare the shifted series with Engine:
df['Result'] = df['Predicted'].shift(1) == df['Engine']
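To see what that one-liner does, here is a minimal sketch on three rows made up to mirror the table in the question (the sample values and the `result` variable are assumptions for illustration, not your real data). Note that `shift(1)` leaves the first row as NaN, and NaN never compares equal to anything, so the first Result is False:

```python
import pandas as pd

# Hypothetical sample data mirroring the layout described in the question.
df = pd.DataFrame({
    "Time": ["01.01.2019 00:00:00", "01.01.2019 00:10:00", "01.01.2019 00:20:00"],
    "Predicted": [0, 1, 1],
    "Engine": [0, 0, 1],
})

# Parse the "DD.MM.YYYY hh:mm:ss" strings into real timestamps (optional here).
df["Time"] = pd.to_datetime(df["Time"], format="%d.%m.%Y %H:%M:%S")

# shift(1) moves every Predicted value down by one row, so row i now holds
# Predicted[i-1]; the first row becomes NaN because nothing precedes it.
shifted = df["Predicted"].shift(1)

# Both series still share the same index labels, so the element-wise
# comparison works without the "identically-labeled" ValueError.
df["Result"] = shifted == df["Engine"]

result = df["Result"].tolist()
print(df)
```

Because both series keep the same index, pandas aligns them row by row, which is what the slice-based loop could not do.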
Upvotes: 3