Reputation: 131
I was trying to get the days where Hours < 100 but it wouldn't work.
temp_df = temp_df.loc[df['Hours'] < 100]
returns:
+---+------------+----------+----------+------------+
| | Date | Activity | Weekday | Hours |
+---+------------+----------+----------+------------+
| 0 | 2020-06-01 | Gen Ab | Monday | 347.250000 |
| 1 | 2020-06-02 | Gen Ab | Tuesday | 286.266667 |
| 4 | 2020-06-05 | Gen Ab | Friday | 317.566667 |
| 5 | 2020-06-06 | Gen Ab | Saturday | 42.500000 |
| 6 | 2020-06-07 | Gen Ab | Sunday | 8.500000 |
+---+------------+----------+----------+------------+
I should add that the rows it does filter out have 'Hours' well over 100 as well.
The code below works, but where am I going wrong with .loc?
temp_df = temp_df.query('Hours <= 100')[['Date', 'Activity', 'Weekday', 'Hours']].reset_index(drop=True)
After using .query, I tried .loc again, this time to remove the "Saturday" and "Sunday" rows:
temp_df.loc[~df['Weekday'].isin(['Saturday', 'Sunday'])]
I should be getting the values that are not "Saturday" or "Sunday" in the weekday column, right?
It's returning
+---+------------+----------+----------+-----------+
| | Date | Activity | Weekday | Hours |
+---+------------+----------+----------+-----------+
| 0 | 2020-06-06 | Gen Ab | Saturday | 42.500000 |
| 1 | 2020-06-07 | Gen Ab | Sunday | 8.500000 |
| 2 | 2020-06-13 | Gen Ab | Saturday | 24.183333 |
| 3 | 2020-06-20 | Gen Ab | Saturday | 34.000000 |
| 4 | 2020-06-27 | Gen Ab | Saturday | 25.500000 |
| 5 | 2020-07-03 | Gen Ab | Friday | 33.083333 |
| 6 | 2020-07-04 | Gen Ab | Saturday | 18.500000 |
| 7 | 2020-07-11 | Gen Ab | Saturday | 22.550000 |
+---+------------+----------+----------+-----------+
Upvotes: 1
Views: 929
Reputation: 10959
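Check that the DataFrame (or Series) used to build the selection criterion is the same as, or at least equal to, the frame to which the resulting boolean mask is applied. .loc aligns a boolean Series with the frame by index label, not by position, so a mask built from a different frame can select the wrong rows without raising an error.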
In the case here with
temp_df.loc[df['Hours'] < 100]
temp_df and df weren't equal.
The easiest way to ensure equality is to use the same variable (and therefore the same DataFrame) both to build the criterion and to apply the selection, e.g.:
df.loc[df['Hours'] < 100]
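A minimal sketch of how the mismatch can play out, assuming temp_df was derived from df by dropping a row and resetting the index. The question doesn't show how temp_df was actually built, so the data and that step are hypothetical:

```python
import pandas as pd

# Hypothetical data; only the 'Hours' column matters for the illustration.
df = pd.DataFrame({"Hours": [347.25, 8.5, 286.0, 42.5, 317.5, 24.0]})

# Assumed earlier step: temp_df is a subset of df with a fresh 0..n index,
# so its labels no longer point at the same rows as df's labels.
temp_df = df.iloc[1:].reset_index(drop=True)

# Mask built from df but applied to temp_df: .loc aligns on index labels,
# so each row of temp_df is judged by the 'Hours' value of a different row.
wrong = temp_df.loc[df["Hours"] < 100]

# Mask built from the same frame it is applied to.
right = temp_df.loc[temp_df["Hours"] < 100]

print(wrong["Hours"].tolist())  # [286.0, 317.5]
print(right["Hours"].tolist())  # [8.5, 42.5, 24.0]
```

In this sketch the mismatched mask raises no error, because every label in temp_df also exists in df; it just returns the wrong rows.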
Upvotes: 1
Reputation: 131
To all you noobs like me, here's the answer, thanks to @Michael Butscher:
If "temp_df" and "df" aren't the same initially, this may fail. Better: temp_df = df.loc[df ...
I didn't realize they weren't the same. I thought they were, but when I went back and checked, they were not.
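For anyone wanting to check this before filtering, here is a rough sketch applied to the weekday case. The data and the step that builds temp_df are made up for illustration; Index.equals is just one way to compare the two frames' indexes:

```python
import pandas as pd

# Hypothetical data standing in for the real frames.
df = pd.DataFrame({
    "Weekday": ["Friday", "Saturday", "Sunday", "Monday"],
    "Hours": [33.0, 42.5, 8.5, 347.25],
})

# Assumed earlier step that makes temp_df differ from df.
temp_df = df.loc[df["Weekday"] != "Friday"].reset_index(drop=True)

# Quick sanity check: do the two frames share the same index?
same_index = temp_df.index.equals(df.index)
print(same_index)  # False

# Build the mask from the frame being filtered.
weekend = ["Saturday", "Sunday"]
weekdays_only = temp_df.loc[~temp_df["Weekday"].isin(weekend)]
print(weekdays_only["Weekday"].tolist())  # ['Monday']
```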
Upvotes: 0