Mofongo

Reputation: 131

In Pandas, .loc is not returning the specified rows

I am trying to select the days where Hours < 100, but the filter doesn't work.

temp_df = temp_df.loc[df['Hours'] < 100]

returns:

+---+------------+----------+----------+------------+
|   |    Date    | Activity | Weekday  |   Hours    |
+---+------------+----------+----------+------------+
| 0 | 2020-06-01 | Gen Ab   | Monday   | 347.250000 |
| 1 | 2020-06-02 | Gen Ab   | Tuesday  | 286.266667 |
| 4 | 2020-06-05 | Gen Ab   | Friday   | 317.566667 |
| 5 | 2020-06-06 | Gen Ab   | Saturday | 42.500000  |
| 6 | 2020-06-07 | Gen Ab   | Sunday   | 8.500000   |
+---+------------+----------+----------+------------+

I should add that the rows it does filter out have 'Hours' well over 100 as well.

The code below works, but where am I going wrong with .loc?

temp_df = temp_df.query('Hours <= 100')[['Date', 'Activity', 'Weekday', 'Hours']].reset_index(drop=True)

After using .query I tried .loc again, this time to remove "Saturday" and "Sunday":

temp_df.loc[~df['Weekday'].isin(['Saturday', 'Sunday'])]

I should be getting the values that are not "Saturday" or "Sunday" in the weekday column, right?

It's returning

+---+------------+----------+----------+-----------+
|   |    Date    | Activity | Weekday  |   Hours   |
+---+------------+----------+----------+-----------+
| 0 | 2020-06-06 | Gen Ab   | Saturday | 42.500000 |
| 1 | 2020-06-07 | Gen Ab   | Sunday   | 8.500000  |
| 2 | 2020-06-13 | Gen Ab   | Saturday | 24.183333 |
| 3 | 2020-06-20 | Gen Ab   | Saturday | 34.000000 |
| 4 | 2020-06-27 | Gen Ab   | Saturday | 25.500000 |
| 5 | 2020-07-03 | Gen Ab   | Friday   | 33.083333 |
| 6 | 2020-07-04 | Gen Ab   | Saturday | 18.500000 |
| 7 | 2020-07-11 | Gen Ab   | Saturday | 22.550000 |
+---+------------+----------+----------+-----------+

Upvotes: 1

Views: 929

Answers (2)

Michael Butscher

Reputation: 10959

It is important to check that the dataframe (or series) used to build the selection criterion is the same as, or at least equal to, the frame to which the resulting selection (the boolean array) is applied.

In the case here with

temp_df.loc[df['Hours'] < 100]

temp_df and df weren't equal.

The easiest way to ensure equality is to use the same variable (and therefore the same dataframe) for both the criterion and the selection, e.g.:

df.loc[df['Hours'] < 100]
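
To illustrate why the mismatch produces wrong rows rather than an error, here is a minimal sketch. The data and the way temp_df is derived from df (sorting plus reset_index) are assumptions made up for the example, not the asker's actual frames. When both frames carry the same index labels, .loc applies the mask by label, so a mask computed on df picks whatever rows of temp_df happen to sit at those labels.

```python
import pandas as pd

# Hypothetical data; the asker's real frames are not shown in the question.
df = pd.DataFrame({
    "Weekday": ["Monday", "Saturday", "Sunday", "Tuesday"],
    "Hours": [347.25, 42.5, 8.5, 286.0],
})

# Assumed derivation: same rows, but reordered and relabelled 0..3,
# so label 0 in temp_df is no longer the row that label 0 names in df.
temp_df = df.sort_values("Hours").reset_index(drop=True)

# Mask built from df: True at labels 1 and 2 (Saturday, Sunday in df).
mask_from_df = df["Hours"] < 100

# Applied to temp_df, labels 1 and 2 are Saturday (42.5) and Tuesday (286.0).
wrong = temp_df.loc[mask_from_df]

# Mask built from the frame being filtered selects the intended rows.
right = temp_df.loc[temp_df["Hours"] < 100]

print(wrong)
print(right)
```

In this sketch, wrong contains a row with Hours of 286.0 while right contains only the two rows under 100, which mirrors the symptom in the question.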

Upvotes: 1

Mofongo

Reputation: 131

To all you noobs like me, here's the answer, thanks to @Michael Butscher:

If "temp_df" and "df" aren't the same initially, this may fail. Better: temp_df = df.loc[df ... 

I didn't realize they weren't the same. I thought they were, but when I went back and checked, they were not.
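
One way to do that check before reusing a mask across two frames is sketched below. The frames here are invented for illustration; DataFrame.equals and Index.equals are existing pandas methods, but how temp_df was actually built in my case is not shown, so treat the setup as an assumption.

```python
import pandas as pd

# Hypothetical frames: temp_df is a filtered copy of df.
df = pd.DataFrame({"Hours": [347.25, 42.5, 8.5]})
temp_df = df[df["Hours"] < 300]  # keeps index labels 1 and 2

# Compare contents and index labels before trusting a mask from the other frame.
same_rows = temp_df.equals(df)
same_index = temp_df.index.equals(df.index)
print(same_rows, same_index)

# Safest: build the mask from the frame that is being filtered.
result = temp_df.loc[temp_df["Hours"] < 100]
print(result)
```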

Upvotes: 0
