Python Pandas - df.loc - adding additional criteria not working

Question

I have a script with the following code, which works as intended:

Table2 = df.loc[df.Date.between('2018-11-22','2018-11-30')].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum()

However, I am now trying to add a second criterion that filters on a column called "Region", which contains the value Canada in the dataset, but it doesn't seem to work:

Table2 = df.loc[df.Date.between('2018-11-22','2018-11-30'), df['Region'] = 'Canada'].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum()

The additional filter seems to have no impact. Can anyone help? Thanks.

Asif Ali · Accepted Answer

As mentioned by @Alaxander, here's an example snippet that combines two conditions with &:

import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [1, 1, 1]
})

# Rows where A == 1 and B == 4, keeping only columns A and B
df.loc[(df["A"] == 1) & (df["B"] == 4), ["A", "B"]]

Also, look at the assignment operator in df["Region"] = "Canada": it needs to be == (comparison) rather than = (assignment) for it to act as a filter. I have fixed this in your code below.
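To make the point about == and & concrete, here is a minimal sketch on a made-up frame (the data and the names is_canada, is_large, and mask are illustrative assumptions, not part of your dataset). Each comparison produces a boolean Series, and & combines them element-wise. Because & binds more tightly than == or > in Python, each comparison needs its own parentheses when written inline.

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataset.
df = pd.DataFrame({
    "Region": ["Canada", "USA", "Canada"],
    "revenue": [100, 200, 300],
})

# Each comparison yields a boolean Series (note == rather than =).
is_canada = df["Region"] == "Canada"
is_large = df["revenue"] > 150

# & combines the two masks element-wise; inline this would be
# (df["Region"] == "Canada") & (df["revenue"] > 150)
mask = is_canada & is_large

filtered = df.loc[mask]
```

Naming the masks first is optional, but it tends to make long filters easier to read than one nested expression.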

Your code if you want specific fields:

Table2 = df.loc[(df.Date.between('2018-11-22','2018-11-30')) & (df['Region'] == 'Canada'), ['FPYear', 'New Customer', 'Existing Customer', 'revenue']].groupby('FPYear').sum()

Your code if you want all the fields:

Table2 = df.loc[((df.Date.between('2018-11-22','2018-11-30')) & (df['Region'] == 'Canada'))].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum()
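For a version you can run end to end, here is a small sketch of the same filter and aggregation. The column names are taken from your question, but the rows are invented for illustration, and the Date column is assumed to be a datetime column.

```python
import pandas as pd

# Invented sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-11-21", "2018-11-23", "2018-11-25", "2018-11-29", "2018-12-01",
    ]),
    "Region": ["Canada", "Canada", "USA", "Canada", "Canada"],
    "FPYear": [2018, 2018, 2018, 2019, 2019],
    "New Customer": [1, 2, 3, 4, 5],
    "Existing Customer": [10, 20, 30, 40, 50],
    "revenue": [100, 200, 300, 400, 500],
})

# Both conditions must hold: date inside the (inclusive) range AND region is Canada.
mask = df["Date"].between("2018-11-22", "2018-11-30") & (df["Region"] == "Canada")

value_cols = ["New Customer", "Existing Customer", "revenue"]
table2 = df.loc[mask].groupby("FPYear")[value_cols].sum()
print(table2)
```

Only the second and fourth rows pass both conditions here, so each FPYear group sums a single row. Grouping by the column name "FPYear" on the filtered frame, rather than by df['FPYear'] from the unfiltered frame, keeps the grouping key and the data the same length.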

PS: Thanks to @Alexandar for pointing out the mistake.
