user17059762
user17059762

Reputation: 25

Replicating output of excel COUNTIFs in pandas

enter image description here

I have dataset with the following column - Source, Sink, StartDate, EndDate, Class, MW, ClearingPrice. I am trying to create a column (Path_Repeat) that takes value 1 if the a particular Source-Sink combinations reverse exists in the dataset for the same StartDate and EndDate.

I was able to do this in excel by using a =COUNTIFS(A:A,B2,B:B,A2,C:C,C2,D:D,D2,E:E,E2). I wondering if there is a way of doing this in pandas that is faster than excel.

Upvotes: 1

Views: 67

Answers (1)

Laurent
Laurent

Reputation: 13458

With the following toy dataframe:

df = pd.DataFrame(
    {
        "Source": ["A", "B", "A", "B"],
        "Sink": ["B", "A", "C", "A"],
        "StartDate": ["1/1/2010", "1/1/2010", "1/1/2010", "2/1/2010"],
        "EndDate": ["31/1/2010", "31/1/2010", "31/1/2010", "31/1/2010"],
    }
)

Here is one way to do it:

# Create a new column for comparison
df["key"] = df.apply(lambda x: x["Source"] + x["Sink"], axis=1)
df["key"] = df["key"].apply(lambda x: "".join(sorted(x)))

# Search for duplicates
df.loc[
    df.duplicated(subset=["StartDate", "EndDate", "key"], keep=False), "Path_Repeat"
] = 1

# Cleanup
df = df.fillna(0).astype({"Path_Repeat": int})
  Source Sink StartDate    EndDate key  Path_Repeat
0      A    B  1/1/2010  31/1/2010  AB            1
1      B    A  1/1/2010  31/1/2010  AB            1
2      A    C  1/1/2010  31/1/2010  AC            0
3      B    A  2/1/2010  31/1/2010  AB            0

Upvotes: 1

Related Questions