Reputation: 25
I have dataset with the following column - Source, Sink, StartDate, EndDate, Class, MW, ClearingPrice. I am trying to create a column (Path_Repeat) that takes value 1 if the a particular Source-Sink combinations reverse exists in the dataset for the same StartDate and EndDate.
I was able to do this in excel by using a =COUNTIFS(A:A,B2,B:B,A2,C:C,C2,D:D,D2,E:E,E2). I wondering if there is a way of doing this in pandas that is faster than excel.
Upvotes: 1
Views: 67
Reputation: 13458
With the following toy dataframe:
df = pd.DataFrame(
{
"Source": ["A", "B", "A", "B"],
"Sink": ["B", "A", "C", "A"],
"StartDate": ["1/1/2010", "1/1/2010", "1/1/2010", "2/1/2010"],
"EndDate": ["31/1/2010", "31/1/2010", "31/1/2010", "31/1/2010"],
}
)
Here is one way to do it:
# Create a new column for comparison
df["key"] = df.apply(lambda x: x["Source"] + x["Sink"], axis=1)
df["key"] = df["key"].apply(lambda x: "".join(sorted(x)))
# Search for duplicates
df.loc[
df.duplicated(subset=["StartDate", "EndDate", "key"], keep=False), "Path_Repeat"
] = 1
# Cleanup
df = df.fillna(0).astype({"Path_Repeat": int})
Source Sink StartDate EndDate key Path_Repeat
0 A B 1/1/2010 31/1/2010 AB 1
1 B A 1/1/2010 31/1/2010 AB 1
2 A C 1/1/2010 31/1/2010 AC 0
3 B A 2/1/2010 31/1/2010 AB 0
Upvotes: 1