Reputation: 177
I have a column with pipe delimited values and i want to make sure it always follows a certain order referring another column.
for e.g column A should always follow pattern defined in B.
What is the optimal way to achieve this using python ? I have already implemented the len comparison but looking for a granular method.
The values in B column always remain the same but column A is a 10 split field with only some fields as constant and other changing values for e.g C-code can be TLC,BMN,JVL and C-name can be Amla Life,Hidden N,K Regular,Sauce P etc.
Upvotes: 0
Views: 49
Reputation: 4108
Define a dictionary of all fixed lists:
category_dict = {
"C-code": ["TLC", "BMN", "JVL"],
"C-name": ["Amla Life", "Hidden N", "K Regular", "Sauce P"],
}
Sample dataset:
data = [
("TLC|Amla Life", "C-code|C-name"), #correct
("Amla Life|TLC", "C-name|C-code"), #correct (but different order)
("TLC|Amla Life", "C-name|C-code"), #incorrect order
("XYZ|Amla Life", "C-code|C-name"), #invalid value
("TLC", "C-code|C-name"), #incorrect number of values
]
df = pd.DataFrame(data, columns=["A", "B"])
Define a function to split the A, B by "|", iterate over respective indices and verify if value in A is in dictionary by B as key:
def isin_order(row):
a_split = row["A"].split("|")
b_split = row["B"].split("|")
#incorrect number of values
if len(a_split) != len(b_split):
return False
for a,b in zip(a_split, b_split):
if b not in category_dict or a not in category_dict[b]:
return False
return True
Apply it to the dataframe:
df["result"] = df.apply(lambda row: isin_order(row), axis=1)
>> A B result
>> 0 TLC|Amla Life C-code|C-name True
>> 1 Amla Life|TLC C-name|C-code True
>> 2 TLC|Amla Life C-name|C-code False
>> 3 XYZ|Amla Life C-code|C-name False
>> 4 TLC C-code|C-name False
Upvotes: 1