How to define order for pipe delimited values in python

Question

I have a column with pipe delimited values and i want to make sure it always follows a certain order referring another column.

for e.g column A should always follow pattern defined in B.

What is the optimal way to achieve this using python ? I have already implemented the len comparison but looking for a granular method.

The values in B column always remain the same but column A is a 10 split field with only some fields as constant and other changing values for e.g C-code can be TLC,BMN,JVL and C-name can be Amla Life,Hidden N,K Regular,Sauce P etc.

Azhar Khan · Accepted Answer

Define a dictionary of all fixed lists:

category_dict = {
  "C-code": ["TLC", "BMN", "JVL"],
  "C-name": ["Amla Life", "Hidden N", "K Regular", "Sauce P"],
}

Sample dataset:

data = [
  ("TLC|Amla Life", "C-code|C-name"), #correct
  ("Amla Life|TLC", "C-name|C-code"), #correct (but different order)
  ("TLC|Amla Life", "C-name|C-code"), #incorrect order
  ("XYZ|Amla Life", "C-code|C-name"), #invalid value
  ("TLC", "C-code|C-name"), #incorrect number of values
]

df = pd.DataFrame(data, columns=["A", "B"])

Define a function to split the A, B by "|", iterate over respective indices and verify if value in A is in dictionary by B as key:

def isin_order(row):
  a_split = row["A"].split("|")
  b_split = row["B"].split("|")
  
  #incorrect number of values
  if len(a_split) != len(b_split):
    return False
  
  for a,b in zip(a_split, b_split):
    if b not in category_dict or a not in category_dict[b]:
      return False
  
  return True

Apply it to the dataframe:

df["result"] = df.apply(lambda row: isin_order(row), axis=1)

>>                A              B  result
>> 0  TLC|Amla Life  C-code|C-name    True
>> 1  Amla Life|TLC  C-name|C-code    True
>> 2  TLC|Amla Life  C-name|C-code   False
>> 3  XYZ|Amla Life  C-code|C-name   False
>> 4            TLC  C-code|C-name   False

How to define order for pipe delimited values in python

Answers (1)

Related Questions