Pandas: select rows where one column ends with another column

Question

I am working on a dirty dataset where two columns that I need to match are not properly formatted:

"id" is a string, often made of digits, that may start with zeros.
"parent_id" represents the id of the row's parent, but it has been converted to an int, so the leading zeros are gone.

I want to find the rows for which "id" is the same as "parent_id". However, I cannot match them like this:

df["is_the_same"] = (df["id"]==df["parent_id"])

because some of them would not match (for example, the id "01004" has 1004 as parent_id, and the comparison would fail in this case).

How can I select the rows that have "id" equal to "parent_id" once any leading zeros have been removed?

I also tried :

df["is_the_same"] = df["id"].str.endswith(df["parent_id"])

But it seems .str.endswith only works with a constant string, not with another column.

jezrael · Accepted Answer

Use a list comprehension with endswith:

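import pandas as pd

df = pd.DataFrame({'id':['01004','1004','54620'], 'parent_id':['1004','203','20']})

# compare each id with the parent_id of the same row
df["is_the_same"] = [x.endswith(y) for x, y in df[["id","parent_id"]].values]
# alternative: row-wise apply (same result, usually slower on large frames)
#df["is_the_same"] = df.apply(lambda x: x["id"].endswith(x["parent_id"]), axis=1)
print(df)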
      id parent_id  is_the_same
0  01004      1004         True
1   1004       203        False
2  54620        20         True
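
The sample above stores parent_id as strings, while the question describes it as an int column. A minimal sketch of the same list comprehension for that case, assuming parent_id really is an integer dtype (the sample data here is hypothetical), casts it to str first:

```python
import pandas as pd

# Hypothetical sample: parent_id stored as int, as described in the question
df = pd.DataFrame({"id": ["01004", "1004", "54620"],
                   "parent_id": [1004, 203, 20]})

# Cast parent_id to str so that str.endswith receives a string suffix,
# then compare the two columns row by row
parent_as_str = df["parent_id"].astype(str)
df["is_the_same"] = [x.endswith(y) for x, y in zip(df["id"], parent_as_str)]
print(df)
```

Note that endswith also matches a partial suffix (the last row, "54620" against 20, comes out True), so it is looser than an equality check.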

If the only difference is the leading zeros and all values are numeric, compare the values after converting both columns to integers:

df["is_the_same"] = df["id"].astype(int) == df["parent_id"].astype(int)
print(df)
      id parent_id  is_the_same
0  01004      1004         True
1   1004       203        False
2  54620        20        False
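
The question says "id" is only often made of digits, and astype(int) raises a ValueError on a value that is not a valid integer. A possible variant, sketched here under the assumption that some ids may contain non-digit characters (the "A0012" row is hypothetical), strips the leading zeros as strings instead of converting to int:

```python
import pandas as pd

# Hypothetical sample: one id contains a letter, parent_id is stored as int
df = pd.DataFrame({"id": ["01004", "1004", "54620", "A0012"],
                   "parent_id": [1004, 203, 20, 12]})

# Remove leading zeros from both sides as strings, then test for equality
id_stripped = df["id"].str.lstrip("0")
parent_stripped = df["parent_id"].astype(str).str.lstrip("0")
df["is_the_same"] = id_stripped == parent_stripped
print(df)
```

Unlike endswith, this is an exact comparison, so "54620" against 20 is False here.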
