WNG
WNG

Reputation: 3805

Pandas : select columns that ends with another column

I am working on a dirty dataset where two columnns that I need to match are not properly formatted :

I want to find for which rows "id" is the same as "parent_id". However, I cannot manage to match it like this :

df["is_the_same"] = (df["id"]==df["parent_id"])  

cause some of them would not match (for example, the id "01004" has "1004" as parent_id, and would not match in this case)

How can I select columns that have "id" equal to "parent_id" once potential zeroes have been removed ?

I also tried :

df["is_the_same"] = df["id"].str.endswith(df["parent_id"])

But it seems .str.endswith only work with constant strings (another column)

Upvotes: 2

Views: 1170

Answers (2)

Mayank Porwal
Mayank Porwal

Reputation: 34076

Using Series.lstrip, you can strip off leading zeros from id column and then match it with parent_id column, like this:

df["id"].str.lstrip('0') == df["parent_id"]

Assume this to be df:

In [68]: df 
Out[68]: 
      id parent_id
0  01004      1004
1   1004      1004
2    546       100

In [70]: df["is_the_same"] = df["id"].str.lstrip('0') == df["parent_id"]        
In [71]: df    
Out[71]: 
      id parent_id  is_the_same
0  01004      1004         True
1   1004      1004         True
2    546       100        False

Upvotes: 1

jezrael
jezrael

Reputation: 862841

Use list comprehension with endswith:

df = pd.DataFrame({'id':['01004','1004','54620'], 'parent_id':['1004','203','20']})

df["is_the_same"] = [x.endswith(y) for x, y in df[["id","parent_id"]].values]
#alternative
#df["is_the_same"] = df.apply(lambda x: x["id"].endswith(x["parent_id"]), axis=1)
print (df)
      id parent_id  is_the_same
0  01004      1004         True
1   1004       203        False
2  54620        20         True

If difference only leading zeros and numbers compare converted values to integers:

df["is_the_same"] = df["id"].astype(int) == df["parent_id"].astype(int)
print (df)
      id parent_id  is_the_same
0  01004      1004         True
1   1004       203        False
2  54620        20        False

Upvotes: 2

Related Questions