Reputation: 135
I have a pandas DataFrame and want to create a column that flags inconsistencies between two columns,
i.e. column B should have the same value for each value in column A, and vice versa. If it does not, flag the row as 1.
column A | column B | New Column |
---|---|---|
Atlanta | GA | 0 |
Atlanta | GA | 0 |
Newyork | NY | 1 |
Newyork | YN | 1 |
company1 | Com | 1 |
company | Com | 1 |
company | Com | 1 |
Upvotes: 0
Views: 113
Reputation: 16
If you want to compare column B to column A character by character, you can check whether every character of B (ignoring case) also appears in A:
def sim_lower(A, B):
    # Keep only the characters of B (lower-cased) that also occur in A
    return ''.join([ch for ch in B.lower() if ch in A.lower()])

df['Flag'] = [sim_lower(A, B) == B.lower() for A, B in zip(df['column A'], df['column B'])]
This returns:
   column A  column B  New Column   Flag
0   Atlanta        GA           0  False
1   Atlanta        GA           0  False
2   Newyork        NY           1   True
3   Newyork        YN           1   True
4  company1       Com           1   True
5   company       Com           1   True
6   company       Com           1   True
Upvotes: 0
Reputation: 375
Since the question has been updated, here is one way of doing it. I use this data:
import numpy as np
import pandas as pd

df = pd.DataFrame({"column A": ["Atlanta", "Atlanta", "New York", "New York"], "column B": ["AT", "AT", "YN", "NY"]})
df
   column A  column B
0   Atlanta        AT
1   Atlanta        AT
2  New York        YN
3  New York        NY
With DataFrame.groupby:
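# Count the distinct "column B" values for each "column A" value
df_gb = df.groupby("column A", as_index=False).nunique()
# Exactly one distinct value means the group is consistent (0); anything else is flagged (1)
condition = [df_gb["column B"] == 1]
value = [0]
df_gb["difference"] = np.select(condition, value, default=1)
# Keep only the key and the flag so the merge adds a single column
df_gb = df_gb[["column A", "difference"]]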
Output[0] :
df_gb
   column A  difference
0   Atlanta           0
1  New York           1
Then, finally:
df = df.merge(df_gb, on="column A", how="left")
Output[1] :
df
   column A  column B  difference
0   Atlanta        AT           0
1   Atlanta        AT           0
2  New York        YN           1
3  New York        NY           1
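The steps above only check one direction (one "column B" value per "column A" value). If the "vice versa" part of the question matters as well, a minimal sketch could use groupby(...).transform("nunique") in both directions, which also avoids the merge. The sample data is copied from the question's table, and the column name "flag" is just a placeholder chosen for this example:

```python
import pandas as pd

# Sample data taken from the table in the question.
df = pd.DataFrame({
    "column A": ["Atlanta", "Atlanta", "Newyork", "Newyork",
                 "company1", "company", "company"],
    "column B": ["GA", "GA", "NY", "YN", "Com", "Com", "Com"],
})

# Distinct "column B" values per "column A" value, broadcast back to every row.
b_per_a = df.groupby("column A")["column B"].transform("nunique")
# Distinct "column A" values per "column B" value, broadcast back to every row.
a_per_b = df.groupby("column B")["column A"].transform("nunique")

# Flag a row when either direction maps to more than one distinct value.
# "flag" is a hypothetical column name used only in this sketch.
df["flag"] = ((b_per_a > 1) | (a_per_b > 1)).astype(int)
print(df)
```

Because transform returns a result aligned with the original index, the flag can be assigned directly. On the question's data this should flag the Newyork rows (two different B values) and the company1/company rows (one B value shared by two A values).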
Upvotes: 1