Daven1

Reputation: 135

Flag differences in a pandas DataFrame

I have a pandas DataFrame and want to create a column that flags inconsistencies between two columns.

That is, each value in column A should always map to the same value in column B, and vice versa. If it doesn't, the row should be flagged as 1.

column A   column B   New Column
Atlanta    GA         0
Atlanta    GA         0
Newyork    NY         1
Newyork    YN         1
company1   Com        1
company    Com        1
company    Com        1

Upvotes: 0

Views: 113

Answers (2)

jumzz

Reputation: 16

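If what you care about is whether each value in column B is made up of characters from column A, you can filter each value in B down to the characters that also appear in the corresponding value in A (ignoring case) and check whether anything was dropped.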

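def sim_lower(A, B):
    # Keep the characters of B (lowercased) that also occur in A (lowercased)
    return ''.join([ch for ch in B.lower() if ch in A.lower()])

# True when every character of B appears somewhere in A, ignoring case
df['Flag'] = [sim_lower(A, B) == B.lower() for A, B in zip(df['column A'], df['column B'])]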

Which returns

   column A column B  New Column   Flag
0   Atlanta       GA           0  False
1   Atlanta       GA           0  False
2   Newyork       NY           1   True
3   Newyork       YN           1   True
4  company1      Com           1   True
5   company      Com           1   True
6   company      Com           1   True

Upvotes: 0

Odhian

Reputation: 375

Since the question has been updated, here is one way of doing it. I use this data:

import numpy as np
import pandas as pd

df = pd.DataFrame({"column A": ["Atlanta", "Atlanta", "New York", "New York"], "column B": ["AT", "AT", "YN", "NY"]})
df
    column A    column B
0   Atlanta     AT
1   Atlanta     AT
2   New York    YN
3   New York    NY

With DataFrame.groupby:

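# Count the distinct "column B" values for each "column A" value
df_gb = df.groupby("column A", as_index=False).nunique()

# 0 when a group has exactly one distinct B value, 1 otherwise
condition = [df_gb["column B"] == 1]
value = [0]
df_gb["difference"] = np.select(condition, value, default=1)
df_gb = df_gb[["column A", "difference"]]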

Output[0]:

df_gb

    column A    difference
0   Atlanta     0
1   New York    1

Then finally:

df = df.merge(df_gb, on="column A", how="left")

Output[1]:

df

    column A    column B    difference
0   Atlanta     AT          0
1   Atlanta     AT          0
2   New York    YN          1
3   New York    NY          1
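
The steps above only check one direction (several B values for the same A). As a rough sketch of how the "vice versa" part of the question might be covered as well, the same nunique idea can be applied in both directions with groupby(...).transform, which broadcasts the group count back to every row so no merge is needed. The sample data here is copied from the question; the intermediate names b_per_a and a_per_b are only illustrative, and the assumption is that a row should be flagged when either direction is inconsistent.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "column A": ["Atlanta", "Atlanta", "Newyork", "Newyork", "company1", "company", "company"],
    "column B": ["GA", "GA", "NY", "YN", "Com", "Com", "Com"],
})

# Number of distinct B values seen for each A, repeated on every row of that group
b_per_a = df.groupby("column A")["column B"].transform("nunique")
# Number of distinct A values seen for each B, repeated on every row of that group
a_per_b = df.groupby("column B")["column A"].transform("nunique")

# Flag a row when its A maps to several Bs, or its B maps to several As
df["New Column"] = ((b_per_a > 1) | (a_per_b > 1)).astype(int)
print(df)
```

With the question's data this flags the Newyork rows (two different B values) and all three Com rows (two different A values), and leaves the Atlanta rows at 0.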

Upvotes: 1
