pandas: check if the same id has the same value in a dataset

Question

I have a dataset like this:

  customer_id customer_name order_id
1 0000A1       CompanyA     7e2e3978
2 0000A1       CompanyA     7e2e3de2
3 0000A1       CompanyA     7e2e3efa
4 0000B1       CompanyB     7e2e3fc2
5 0000B1       CompanyA     7e2e408a
6 0000B1       CompanyB     7e2e4148
7 0000C1       CompanC      7e2e4206
8 0000C1       CompanyC     7e2e42c4
9 0000C1       CompanyC     7e2e4512

The dataset is sorted by customer_id. Each id (customer_id) is supposed to correspond to exactly one value (customer_name), so all rows with the same id should have the same name. However, some rows contain incorrect data (rows 5 and 7 in this case), and I want to use pandas to find them.
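A quick first check, sketched here with the sample table typed in by hand (the real data would come from the Excel sheet), is to count the distinct names per id with groupby().nunique(). Any id with more than one name contains at least one bad row. This narrows the search to the affected ids but does not yet say which row inside each id is the wrong one.

```python
import pandas as pd

# Sample data copied from the table above; the index 1..9 matches the row numbers
df = pd.DataFrame(
    {
        "customer_id": ["0000A1"] * 3 + ["0000B1"] * 3 + ["0000C1"] * 3,
        "customer_name": [
            "CompanyA", "CompanyA", "CompanyA",
            "CompanyB", "CompanyA", "CompanyB",
            "CompanC", "CompanyC", "CompanyC",
        ],
        "order_id": [
            "7e2e3978", "7e2e3de2", "7e2e3efa",
            "7e2e3fc2", "7e2e408a", "7e2e4148",
            "7e2e4206", "7e2e42c4", "7e2e4512",
        ],
    },
    index=range(1, 10),
)

# Count distinct names per id; a consistent id has exactly one
names_per_id = df.groupby("customer_id")["customer_name"].nunique()
inconsistent_ids = names_per_id[names_per_id > 1].index

# Every row that belongs to an id with more than one name
suspect_rows = df[df["customer_id"].isin(inconsistent_ids)]
print(list(inconsistent_ids))  # ['0000B1', '0000C1']
```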

Right now I'm doing this with a loop and if/else statements:

import pandas

xlsx = pandas.ExcelFile('order-table.xlsx')
df = pandas.read_excel(xlsx, 'Sheet1')

previous_id = None
previous_value = None
for idx, row in df.iterrows():
    current_id = row.loc['customer_id']
    current_value = row.loc['customer_name']
    if current_id == previous_id:
        # Same id as the previous row: compare with the first name seen for this id
        if current_value == previous_value:
            df.loc[idx, "same"] = "true"
        else:
            df.loc[idx, "same"] = "false"
    else:
        # A new id starts here: its first row becomes the reference name
        previous_id = current_id
        previous_value = current_value
        df.loc[idx, "same"] = "true"

df.to_excel("order-table-marked.xlsx")
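The loop above compares every row with the first name seen for its id. A minimal vectorized sketch of that same rule, assuming the sheet loads into the same three columns, uses groupby().transform("first") to broadcast each id's reference name back onto all of its rows. Unlike the loop, it does not depend on the rows being sorted.

```python
import pandas as pd

# Sample data from the table above (index 1..9 matches the row numbers)
df = pd.DataFrame(
    {
        "customer_id": ["0000A1"] * 3 + ["0000B1"] * 3 + ["0000C1"] * 3,
        "customer_name": [
            "CompanyA", "CompanyA", "CompanyA",
            "CompanyB", "CompanyA", "CompanyB",
            "CompanC", "CompanyC", "CompanyC",
        ],
    },
    index=range(1, 10),
)

# First name seen for each id, repeated on every row of that id
first_name = df.groupby("customer_id")["customer_name"].transform("first")

# True where the row's name matches that reference name
df["same"] = df["customer_name"].eq(first_name)

print(df[~df["same"]])
```

On the sample data this flags rows 5, 8 and 9 rather than 7, because "CompanC" is the first name seen for 0000C1 and becomes the reference. That is a limitation of the "first row is correct" rule itself, not of the vectorized form.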

This generates a column that flags rows whose customer_name differs from the first name seen for the same customer_id, but I assume it's not the best approach. Is there a better way to do this in pandas? Would it be faster to use groupby() or drop_duplicates(), and how would I do it?
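One way to flag exactly rows 5 and 7 is to treat the most frequent name per id as the correct one. That is an assumption: it fails when an id has as many wrong rows as right ones, and because Series.mode() returns tied values in sorted order, iloc[0] breaks ties arbitrarily. The sketch below does that with groupby().transform(), and also shows a drop_duplicates() variant that lists the conflicting ids. Both stay inside pandas instead of calling iterrows() per row, which is usually much slower on large sheets.

```python
import pandas as pd

# Sample data from the table above (index 1..9 matches the row numbers)
df = pd.DataFrame(
    {
        "customer_id": ["0000A1"] * 3 + ["0000B1"] * 3 + ["0000C1"] * 3,
        "customer_name": [
            "CompanyA", "CompanyA", "CompanyA",
            "CompanyB", "CompanyA", "CompanyB",
            "CompanC", "CompanyC", "CompanyC",
        ],
    },
    index=range(1, 10),
)

# Assumption: the most common name for an id is the correct one.
# mode() can return several values on a tie; iloc[0] takes the first (sorted) one.
majority_name = df.groupby("customer_id")["customer_name"].transform(
    lambda names: names.mode().iloc[0]
)
df["same"] = df["customer_name"].eq(majority_name)
print(df[~df["same"]])  # rows 5 and 7

# drop_duplicates variant: keep one row per (id, name) pair; any id that
# still appears more than once has conflicting names
pairs = df.drop_duplicates(["customer_id", "customer_name"])
conflicting_ids = pairs.loc[pairs["customer_id"].duplicated(), "customer_id"].unique()
print(list(conflicting_ids))  # ['0000B1', '0000C1']
```

The marked frame can then be written back with df.to_excel(...) as in the original code.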
