Removing observations that have the same column values for a non-unique id

Question

I have a dataframe that holds "tag information" on different companies for both iPad and Tablet platforms. Each "experiment" has an id, which can occur multiple times depending on how many tags the experiment has. Experiments can be on iPad or Tablet (type), but I want to remove all of the duplicate experiments (the same experiment appearing on both iPad and Tablet). An experiment is a duplicate if it's from the same company and has exactly the same tags. For example, in the following dataframe Netflix is a duplicate because it has the same tags (Includes dropdown, Includes product list) for both iPad and Tablet, so either the Tablet version or the iPad version should be removed.

Input:

id  company   type       tag
1   Netflix   iPad       Includes dropdown
1   Netflix   iPad       Includes product list
2   Netflix   Tablet     Includes dropdown
2   Netflix   Tablet     Includes product list
3   Apple     iPad       Includes images
4   Apple     Tablet     Includes images

Output:

id  company   type       tag
2   Netflix   Tablet     Includes dropdown
2   Netflix   Tablet     Includes product list
3   Apple     iPad       Includes images
4   Apple     Tablet     Includes images

I'm looking for a solution in pandas (Python). How can I do this?

I've tried this:

df.drop_duplicates(subset=['tag'], keep='last')

But I don't think this solution works, because there could be another experiment from a different company that contains the same tags. It would then be deleted even though it's not considered a duplicate.
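A minimal sketch of that concern, using the sample data plus one hypothetical extra row (company "Hulu", id 5) that is not in the original dataframe and is added only to show a different company sharing a tag:

```python
import pandas as pd

# Sample data from the question, plus one HYPOTHETICAL row (Hulu, id 5)
# that shares the "Includes images" tag with Apple.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 4, 5],
    "company": ["Netflix", "Netflix", "Netflix", "Netflix",
                "Apple", "Apple", "Hulu"],
    "type": ["iPad", "iPad", "Tablet", "Tablet", "iPad", "Tablet", "iPad"],
    "tag": ["Includes dropdown", "Includes product list",
            "Includes dropdown", "Includes product list",
            "Includes images", "Includes images", "Includes images"],
})

# Deduplicating on the tag alone ignores the company, so only the last
# row for each tag survives, whichever company it belongs to.
naive = df.drop_duplicates(subset=["tag"], keep="last")
print(naive)
```

With this data, every Apple row is dropped because the hypothetical Hulu row carries the same tag further down, which is exactly the cross-company deletion I want to avoid.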

Basically, I want to drop ids that have the same set of tags as another id from the same company.

Answers (1)
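
One possible approach, offered as a sketch rather than a definitive answer: collapse each id into a single row holding its company and its full set of tags, drop duplicates on that (company, tags) pair, and then keep only the rows whose id survived. The names `tag_sets`, `keep_ids` and `result` are illustrative, and the sketch assumes that tag order and repeated tags within an id do not matter.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 4],
    "company": ["Netflix", "Netflix", "Netflix", "Netflix", "Apple", "Apple"],
    "type": ["iPad", "iPad", "Tablet", "Tablet", "iPad", "Tablet"],
    "tag": ["Includes dropdown", "Includes product list",
            "Includes dropdown", "Includes product list",
            "Includes images", "Includes images"],
})

# One row per experiment: its company and a sorted tuple of its distinct tags.
# A tuple is hashable, so drop_duplicates can compare it like any other value.
tag_sets = (
    df.groupby(["id", "company"])["tag"]
      .agg(lambda s: tuple(sorted(set(s))))
      .reset_index(name="tags")
)

# Among experiments with the same company and the same tag set, keep the last
# one (groupby sorts by id, so "last" means the highest id).
keep_ids = tag_sets.drop_duplicates(subset=["company", "tags"], keep="last")["id"]

# Filter the original rows down to the surviving experiments.
result = df[df["id"].isin(keep_ids)]
print(result)
```

Because the comparison includes the company, an experiment from a different company with identical tags would not be touched. Note that under the rule as stated in the question (same company, exactly the same tags), Apple ids 3 and 4 also count as duplicates of each other, so this sketch keeps only id 4 for Apple, whereas the Output shown in the question retains both. If both Apple rows really should stay, the duplicate rule needs an extra condition that the question does not spell out.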
