Check whether an item appears previously in a list of items using Python Pandas

Question

I have the following table which has IDs and for each ID i have Basket 1, Basket 2 and finally the products in the baskets.

I want to find on the sorted table (sort on ID from A-Z, Basket 1 ASC and Basket 2 ASC) whether each product for every ID appears previously (for the same ID) and put Y or N in a new column (column E).

Example is below for the first ID:

111AAA222 = [Product1, Product2, Product3, Product1, Product2, Product3, Product1, Product4]

111AAA222 = [N, N, N, Y, Y, Y, Y, N]

The example shows that the Products from the 3rd till the 6th position in the list appeared previously in the same list.

Im having difficulties to create the new column (Column E: Existed) with the values Y and N using python pandas.

Ben.T · Accepted Answer

Create a df_new where data is sorted and you drop duplicates over the two columns 'ID' and 'Product'. Then create the column Existed filled with N as the row in df_new are not the duplicates

df_new = df.sort_values(['ID','Basket 1','Basket 2'],0)[['ID','Product']].drop_duplicates()
df_new['Existed'] = 'N'

Now join this column df_new['Existed'] with your df and fill missing value with 'Y'

df = df.join(df_new['Existed']).fillna('Y')

If you want lists for product and existed, print this

df.groupby('ID')['Product','Existed'].agg(lambda x: list(x))

Check whether an item appears previously in a list of items using Python Pandas

Answers (1)

Related Questions