Reputation: 117
I have the following table which has IDs and for each ID i have Basket 1, Basket 2 and finally the products in the baskets.
I want to find on the sorted table (sort on ID from A-Z, Basket 1 ASC and Basket 2 ASC) whether each product for every ID appears previously (for the same ID) and put Y or N in a new column (column E).
Example is below for the first ID:
111AAA222 = [Product1, Product2, Product3, Product1, Product2, Product3, Product1, Product4]
111AAA222 = [N, N, N, Y, Y, Y, Y, N]
The example shows that the Products from the 3rd till the 6th position in the list appeared previously in the same list.
Im having difficulties to create the new column (Column E: Existed) with the values Y and N using python pandas.
Upvotes: 1
Views: 259
Reputation: 29635
Create a df_new
where data is sorted and you drop duplicates over the two columns 'ID' and 'Product'. Then create the column Existed filled with N as the row in df_new are not the duplicates
df_new = df.sort_values(['ID','Basket 1','Basket 2'],0)[['ID','Product']].drop_duplicates()
df_new['Existed'] = 'N'
Now join
this column df_new['Existed']
with your df and fill missing value with 'Y'
df = df.join(df_new['Existed']).fillna('Y')
If you want lists for product and existed, print this
df.groupby('ID')['Product','Existed'].agg(lambda x: list(x))
Upvotes: 2