pandas generates a new column based on values from another column considering duplicates

Question

I am working on a dataframe which has a column that each value is a list, now I want to derive a new column which only considers list whose size is greater than 1, assigns a unique integer to the corresponding row as id. If elements in two lists are the same but with a different order, the two lists should be assigned the same id. A sample dataframe is like,

document_no_list    cluster_id
[1,2,3]             1
[3,2,1]             1
[4,5,6,7]           2
[8]                 0
[9,10]              3
[10,9]              3

column cluster_id only considers the 1st, 2nd, 3rd, 5th and 6th row, each of which has a size greater than 1, and assigns a unique integer id to its corresponding cell in the column, also [1,2,3], [3,2,1] and [9,10], [10,9] should be assigned the same cluster_id.

I was asking a similar question without considering duplicates list values, at

pandas how to derived values for a new column base on another column

I am wondering how to do that in pandas.

pandas generates a new column based on values from another column considering duplicates

Answers (1)

Related Questions