UserYmY

Reputation: 8574

Compare consecutive rows of a pandas DataFrame and count added/removed list items?

I have a DataFrame:

year name_list
2009  [sam,maj,mak]
2010 [sam, mak, ali, mo, za]
2011 [mp,ki]

I would like to compare each row's name_list with the previous row's and count how many names were added and how many were removed each year. Expected result:

 year   name_list          added_count   removed_count
 2009  [sam,maj,mak]                0         0
 2010  [sam, mak, ali, mo, za]      3         1
 2011  [mp,ki]                      2         5 

Can anybody help?

Upvotes: 0

Views: 172

Answers (1)

tpoh

Reputation: 361

The first two lines initialize the 2009 counts to zero, since there is no earlier year to compare against. This assumes that the years are the index (not a separate column) and are in chronological order, and that each list in the 'name_list' column contains no duplicate names.

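# No earlier year to compare 2009 against, so its counts start at zero.
df.loc[2009,'added_count'] = 0
df.loc[2009,'removed_count'] = 0
# Pair each year with the one before it in the index (works even if years are skipped).
for prev, curr in zip(df.index[:-1], df.index[1:]):
    old, new = set(df.loc[prev,'name_list']), set(df.loc[curr,'name_list'])
    df.loc[curr,'added_count'] = len(new - old)
    df.loc[curr,'removed_count'] = len(old - new)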
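
If the years are in a regular column rather than the index, something like the following self-contained sketch may be easier to adapt. It rebuilds the sample data from the question and uses a helper named count_changes, which is a made-up name for illustration and not part of pandas. It also assumes each name_list is a Python list of strings.

```python
import pandas as pd

# Sample data from the question; here 'year' is a regular column (an assumption).
df = pd.DataFrame({
    "year": [2009, 2010, 2011],
    "name_list": [
        ["sam", "maj", "mak"],
        ["sam", "mak", "ali", "mo", "za"],
        ["mp", "ki"],
    ],
})


def count_changes(name_lists):
    """Return (added, removed) count lists, comparing each list with the one before it.

    Hypothetical helper for illustration only.
    """
    added, removed = [], []
    previous = None
    for names in name_lists:
        current = set(names)
        if previous is None:
            # The first row has nothing to compare against.
            added.append(0)
            removed.append(0)
        else:
            added.append(len(current - previous))   # names only in the current row
            removed.append(len(previous - current))  # names only in the previous row
        previous = current
    return added, removed


# Sort so that "previous row" really means "previous year".
df = df.sort_values("year").reset_index(drop=True)
df["added_count"], df["removed_count"] = count_changes(df["name_list"])
print(df)
```

Because it walks the rows in order instead of looking up i-1, this version does not depend on the years being consecutive, and the count columns stay integers.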

Upvotes: 1
