Reputation: 8574
I have a DataFrame df:
year name_list
2009 [sam,maj,mak]
2010 [sam, mak, ali, mo, za]
2011 [mp,ki]
I would like to compare each row's name_list with the previous row's and count how many names were added and removed each year. Expected result:
year name_list added_count removed_count
2009 [sam,maj,mak] 0 0
2010 [sam, mak, ali, mo, za] 3 1
2011 [mp,ki] 2 5
Can anybody help?
Upvotes: 0
Views: 172
Reputation: 361
The first two lines initialize the 2009 counts to zero. This assumes the years are the index (not a separate column), are in chronological order, and are consecutive integers, since each row is compared with the row at i-1. It also assumes there are no duplicate names within a 'name_list' entry.
df.loc[2009, 'added_count'] = 0
df.loc[2009, 'removed_count'] = 0
for i in df.index[1:]:
    # names present this year but not in the previous year
    df.loc[i, 'added_count'] = len(set(df.loc[i, 'name_list']) - set(df.loc[i-1, 'name_list']))
    # names present in the previous year but missing this year
    df.loc[i, 'removed_count'] = len(set(df.loc[i-1, 'name_list']) - set(df.loc[i, 'name_list']))
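If the years might not be consecutive, or might live in a regular column rather than the index, one possible variation is to pair each row with the row before it by position instead of by label. The sketch below is only an illustration under some assumptions: the sample frame is rebuilt from the question, each name_list entry is taken to be a real Python list of strings, and the helper variables sets and previous are names made up for this example.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be real Python lists).
df = pd.DataFrame(
    {
        "name_list": [
            ["sam", "maj", "mak"],
            ["sam", "mak", "ali", "mo", "za"],
            ["mp", "ki"],
        ]
    },
    index=pd.Index([2009, 2010, 2011], name="year"),
)

# Convert every row to a set so set difference can be used.
sets = [set(names) for names in df["name_list"]]

# Pair each row with the row before it; the first row is paired with
# itself, so both of its counts come out as zero.
previous = [sets[0]] + sets[:-1]

df["added_count"] = [len(cur - prev) for cur, prev in zip(sets, previous)]
df["removed_count"] = [len(prev - cur) for cur, prev in zip(sets, previous)]

print(df)
```

Because the pairing is positional, this only requires the rows to be sorted chronologically; gaps between years do not matter. The counts are also stored as integers rather than floats.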
Upvotes: 1