matthew.collins

Reputation: 41

compare list of dictionaries to dataframe, show missing values

I have a list of dictionaries:

    example_list = [{'email':'[email protected]'},{'email':'[email protected]'}]

and a dataframe with an 'Email' column

I need to compare the list against the dataframe and return the values that are not in the dataframe.

I could iterate over the list and check each value against the dataframe, but I was looking for a more pythonic way, perhaps a list comprehension or a map function on the dataframe.

Upvotes: 3

Views: 157

Answers (3)

matthew.collins

Reputation: 41

I ended up converting the list into a dataframe, merging the two dataframes on the email column, and then building a dataframe of the missing values.

For example:

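    import pandas as pd
    example_list = [{'email':'[email protected]'},{'email':'[email protected]'}]
    # Build a dataframe from the list; rename 'email' so it matches df_one's 'Email' column
    df_two = pd.DataFrame(example_list).rename(columns={'email': 'Email'})
    # An inner merge keeps only the emails present in both dataframes
    common = df_one.merge(df_two, on=['Email'])
    # Rows of df_two (the list) whose email does not appear in df_one
    df_diff = df_two[~df_two.Email.isin(common.Email)]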
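The same merge idea can be done in a single step with a left merge and `indicator=True`, which adds a `_merge` column marking rows that exist only in the left frame. A minimal sketch, assuming `df_one` has an `Email` column as in the question; the sample addresses below are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: df_one stands in for the question's dataframe
df_one = pd.DataFrame({'Email': ['alice@example.com', 'bob@example.com']})
example_list = [{'email': 'alice@example.com'}, {'email': 'carol@example.com'}]

# Dataframe built from the list, with the column renamed to match df_one
df_two = pd.DataFrame(example_list).rename(columns={'email': 'Email'})

# Left merge keeps every row of df_two; indicator=True adds a '_merge'
# column that is 'left_only' where the email has no match in df_one.
# drop_duplicates() avoids multiplying rows if df_one repeats an email.
merged = df_two.merge(df_one[['Email']].drop_duplicates(), on='Email',
                      how='left', indicator=True)
df_diff = merged.loc[merged['_merge'] == 'left_only', ['Email']]
```

Here `df_diff` holds only the list entries missing from `df_one`, so no separate `common` frame is needed.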

Upvotes: 0

jpp

Reputation: 164623

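One way is to subtract one set from the other. For a functional solution, you can use operator.itemgetter to pull the emails out of the dictionaries:

    from operator import itemgetter

    # Emails in example_list that do not appear in the 'Email' column
    res = set(map(itemgetter('email'), example_list)) - set(df['Email'])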

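Note: the - operator is syntactic sugar for set.difference, except that - requires a set on both sides, while set.difference accepts any iterable.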
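A small sketch of that equivalence, assuming a dataframe `df` with an `Email` column; the sample addresses are hypothetical. Because `set.difference` accepts any iterable, the pandas Series can be passed directly without wrapping it in `set()`:

```python
from operator import itemgetter

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Email': ['alice@example.com', 'bob@example.com']})
example_list = [{'email': 'alice@example.com'}, {'email': 'carol@example.com'}]

list_emails = set(map(itemgetter('email'), example_list))

# The - operator needs a set on both sides...
via_operator = list_emails - set(df['Email'])
# ...while set.difference accepts any iterable, such as a pandas Series
via_method = list_emails.difference(df['Email'])
```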

Upvotes: 1

cs95

Reputation: 402353

To return the values that are not in df['Email'], here are a couple of options based on set difference:

np.setdiff1d

    import numpy as np

    emails = [d['email'] for d in example_list]
    diff = np.setdiff1d(emails, df['Email'])   # returns a sorted NumPy array of unique values
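To see what np.setdiff1d actually hands back, here is a short sketch with hypothetical sample data that includes a duplicate and is deliberately out of order. The result is a NumPy array that is sorted and deduplicated, so the original list order is not preserved; call `.tolist()` if a plain list is needed:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'dave' appears twice and comes first
df = pd.DataFrame({'Email': ['alice@example.com', 'bob@example.com']})
example_list = [{'email': 'dave@example.com'},
                {'email': 'alice@example.com'},
                {'email': 'carol@example.com'},
                {'email': 'dave@example.com'}]

emails = [d['email'] for d in example_list]
# setdiff1d deduplicates and sorts the values that are not in df['Email']
diff = np.setdiff1d(emails, df['Email'])
# Convert to a plain Python list if that is what the caller expects
diff_list = diff.tolist()
```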

set.difference

    # returns a set
    diff = set(d['email'] for d in example_list).difference(df['Email'])
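Both options above lose the order of example_list (a set is unordered, setdiff1d sorts). If order or duplicates matter, the list comprehension mentioned in the question works well once the dataframe column is turned into a set for fast lookups. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Email': ['alice@example.com', 'bob@example.com']})
example_list = [{'email': 'dave@example.com'},
                {'email': 'alice@example.com'},
                {'email': 'carol@example.com'}]

# Build the lookup set once so each membership test is O(1) on average
known = set(df['Email'])
# Keeps the original order of example_list (and any duplicates)
missing = [d['email'] for d in example_list if d['email'] not in known]
```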

Upvotes: 1
