Search for value in dataframe that contains a list

Question

I have a dataframe that looks like this:

id  points
a   [c,v,b,n]
b   []
c   [x,a]
....

and a dictionary (I also have it as a dataframe):

{'a': ['j','c'],
 'b': ['p','r','q'],
 'c': ['n','k','l','x','a'],
 ....}

For each id, I want to check which of the dataframe's points also appear in the dictionary's list for that key, and then remove the items from the dataframe's points that have no match in the dictionary. Expected output:

id  points
a   [c]
b   []
c   [x,a]

I tried this

for key,point in my_dict.items():
    if df['points'].str.contains(point).any()

but I get TypeError: unhashable type: 'list'

I tried converting the dataframe to a dictionary, but then the search takes too long because I need more for loops. Any suggestions for code or data structure improvements?

Edit

Another representation of the data:

id  points
a   [c,v,b,n]
b   []
c   [x,a]
....

and

points
j,c
p,r,q
n,k,l,x,a

EdChum · Accepted Answer

You can call apply, convert your dict values into a set, and convert the intersection back to a list:

In [15]:
d={'a': ['j','c'],
 'b': ['p','r','q'],
 'c': ['n','k','l','x','a']}
d

Out[15]:
{'a': ['j', 'c'], 'b': ['p', 'r', 'q'], 'c': ['n', 'k', 'l', 'x', 'a']}

In [17]:
df['points'] = df.apply(lambda row: list(set(d[row['id']]).intersection(row['points'])), axis=1)
df

Out[17]:
  id  points
0  a     [c]
1  b      []
2  c  [a, x]
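
Note that a set does not preserve order, so the points for c may come back as [a, x] rather than [x, a]. If the original order matters, here is a minimal sketch that filters with a list comprehension instead. The sample data is rebuilt from the question so the snippet runs on its own, and the name `lookup` is just a helper introduced for illustration:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed shape).
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "points": [["c", "v", "b", "n"], [], ["x", "a"]],
})
d = {"a": ["j", "c"], "b": ["p", "r", "q"], "c": ["n", "k", "l", "x", "a"]}

# Build each lookup set once so membership tests are cheap.
lookup = {key: set(values) for key, values in d.items()}

# Keep only the points that appear in the dict entry for the row's id,
# preserving their original order; ids missing from the dict yield [].
df["points"] = [
    [p for p in points if p in lookup.get(key, ())]
    for key, points in zip(df["id"], df["points"])
]
print(df)
```

This also avoids the row-wise apply, which may help on larger frames, though that is worth measuring on your own data.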

As for why you get the error: the .str methods are meant for a Series of strings, but the points column holds lists, and the point you pass to str.contains is itself a list rather than a string pattern.
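
To see what the column actually holds, a small sketch like the following may help. The Series `s` is made-up sample data modelled on the question; it shows that each cell is a Python list and that a per-element membership test can be done with apply instead of .str:

```python
import pandas as pd

# Hypothetical sample column shaped like df['points'] in the question.
s = pd.Series([["c", "v", "b", "n"], [], ["x", "a"]])

# The column has the generic object dtype; every cell is a Python list.
print(s.dtype)
cell_types = s.map(type).unique().tolist()
print(cell_types)

# Test membership per cell with plain Python instead of the .str accessor.
mask = s.apply(lambda points: "c" in points)
print(mask.tolist())
```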
