Reputation: 695
I'm trying to filter list1 based on another list, list2, with the following code:
import csv
with open('screen.csv') as f:  # a file listing all the article titles
    reader = csv.reader(f)
    list1 = list(reader)
print(list1)
list2 = ["Knowledge Management", "modeling language"]  # keywords an article title should contain (at least one of them)
list2 = [str(x) for x in list2]
occur = [i for i in list1 for j in list2 if str(j) in i]
print(occur)
But the output is an empty list.
Upvotes: 1
Views: 947
Reputation: 7529
list1 (called list_1 below) is actually a list of lists, not a list of strings, so you need to flatten it (e.g. with a nested list comprehension, as below) before trying to compare elements:
list_1 = [['foo bar'], ['baz beep bop']]
list_2 = ['foo', 'bub']

flattened_list_1 = [
    element
    for sublist in list_1
    for element in sublist
]

occurrences = [
    phrase
    for phrase in flattened_list_1
    if any(word in phrase for word in list_2)
]
print(occurrences)
# output:
# ['foo bar']
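The flattening step can also be written with the standard library's itertools.chain.from_iterable, which walks each sublist in turn. This is a minimal sketch reusing the same toy list_1 and list_2 as above; it should give the same flat list and the same result as the nested comprehension.

```python
from itertools import chain

list_1 = [['foo bar'], ['baz beep bop']]
list_2 = ['foo', 'bub']

# chain.from_iterable yields the elements of each sublist one after another,
# producing the same flat sequence as the nested list comprehension
flattened_list_1 = list(chain.from_iterable(list_1))

# keep a phrase if at least one word from list_2 occurs in it as a substring
occurrences = [
    phrase
    for phrase in flattened_list_1
    if any(word in phrase for word in list_2)
]
print(occurrences)  # ['foo bar']
```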
Upvotes: 1
Reputation: 104802
Your list1 is a list of lists, because the csv.reader you're using to create it always returns a list for each row, even if the row contains only a single item. (If you're expecting a single title on each row, I'm not sure why you're using csv here; it's only going to be a hindrance.)
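To see this behaviour directly, here is a small sketch in which io.StringIO stands in for screen.csv. The two titles are made up for illustration (and deliberately contain no commas, since csv.reader would split a title at each comma into separate fields).

```python
import csv
import io

# io.StringIO stands in for the real screen.csv; these titles are hypothetical
sample = io.StringIO(
    "Knowledge Management in practice\n"
    "A survey of modeling language tools\n"
)

rows = list(csv.reader(sample))
print(rows)
# [['Knowledge Management in practice'], ['A survey of modeling language tools']]
```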
Later, when you check if str(j) in i as part of your filtering list comprehension, you're testing whether the string j is an element of the list i. Since the values in list2 are not full titles but key phrases, you aren't going to find any matches. If you were checking inside the inner strings, you'd get substring checks, but list membership requires an exact match.
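The difference between the two kinds of `in` check can be shown in a few lines. The title below is a hypothetical example, and row mimics what csv.reader produces for a one-column row.

```python
title = "A survey of modeling language tools"
row = [title]  # what csv.reader returns for a one-column row

# `in` on a list looks for an element equal to the left operand
print("modeling language" in row)     # False: no element equals the key phrase
# `in` on a str looks for a substring
print("modeling language" in title)   # True
# checking the string inside the row restores the substring test
print("modeling language" in row[0])  # True
```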
Probably the best way to fix the problem is to do away with the nested lists in list1. Try creating it with:
with open('screen.csv') as f:
    list1 = [line.strip() for line in f]
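Putting the pieces together, here is a minimal end-to-end sketch. It assumes one title per line in the file; filter_titles is a hypothetical helper name, and io.StringIO stands in for open('screen.csv') so the example runs on its own (the titles are made up). The check is case-sensitive; lowercasing both the title and the keyword would be one way to relax that.

```python
import io


def filter_titles(lines, keywords):
    """Return the stripped, non-empty lines that contain at least one keyword."""
    titles = [line.strip() for line in lines]
    return [
        title
        for title in titles
        # skip blank lines; any() stops at the first keyword that matches
        if title and any(keyword in title for keyword in keywords)
    ]


# io.StringIO stands in for open('screen.csv'); these titles are hypothetical
sample = io.StringIO(
    "Knowledge Management in practice\n"
    "Deep learning for images\n"
    "A survey of modeling language tools\n"
)
list2 = ["Knowledge Management", "modeling language"]

matches = filter_titles(sample, list2)
print(matches)
# ['Knowledge Management in practice', 'A survey of modeling language tools']
```

With the real file this would be called as `with open('screen.csv') as f: matches = filter_titles(f, list2)`. Using any() also means a title matching several keywords is kept only once, whereas the nested `for i in list1 for j in list2` comprehension in the question would add it once per matching keyword.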
Upvotes: 0
Reputation: 2137
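import pandas as pd
import numpy as np

# `data`, `another_list` and `column_of_list` are placeholders for your own data
df = pd.DataFrame(data)
# keep the rows whose value(s) all appear in another_list (exact matches)
print(df[df.column_of_list.map(lambda x: np.isin(x, another_list).all())])
# OR, if the column has the default integer name 0:
print(df[df[0].map(lambda x: np.isin(x, another_list).all())])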
Try with real data:
import numpy as np
import pandas as pd
data = ["Knowledge Management", "modeling language"]
another_list=["modeling language","natural language"]
df = pd.DataFrame(data)
# keep only the rows whose value is in another_list (exact match, not substring)
a = df[df[0].map(lambda x: np.isin(x, another_list).all())]
print(a)
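Note that np.isin compares whole values, so a title is kept only if it equals one of the entries in another_list. For the keyword-in-title matching asked about in the question, one option in pandas is Series.str.contains with a regex alternation of the keywords. This is a sketch of that swapped-in technique, not the approach above; the titles are hypothetical and the keywords mirror list2.

```python
import re

import pandas as pd

# hypothetical titles; the keywords mirror list2 from the question
titles = pd.Series([
    "Knowledge Management in practice",
    "Deep learning for images",
    "A survey of modeling language tools",
])
keywords = ["Knowledge Management", "modeling language"]

# join the keywords into one regex alternation; re.escape keeps any
# regex metacharacters in a keyword (such as '.') literal
pattern = "|".join(re.escape(k) for k in keywords)

# str.contains returns a boolean mask that is True where any keyword
# occurs as a substring of the title
matches = titles[titles.str.contains(pattern, regex=True)]
print(matches.tolist())
# ['Knowledge Management in practice', 'A survey of modeling language tools']
```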
Upvotes: 1