Greencolor

Reputation: 695

Filter list based on another list in Python

I'm trying to filter list1 based on another list, list2, with the following code:

import csv

with open('screen.csv') as f: #A file with a list of all the article titles
    reader = csv.reader(f)
    list1 = list(reader)

print(list1)

list2 = ["Knowledge Management", "modeling language"] #key words that article title should have (at least one of them)
list2 = [str(x) for x in list2]

occur = [i for i in list1  for j in list2 if str(j) in i]

print(occur)

but the output is empty.

My list1 looks like this: [image not available]

Upvotes: 1

Views: 947

Answers (3)

jfaccioni

Reputation: 7529

list_1 is actually a list of lists, not a list of strings, so you need to flatten it (for example with a nested comprehension, as in the code below) before comparing its elements:

list_1 = [['foo bar'], ['baz beep bop']]
list_2 = ['foo', 'bub']

flattened_list_1 = [
    element 
    for sublist in list_1 
    for element in sublist
]
occurrences = [
    phrase 
    for phrase in flattened_list_1 if any(
        word in phrase 
        for word in list_2
    )
]
print(occurrences)

# output:
# ['foo bar']
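
Applied to the question's setup, the same idea might look like the sketch below. It assumes one title per row and uses an in-memory `io.StringIO` with made-up titles as a stand-in for `screen.csv`; `itertools.chain.from_iterable` is just another way to flatten one level.

```python
import csv
import io
from itertools import chain

# Hypothetical stand-in for screen.csv: one article title per row.
sample_csv = io.StringIO(
    "Knowledge Management in small firms\n"
    "A survey of deep learning\n"
    "A modeling language for workflows\n"
)

# csv.reader yields one list of fields per row, even for a single column.
rows = list(csv.reader(sample_csv))

# Flatten the list of rows into a flat list of title strings.
titles = list(chain.from_iterable(rows))

keywords = ["Knowledge Management", "modeling language"]

# Keep every title that contains at least one keyword as a substring.
matches = [t for t in titles if any(k in t for k in keywords)]
print(matches)
# ['Knowledge Management in small firms', 'A modeling language for workflows']
```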

Upvotes: 1

Blckknght

Reputation: 104802

Your list1 is a list of lists, because the csv.reader that you're using to create it always returns a list for each row, even if there's only a single item. (If you're expecting a single name from each row, I'm not sure why you're using csv here; it's only going to be a hindrance.)

Later when you check if str(j) in i as part of your filtering list comprehension, you're testing if the string j is present in the list i. Since the values in list2 are not full titles but key-phrases, you aren't going to find any matches. If you were checking in the inner strings, you'd get substring checks, but when you test list membership it must be an exact match.
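
A small illustration of the difference, using a made-up title in the shape csv.reader would return it:

```python
# One row as csv.reader would return it: a list holding a single string.
row = ["Knowledge Management in small firms"]  # made-up example title
keyword = "Knowledge Management"

# List membership: True only if some element equals the keyword exactly.
in_list = keyword in row

# Substring test: True if the keyword occurs anywhere inside the title.
in_string = keyword in row[0]

print(in_list)    # False
print(in_string)  # True
```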

Probably the best way to fix the problem is to do away with the nested lists in list1. Try creating it with:

with open('screen.csv') as f:
    list1 = [line.strip() for line in f]
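
Putting it together, a minimal sketch of the whole filter could look like this. The helper name `filter_titles` and the sample lines are hypothetical, and the case-insensitive matching is an assumption about what you want (your keywords mix capitalization styles), not something the question states.

```python
def filter_titles(lines, keywords, ignore_case=True):
    """Return the stripped lines that contain at least one keyword.

    `lines` can be any iterable of strings, such as an open text file.
    """
    def norm(s):
        # casefold() is a more aggressive lower() meant for comparisons
        return s.casefold() if ignore_case else s

    wanted = [norm(k) for k in keywords]
    result = []
    for line in lines:
        title = line.strip()
        # Skip blank lines; keep titles containing any keyword as a substring.
        if title and any(k in norm(title) for k in wanted):
            result.append(title)
    return result


# Made-up sample lines standing in for the contents of screen.csv.
sample_lines = [
    "Knowledge management in practice\n",
    "\n",
    "Graph neural networks\n",
    "A Modeling Language for robots\n",
]
list2 = ["Knowledge Management", "modeling language"]

print(filter_titles(sample_lines, list2))
# ['Knowledge management in practice', 'A Modeling Language for robots']
```

With a real file you would pass the file object itself, e.g. `filter_titles(f, list2)` inside the `with open('screen.csv') as f:` block. Pass `ignore_case=False` if the match should be case-sensitive.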

Upvotes: 0

Mahsa Hassankashi

Reputation: 2137

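import pandas as pd
import numpy as np

# `data`, `another_list`, and `column_of_list` are placeholders for your own data.
df = pd.DataFrame(data)
print(df[df.column_of_list.map(lambda x: np.isin(x, another_list).all())])
# Or, if the column has the default integer label 0:
print(df[df[0].map(lambda x: np.isin(x, another_list).all())])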

Try with real data:

import numpy as np
import pandas as pd 
data = ["Knowledge Management", "modeling language"]
another_list=["modeling language","natural language"]
df = pd.DataFrame(data) 
# np.isin tests exact equality, so this keeps only rows equal to an entry in another_list
a = df[df[0].map(lambda x: np.isin(x, another_list).all())]

print(a)

Upvotes: 1
