Jose Ramon

Reputation: 5344

Find co-occurrences of two lists in python

I have two lists. The first contains names, and the second contains names with corresponding values. The names in the first list are a subset of the names in the second list. Each value is either true or false. I want to find the names that occur in both lists and count the true values. My code:

def return_indices_of_a(a, b):
    # Return the indices of the items in a that also occur in b
    b_set = set(b)
    return [i for i, v in enumerate(a) if v in b_set]

with open("text_files/first_list.txt") as f:
    data1 = [line.strip() for line in f]

parseTable = []
with open("text_files/second_list.txt") as ins:
    for line in ins:
        row = line.rstrip().split(' ')  # <- note use of rstrip()
        parseTable.append(row)

new_data = []
indexes = []
for row in parseTable:
    new_data.append(row[0])
    indexes.append(row[1])

# The function must be defined before it is called
in1 = return_indices_of_a(new_data, data1)

I read both text files containing the lists and find the co-occurrences. Then I want to keep only the entries of parseTable[][1] at the in1 indices. Am I doing it right? How can I keep the indices I want? My two lists:

['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion',....
[['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false'], ....

Upvotes: 1

Views: 703

Answers (3)

Daniel

Reputation: 8441

Here's a one-liner to get the matches:

matches = [(name, dict(values)[name]) for name in set(names) if name in dict(values)]

and then to get the number of true matches:

len([name for (name, value) in matches if value == 'true'])

Edit

You might want to move dict(values) into a named variable:

value_map = dict(values)
matches = [(name, value_map[name]) for name in set(names) if name in value_map]
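
For reference, here is a minimal self-contained sketch of this approach using the truncated sample data from the question. It assumes names stands for the first list and values for the second; true_count is just an illustrative name.

```python
# Sample data copied from the question (the real lists are longer)
names = ['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion']
values = [['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false']]

# Build the name -> 'true'/'false' lookup once
value_map = dict(values)

# Keep only the names that appear in both lists, with their value
matches = [(name, value_map[name]) for name in set(names) if name in value_map]

# Count the matches whose value is the string 'true'
true_count = sum(1 for _, value in matches if value == 'true')
print(matches, true_count)
```

Note that set(names) drops duplicates and does not preserve order; iterate over names directly if order matters.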

Upvotes: 2

Andrey Sobolev

Reputation: 12713

If you just need the number of true values, use the in operator and a list comprehension:

In [1]: names = ['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion']

In [2]: values = [['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false']]

In [3]: sum_of_true = len([v for v in values if v[0] in names and v[1] == "true"])

In [4]: sum_of_true
Out[4]: 1

To also get the indices of the co-occurrences, this one-liner may come in handy:

In [6]: true_indices = [names.index(v[0]) for v in values if v[0] in names and v[1] == "true"]

In [7]: true_indices
Out[7]: [0]
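
If the indices should instead refer to positions in the second list (as in1 does in the question), something along these lines may work. This is only a sketch: names_set and kept are illustrative names, and the data is the truncated sample from the question.

```python
names = ['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion']
values = [['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false']]

# A set gives constant-time membership tests instead of scanning the list
names_set = set(names)

# Indices into the second list whose name also occurs in the first list
in1 = [i for i, (name, _) in enumerate(values) if name in names_set]

# Keep only the values at those indices, then count the 'true' strings
kept = [values[i][1] for i in in1]
true_count = kept.count('true')
print(in1, kept, true_count)
```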

Upvotes: 1

bereal

Reputation: 34292

There are two ways. One is what Andrey suggests (you may want to convert names to a set for faster lookups). Alternatively, convert the second list into a dictionary:

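mapping = dict(values)
sum_of_true = sum(mapping.get(n) == 'true' for n in names)

The sum works because each comparison yields a bool, and bool is a subclass of int in Python (True == 1). The values in the question are the strings 'true' and 'false', so they have to be compared rather than summed directly, and mapping.get(n) avoids a KeyError for any name missing from the second list.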
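
Since the question's values are the strings 'true' and 'false' rather than Python booleans, one option is to convert them once when building the dictionary, so that summing the looked-up values counts the True entries. A minimal sketch, assuming the sample data from the question:

```python
names = ['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion']
values = [['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false']]

# Convert the 'true'/'false' strings to real booleans once, up front
mapping = {name: flag == 'true' for name, flag in values}

# Summing booleans counts the True entries; names missing from mapping are skipped
sum_of_true = sum(mapping[n] for n in names if n in mapping)
print(sum_of_true)
```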

Upvotes: 1
