Reputation: 883

How to return the count of the same elements in two lists?

I have two very large lists (that's why I used ... below). The first is a list of lists:

x = [['I like stackoverflow. Hi ok!'],['this is a great community'],['Ok, I didn\'t like this!.'],...,['how to match and return the frequency?']]

and a list of strings:

y = ['hi', 'nice', 'ok',..., 'frequency']

I would like to return, in a new list, the number of times that any word in y occurs in each of the lists in x. For example, for the lists above, the correct output would be:

[(1,2),(2,0),(3,1),...,(n,count)]

That is, [(1,count),...,(n,count)], where n is the number of the list and count is the number of times that any word from y appeared in that list of x. Any idea how to approach this?

Upvotes: 0

Answers (6)

DoOrDoNot

Reputation: 1156

INPUT:

x = [['I like stackoverflow. Hi ok!'],['this is a great community'],
['Ok, I didn\'t like this!.'],['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

CODE:

import re
s1 = set(y)
index = 0
result = []
for itr in x:
    # lowercase, strip punctuation (including commas), and split into words
    itr = re.sub('[!.?,]', '', itr[0].lower()).split(' ')
    s2 = set(itr)
    intersection = s1 & s2
    #find intersection of common strings
    num = len(intersection)
    result.append((index,num))
    index = index+1

OUTPUT:

result = [(0, 2), (1, 0), (2, 1), (3, 1)]
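
Note that a set intersection counts each matching word at most once per sentence. If repeated words should count every time, one possible variation is sketched below with collections.Counter. The helper name count_occurrences and the tokenizing pattern are assumptions for illustration, not part of the code above.

```python
import re
from collections import Counter

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

def count_occurrences(sentences, words):
    """Hypothetical helper: total occurrences of any target word, per sentence."""
    targets = set(words)
    result = []
    for index, item in enumerate(sentences):
        # Lowercase, then tokenize on runs of letters and apostrophes
        tokens = re.findall(r"[a-z']+", item[0].lower())
        counts = Counter(tokens)
        # A Counter returns 0 for missing keys, so absent words add nothing
        result.append((index, sum(counts[w] for w in targets)))
    return result

print(count_occurrences(x, y))  # [(0, 2), (1, 0), (2, 1), (3, 1)]
```

For the sample data the result is the same as with the intersection; the two only differ when a sentence repeats a word from y.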

Upvotes: 2

Omid

Reputation: 2667

Maybe you could concatenate the strings in x to make the computation easy:

w = ' '.join(i[0] for i in x)

Now w is a long string like this:

>>> w
"I like stackoverflow. Hi ok! this is a great community Ok, I didn't like this!. how to match and return the frequency?"

With this conversion, you can simply do this:

>>> l = []
>>> for i in range(len(y)):
    l.append((i+1, w.count(str(y[i]))))

which gives you:

>>> l
[(1, 2), (2, 0), (3, 1), (4, 0), (5, 1)]
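
One caveat worth checking: str.count matches substrings and is case-sensitive, so 'hi' is found inside "this" while "Hi" is missed. If whole-word, case-insensitive counts are what you need, a minimal sketch could look like the following. It assumes the four-item sample lists, and count_whole_word is a hypothetical helper name.

```python
import re

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

w = ' '.join(i[0] for i in x)

def count_whole_word(text, word):
    """Hypothetical helper: count case-insensitive whole-word matches of `word`."""
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Plain substring counting, as with w.count
substring_counts = [(i, w.count(word)) for i, word in enumerate(y, 1)]
# Whole-word, case-insensitive counting
whole_word_counts = [(i, count_whole_word(w, word)) for i, word in enumerate(y, 1)]

print(substring_counts)   # [(1, 2), (2, 0), (3, 1), (4, 1)]
print(whole_word_counts)  # [(1, 1), (2, 0), (3, 2), (4, 1)]
```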

Upvotes: 1

Avinash Raj

Reputation: 174706

You could also do it like this.

>>> x = [['I like stackoverflow. Hi ok!'],['this is a great community'],['Ok, I didn\'t like this!.'],['how to match and return the frequency?']]
>>> y = ['hi', 'nice', 'ok', 'frequency']
>>> import re
>>> l = []
>>> for i, j in enumerate(x):
        c = 0
        for word in y:
            # re.escape guards against regex metacharacters in the word
            if re.search(r'(?i)\b' + re.escape(word) + r'\b', j[0]):
                c += 1
        l.append((i+1, c))


>>> l
[(1, 2), (2, 0), (3, 1), (4, 1)]

(?i) makes the match case-insensitive. \b is a word boundary: it matches at the position between a word character and a non-word character (or at the start or end of the string).
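
Since the lists are large, the inner loop over y could also be folded into a single compiled pattern. The following is only a sketch under that assumption: it joins the escaped words into one alternation between word boundaries and counts the distinct words found in each sentence.

```python
import re

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

# One alternation of all escaped words, wrapped in word boundaries
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, y)) + r')\b',
                     re.IGNORECASE)

result = []
for i, item in enumerate(x, 1):
    # Lowercase the hits so that 'Hi' and 'hi' count as the same word
    found = {m.lower() for m in pattern.findall(item[0])}
    result.append((i, len(found)))

print(result)  # [(1, 2), (2, 0), (3, 1), (4, 1)]
```

Because the group is non-capturing, findall returns the full matched words, and the text of each sentence is scanned once instead of once per word in y.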

Upvotes: 1

Alex Martelli

Reputation: 881695

First, you should preprocess x into a list of sets of lowercased words -- that will speed up the following lookups enormously. E.g.:

import re

ppx = []
for subx in x:
    # each item of x is a one-element list, so the string is subx[0]
    ppx.append(set(w.lower() for w in re.findall(r'\w+', subx[0])))

(yes, you could collapse this into a list comprehension, but I'm aiming for some legibility).

Next, you loop over y, checking how many of the sets in ppx contain each item of y -- that would be

[sum(1 for s in ppx if w in s) for w in y]

That doesn't give you those redundant first items you crave, but enumerate to the rescue...:

list(enumerate((sum(1 for s in ppx if w in s) for w in y), 1))

should give exactly what you require.
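
If the goal is instead one count per sublist of x, as the sample output in the question seems to suggest, the same preprocessed sets can be reused with the loops swapped. A minimal sketch, assuming the four-item sample lists; the names per_word and per_sublist are only illustrative.

```python
import re

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

# One set of lowercased words per sublist of x
ppx = [set(w.lower() for w in re.findall(r'\w+', subx[0])) for subx in x]

# Per word of y: in how many sublists does it appear?
per_word = list(enumerate((sum(1 for s in ppx if w in s) for w in y), 1))

# Per sublist of x: how many words of y does it contain?
per_sublist = [(i, sum(1 for w in y if w in s)) for i, s in enumerate(ppx, 1)]

print(per_word)     # [(1, 1), (2, 0), (3, 2), (4, 1)]
print(per_sublist)  # [(1, 2), (2, 0), (3, 1), (4, 1)]
```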

Upvotes: 3

Smart

Reputation: 1

You can build a dictionary whose keys are the items of the y list, each starting at a count of 0. Then loop through the words in the nested x list and look each one up in the dictionary, incrementing its value every time the word is encountered.
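
A minimal sketch of that idea, assuming the sample lists from the question and a simple \w+ tokenizer (the tokenizer is my assumption, not something the question specifies):

```python
import re

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

# Start every word of y at zero
counts = dict.fromkeys(y, 0)

for item in x:
    # Lowercase the sentence and split it into words
    for token in re.findall(r'\w+', item[0].lower()):
        if token in counts:
            counts[token] += 1

print(counts)  # {'hi': 1, 'nice': 0, 'ok': 2, 'frequency': 1}
```

This gives one total per word of y across all of x; dictionary lookups keep each check cheap even when y is long.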

Upvotes: 0

Stephen Lin

Reputation: 4912

Here is a more readable solution. Check my comments in the code.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re

x = [['I like stackoverflow. Hi ok!'],['this is a great community'],['Ok, I didn\'t like this!.'],['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']


# y[i] is only compared against x[i], so both lists must have the same length
assert len(x) == len(y), "x and y must have the same length"
num = []
for i in range(len(y)):
    # lowercase the string in x[i] for comparison, then count the
    # (substring) matches of y[i] in it and store the count in num
    num.append(len(re.findall(y[i], x[i][0].lower())))

# use enumerate to produce the output in the format you want
res = list(enumerate(num))
# here is what you want
print(res)

OUTPUT:

[(0, 1), (1, 0), (2, 1), (3, 1)]

Upvotes: 2
