sazr

Reputation: 25928

Perform set operation difference on a list of tuples

I am trying to get the difference between two containers, but one of them has an awkward structure, so I don't know the best way to perform a difference on it. I cannot alter the type and structure of one container (words), but I can alter the other (the variable delims).

delims = ['on','with','to','and','in','the','from','or']
words = collections.Counter(s.split()).most_common()
# words results in [("the",2), ("a",9), ("diplomacy", 1)]

#I want to perform a 'difference' operation on words to remove all the delims words
descriptive_words = set(words) - set(delims)

# because of the unique structure of words (a list of tuples) it's hard to perform a difference
# on it. What would be the best way to perform a difference? Maybe...

delims = [('on',0),('with',0),('to',0),('and',0),('in',0),('the',0),('from',0),('or',0)]
words = collections.Counter(s.split()).most_common()
descriptive_words = set(words) - set(delims)

# Or maybe
words = collections.Counter(s.split()).most_common()
n_words = []
for w in words:
   n_words.append(w[0])
delims = ['on','with','to','and','in','the','from','or']
descriptive_words = set(n_words) - set(delims)

Upvotes: 1

Views: 1173

Answers (5)

Gareth Latty

Reputation: 89017

The simplest answer is to do:

import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
# For older versions of Python without set literals:
# delims = set(['on','with','to','and','in','the','from','or'])
words = collections.Counter(s.split())

not_delims = {key: value for (key, value) in words.items() if key not in delims}
# For older versions of Python without dict comprehensions:
# not_delims = dict(((key, value) for (key, value) in words.items() if key not in delims))

Which gives us:

{'a': 9, 'diplomacy': 1}

An alternative option is to do it pre-emptively:

import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
counted_words = collections.Counter((word for word in s.split() if word not in delims))

Here you apply the filtering on the list of words before you give it to the counter, and this gives the same result.
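As a rough check (a minimal sketch, assuming Python 3 and the same sample string), the two approaches can be compared directly, and most_common() brings back the list-of-tuples shape the question started with:

```python
import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}

# Approach 1: count everything, then drop the delimiter words.
words = collections.Counter(s.split())
not_delims = {key: value for key, value in words.items() if key not in delims}

# Approach 2: drop the delimiter words before counting.
counted_words = collections.Counter(word for word in s.split() if word not in delims)

# Compare the two results as plain dicts.
same = dict(counted_words) == not_delims

# most_common() returns (word, count) tuples sorted by descending count.
descriptive_words = counted_words.most_common()
print(same, descriptive_words)
```

Filtering first means the counter never stores the delimiter words at all, which may matter for large inputs.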

Upvotes: 1

brice

Reputation: 25039

This is how I would do it:

delims = set(['on','with','to','and','in','the','from','or'])
# ...
descriptive_words = filter(lambda x: x[0] not in delims, words)

This uses the built-in filter function. A viable alternative would be:

delims = set(['on','with','to','and','in','the','from','or'])
# ...
descriptive_words = [(word, count) for word, count in words if word not in delims]

Making sure that the delims are in a set to allow for O(1) lookup.
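A self-contained sketch of both variants, assuming Python 3 and a sample string chosen only for illustration. Note that in Python 3 filter returns a lazy iterator, so it is wrapped in list() here to get a list back:

```python
import collections

s = "the a a a a the a a a a a diplomacy"  # sample input, assumed for illustration
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}
words = collections.Counter(s.split()).most_common()

# filter() keeps the (word, count) pairs whose word is not a delimiter.
# list() materialises the lazy iterator that Python 3 returns.
filtered = list(filter(lambda pair: pair[0] not in delims, words))

# The equivalent list comprehension, unpacking each tuple.
comprehension = [(word, count) for word, count in words if word not in delims]

print(filtered)
print(comprehension)
```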

Upvotes: 1

Mp0int

Reputation: 18727

You can use filter with a lambda function:

filter(lambda word: word[0] not in delims, words)

Upvotes: 0

John La Rooy

Reputation: 304215

How about just modifying words by removing all the delimiters?

words = collections.Counter(s.split())
for delim in delims:
    del words[delim]
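
A runnable sketch of the same idea, assuming Python 3 and a sample string picked for illustration. Deleting a key that is absent from a Counter does not raise KeyError, unlike a plain dict, so the loop is safe even when most delimiters never appear in the text:

```python
import collections

s = "the a a a a the a a a a a diplomacy"  # sample input, assumed for illustration
delims = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']

words = collections.Counter(s.split())
for delim in delims:
    # Counter ignores deletions of keys it does not contain.
    del words[delim]

descriptive_words = words.most_common()
print(descriptive_words)

# For comparison, a plain dict raises KeyError on a missing key.
plain = {'a': 9}
try:
    del plain['on']
    raised = False
except KeyError:
    raised = True
print(raised)
```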

Upvotes: 3

Rob Young

Reputation: 1245

If you're iterating through it anyway why bother converting them to sets?

dwords = [delim[0] for delim in delims]
words  = [word for word in words if word[0] not in dwords]
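
The snippet above appears to assume the list-of-tuples form of delims from the question. A minimal sketch under that assumption (Python 3, sample string chosen for illustration; the names by_list, by_set and dword_set are hypothetical), with a set-based variant for comparison:

```python
import collections

s = "the a a a a the a a a a a diplomacy"  # sample input, assumed for illustration
delims = [('on', 0), ('with', 0), ('to', 0), ('and', 0),
          ('in', 0), ('the', 0), ('from', 0), ('or', 0)]
words = collections.Counter(s.split()).most_common()

# List-based lookup: each membership test scans dwords linearly.
dwords = [delim[0] for delim in delims]
by_list = [word for word in words if word[0] not in dwords]

# Set-based lookup: each membership test is a hash lookup.
dword_set = {delim[0] for delim in delims}
by_set = [word for word in words if word[0] not in dword_set]

print(by_list)
print(by_set)
```

With only eight delimiters the difference is unlikely to be noticeable; the set starts to pay off as the delimiter list grows.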

Upvotes: 0
