user1452494

Reputation: 1185

Pythonic way to find 2 items from two lists existing in another list

I have some twitter data and I split the text into those with happy emoticons and sad emoticons elegantly and pythonically like so:

happy_set = [":)",":-)","=)",":D",":-D","=D"]
sad_set = [":(",":-(","=("]

happy = [tweet.split() for tweet in data for face in happy_set if face in tweet]
sad = [tweet.split() for tweet in data for face in sad_set if face in tweet]

This works. However, a single tweet could contain emoticons from both happy_set and sad_set. What is the Pythonic way to ensure that the happy list only contains tweets whose emoticons all come from happy_set, and vice versa?

Upvotes: 1

Views: 147

Answers (3)

Alex Riley

Reputation: 176830

You could try using sets, specifically set.isdisjoint. Check to see if the set of tokens in a happy tweet is disjoint from sad_set. If so, it definitely belongs in happy:

happy_set = set([":)",":-)","=)",":D",":-D","=D"])
sad_set = set([":(",":-(","=("])

# happy is your existing list of potentially happy tweets, each already split
# into tokens by tweet.split(). To remove any tweets with sad tokens...
happy = [tweet for tweet in happy if sad_set.isdisjoint(tweet)]
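As a rough end-to-end sketch of the same idea, the snippet below classifies raw tweet strings in one pass. The function name split_by_mood and the sample data are made up for illustration. It also assumes emoticons are separated from other words by whitespace, which differs from the substring test (face in tweet) used in the question.

```python
HAPPY = {":)", ":-)", "=)", ":D", ":-D", "=D"}
SAD = {":(", ":-(", "=("}


def split_by_mood(tweets):
    """Return (happy, sad) lists of token lists.

    Hypothetical helper: tweets containing both kinds of emoticon,
    or none at all, are left out of both lists.
    """
    happy, sad = [], []
    for tweet in tweets:
        tokens = tweet.split()
        # isdisjoint accepts any iterable, so no extra set() is needed.
        has_happy = not HAPPY.isdisjoint(tokens)
        has_sad = not SAD.isdisjoint(tokens)
        if has_happy and not has_sad:
            happy.append(tokens)
        elif has_sad and not has_happy:
            sad.append(tokens)
    return happy, sad


data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]
happy, sad = split_by_mood(data)
```

Because each tweet is visited once, order is preserved and no tweet is added twice even if it contains several emoticons.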

Upvotes: 3

gwenzek

Reputation: 2944

I would use lambdas:

>>> is_happy = lambda tweet: any(map(lambda x: x in happy_set, tweet.split()))
>>> is_sad = lambda tweet: any(map(lambda x: x in sad_set, tweet.split()))

>>> data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]
>>> filter(lambda tweet: is_happy(tweet) and not is_sad(tweet), data)
['Happy day :-)']
>>> filter(lambda tweet: is_sad(tweet) and not is_happy(tweet), data)
['Boooh :-(']

That will avoid creating intermediary copies of data.

And if data is really big, you can replace filter with ifilter from the itertools package to get an iterator instead of a list. (In Python 3, the built-in filter already returns an iterator, and ifilter no longer exists.)
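For readers on Python 3, a lazy variant might look like the sketch below. It swaps the lambdas for named functions and generator expressions. The names only_happy and only_sad are invented for this example, and tokens are again assumed to be whitespace-separated.

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}


def is_happy(tweet):
    # True if any whitespace-separated token is a happy emoticon.
    return any(token in happy_set for token in tweet.split())


def is_sad(tweet):
    # True if any whitespace-separated token is a sad emoticon.
    return any(token in sad_set for token in tweet.split())


def only_happy(tweets):
    # Generator expression: tweets are yielded lazily, one at a time.
    return (t for t in tweets if is_happy(t) and not is_sad(t))


def only_sad(tweets):
    return (t for t in tweets if is_sad(t) and not is_happy(t))


data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]
happy_tweets = list(only_happy(data))
sad_tweets = list(only_sad(data))
```

Calling list() is only needed when the full result must be held in memory; iterating over the generator directly keeps memory use flat for large inputs.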

Upvotes: 1

Sylvain Leroux

Reputation: 52000

Is this what you are looking for?

happy_set = set([":)",":-)","=)",":D",":-D","=D"])
sad_set = set([":(",":-(","=("])

happy_maybe_sad = [tweet.split() for tweet in data for face in happy_set if face in tweet]
sad_maybe_happy = [tweet.split() for tweet in data for face in sad_set if face in tweet]

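happy = [item for item in happy_maybe_sad if item not in sad_maybe_happy]
sad = [item for item in sad_maybe_happy if item not in happy_maybe_happy_placeholder] if False else [item for item in sad_maybe_happy if item not in happy_maybe_sad]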

For happy... and sad..., I stick with the list solution, as the order of the items may be relevant. If it is not, using set() might perform better. In addition, sets already provide the basic set operations (union, intersection, etc.).
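One way the set-based variant could look is sketched below. Token lists are not hashable and cannot be stored in a set directly, so this sketch collects tweet indices instead and uses set difference on those. The names happy_idx and sad_idx and the sample data are assumptions for illustration. Sorting the indices restores the original order.

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}

data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]

# Indices of tweets containing at least one happy / sad emoticon,
# using the same substring test as the question.
happy_idx = {i for i, tweet in enumerate(data)
             if any(face in tweet for face in happy_set)}
sad_idx = {i for i, tweet in enumerate(data)
           if any(face in tweet for face in sad_set)}

# Set difference drops tweets that matched both sets.
happy = [data[i].split() for i in sorted(happy_idx - sad_idx)]
sad = [data[i].split() for i in sorted(sad_idx - happy_idx)]
```

Using any() here also means a tweet with several happy emoticons is recorded once rather than once per matching face.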

Upvotes: 0
