Reputation: 125
I am trying to remove strings from a list of strings when a string contains certain words.
For example, if a string contains the word "cinnamon", "fruit", or "eat", I want to exclude that string from the list.
['RT @haussera: Access to Apple Pay customer data, no, but another way? everybody wins - MarketWatch http://t.co/Fm3LE2iTkY', "Landed in the US, tired w horrible migrane. The only thing helping- Connie's new song on repeat. #SoGood #Nashville https://t.co/AscR4VUkMP", 'I wish jacob would be my cinnamon apple', "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight", 'HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw', '@hot1079atl Let me know what you think of the new single "Mirage "\nhttps://t.co/k8DJ7oxkyg', 'RT @SWNProductions: Hey All so we have a new iTunes listing due to our old one getting messed up please resubscribe via the following https…', 'Shawty go them apple bottoms jeans and the boots with the furrrr with furrrr the whole club is looking at her🎶🎶', 'I highly recommend you use MyMedia - a powerfull download manager for the iPhone/iPad. http://t.co/TWmYhgKwBH', 'Alusckが失われた時間の異常を解消しました http://t.co/peYgajYvQY http://t.co/sN3jAJnd1I', 'Театр радует туземцев! Теперь мой остров стал еще круче! http://t.co/EApBrIGghO #iphone, #iphonegames, #gameinsight', 'RT @AppIeOfficiel: Our iPhone 7 📱 http://t.co/d2vCOCOTqt', 'Я выполнил задание "Подключаем резервы"! Заходите ко мне в гости! http://t.co/ZReExwwbxh #iphone #iphonegames #gameinsight', "RT @Louis_Tomlinson: @JennSelby Google 'original apple logo' and you will see the one printed on my shirt that you reported on. Trying to l…", "I've collected 4,100 gold coins! http://t.co/JZLQJdRtLG #iphone, #iphonegames, #gameinsight", "I've collected 28,800 gold coins! http://t.co/r3qXNHwUdp #iphone, #iphonegames, #gameinsight", 'RT @AppIeOfficiel: Our iPhone 7 📱 http://t.co/d2vCOCOTqt']
keywordFilter=['eat','cinnamon','fruit']
for sent in list:
    for word in keywordFilter:
        if word in sent:
            list.remove(sent)
But it does not filter out the strings containing the keywords as I hoped; it returns the original list. Does anyone have an idea why?
1st Edit:
import json
from json import *
tweets=[]
for line in open('apple.json'):
    try:
        tweets.append(json.loads(line))
    except:
        pass
keywordFilter=set(['pie','juice','cinnamon'])
for tweet in tweets:
    for key, value in tweet.items():
        if key=='text':
            tweetsF.append(value)
print(type(tweetsF))
print(len(tweetsF))
tweetsFBK=[sent for sent in tweetsF if not any(word in sent for word in keywordFilter)]
print(type(tweetsFBK))
print(len(tweetsFBK))
Above is the code I have so far. Up to tweetsF, the strings are stored correctly, and I have tried to exclude the strings containing the words in keywordFilter.
However, tweetsFBK comes back with length 0 (it is empty). Does anyone have any idea why?
Upvotes: 2
Views: 10541
Reputation: 20349
Simply complicated :)
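# Note: this drops the matching words from each string; it does not drop whole strings.
final_list = []
for i in original_list:
    temp = []
    for k in i.split(" "):
        # keep the token only if no keyword occurs inside it
        if not any(w for w in keywordFilter if w in k):
            temp.append(k)
    final_list.append(" ".join(temp))
print(final_list)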
Upvotes: 0
Reputation: 1422
One solution is the following:
list = [sent for sent in list
        if not any(word in sent for word in keywordFilter)]
It will remove all strings that contain one of the words in the list keywordFilter as a substring.
For instance, it will remove the second string, since it contains the word repeat (and eat is a substring of repeat).
If you want to avoid this, you can do the following:
list = [sent for sent in list
        if not any(word in sent.split(' ') for word in keywordFilter)]
It will remove only the strings that contain one of the words in the list keywordFilter as a whole word (i.e. delimited by spaces in the sentence).
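To make the difference between the two variants above concrete, here is a minimal sketch on a shortened, hypothetical sample (the sentences are made up for illustration and are not the question's data). It also shows a third option, a word-boundary regular expression, which is my own assumption about what might be wanted when a keyword is followed by punctuation such as "eat,"; it is not part of the original answer.

```python
import re

# Hypothetical sample data, loosely modelled on the question's list.
sentences = [
    "Connie's new song on repeat. #SoGood",
    "I wish jacob would be my cinnamon apple",
    "Time to eat, finally!",
    "Our iPhone 7",
]
keyword_filter = ["eat", "cinnamon", "fruit"]

# Variant 1: substring test. "eat" also matches inside "repeat".
by_substring = [s for s in sentences
                if not any(w in s for w in keyword_filter)]

# Variant 2: split on spaces. The token "eat," (with the comma) is not equal to "eat".
by_split = [s for s in sentences
            if not any(w in s.split(' ') for w in keyword_filter)]

# Variant 3 (assumption, not from the answer): whole-word match via \b, case-insensitive.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, keyword_filter)) + r")\b",
    re.IGNORECASE,
)
by_regex = [s for s in sentences if not pattern.search(s)]

print(by_substring)
print(by_split)
print(by_regex)
```

Which variant fits depends on whether "repeat" should count as containing "eat" and whether punctuation next to a keyword should be ignored.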
Upvotes: 8
Reputation: 117906
You can use any in a list comprehension to do the filtering for you:
original_list = ['RT @haussera: Access to Apple Pay customer data, no, but another way? everybody wins - MarketWatch http://t.co/Fm3LE2iTkY', "Landed in the US, tired w horrible migrane. The only thing helping- Connie's new song on repeat. #SoGood #Nashville https://t.co/AscR4VUkMP", 'I wish jacob would be my cinnamon apple', "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight", 'HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw', '@hot1079atl Let me know what you think of the new single "Mirage "\nhttps://t.co/k8DJ7oxkyg', 'RT @SWNProductions: Hey All so we have a new iTunes listing due to our old one getting messed up please resubscribe via the following https…', 'Shawty go them apple bottoms jeans and the boots with the furrrr with furrrr the whole club is looking at her🎶🎶', 'I highly recommend you use MyMedia - a powerfull download manager for the iPhone/iPad. http://t.co/TWmYhgKwBH', 'Alusckが失われた時間の異常を解消しました http://t.co/peYgajYvQY http://t.co/sN3jAJnd1I', 'Театр радует туземцев! Теперь мой остров стал еще круче! http://t.co/EApBrIGghO #iphone, #iphonegames, #gameinsight', 'RT @AppIeOfficiel: Our iPhone 7 📱 http://t.co/d2vCOCOTqt', 'Я выполнил задание "Подключаем резервы"! Заходите ко мне в гости! http://t.co/ZReExwwbxh #iphone #iphonegames #gameinsight', "RT @Louis_Tomlinson: @JennSelby Google 'original apple logo' and you will see the one printed on my shirt that you reported on. Trying to l…", "I've collected 4,100 gold coins! http://t.co/JZLQJdRtLG #iphone, #iphonegames, #gameinsight", "I've collected 28,800 gold coins! http://t.co/r3qXNHwUdp #iphone, #iphonegames, #gameinsight", 'RT @AppIeOfficiel: Our iPhone 7 📱 http://t.co/d2vCOCOTqt']
keywordFilter = set(['eat','cinnamon','fruit'])
filtered_list = [s for s in original_list if not any(i in s for i in keywordFilter)]
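As a hedged aside on why building a new list this way tends to be safer than calling remove inside the loop (as the question does): removing an item while iterating shifts the remaining items left, so the element right after a removed one is skipped. The sketch below uses a small made-up list and a hypothetical helper contains_keyword to show the effect.

```python
keyword_filter = {"eat", "cinnamon", "fruit"}

def contains_keyword(sentence):
    """Return True if any keyword occurs in the sentence as a substring."""
    return any(word in sentence for word in keyword_filter)

# Hypothetical data with two matching strings next to each other.
items = ["cinnamon roll", "fruit salad", "plain text"]

# Removing while iterating: once "cinnamon roll" is removed, "fruit salad"
# moves to index 0 and the loop continues at index 1, so it is never checked.
mutated = list(items)
for s in mutated:
    if contains_keyword(s):
        mutated.remove(s)

# A comprehension builds a new list and checks every element exactly once.
filtered = [s for s in items if not contains_keyword(s)]

print(mutated)
print(filtered)
```

The comprehension leaves the original list untouched, which also makes it easy to compare the lengths before and after filtering.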
Upvotes: 3