Reputation: 95
I accidentally forgot to include a return statement in my function, and it just finished running after 10 hours, so I do not want to run it again. Is there a way I can access the wordlist inside this function?
def rm_foreign_chars(corpus):
    wordlist=[]
    for text in corpus:
        for sentence in sent_tokenize(text):
            wordlist.append(sentence)
            for word in word_tokenize(sentence):
                for c in symbols:
                    if c in word:
                        if sentence in wordlist:
                            wordlist.remove(sentence)
                        break
symbols is a string of symbols: symbols = '฿‑‒–—―‖†‡•‰⁰⁷⁸₂₣℃™→↔∆∙≤⋅─■□▪►▼●◦◾★☎☺♣♦✓✔❖❗➡⠀ⱻ�ₒ'
Upvotes: 2
Views: 299
Reputation: 43078
Unfortunately, there is no way to access wordlist outside of the function without resorting to some really hacky methods and munging around in memory. Instead, we can focus on making your function faster. This is what I came up with:
def rm_foreign_chars(corpus):
    wordlist=[]
    for text in corpus:
        for sentence in sent_tokenize(text):
            if not any(c in word for word in word_tokenize(sentence) for c in symbols):
                wordlist.append(sentence)
    return wordlist
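A further speed-up may be possible if the per-word check is not really needed: a symbol that shows up in some word should also show up in the raw sentence, so the sentence can be tested directly against a set of symbols. The sketch below is only an illustration under that assumption (it also assumes the tokenizer does not alter these symbols). The sentences are taken as already split, as they would be after sent_tokenize, and keep_clean_sentences and the shortened SYMBOLS set are hypothetical names, not part of the original code.

```python
# Shortened, illustrative subset of the symbols string from the question.
SYMBOLS = frozenset('฿†‡•™→★☺✓')


def keep_clean_sentences(sentences, symbols=SYMBOLS):
    """Return the sentences that contain none of the given symbols.

    Hypothetical helper: `sentences` is any iterable of strings, for
    example the output of sent_tokenize(text) in the original code.
    """
    # isdisjoint() walks the sentence once and stops at the first symbol hit.
    return [s for s in sentences if symbols.isdisjoint(s)]


sample = ['Plain sentence.', 'Price: 5฿ only.', 'Done ✓', 'Another plain one.']
print(keep_clean_sentences(sample))
```

Skipping word_tokenize removes one tokenizing pass per sentence, which is likely where much of the time goes, though that would need to be measured on the real corpus.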
You can also make wordlist a global variable. The only reason I suggest making it global is how long the function runs (27 minutes is still a long time). If the function fails before completion, you can still get something from wordlist.
def rm_foreign_chars(corpus):
    global wordlist
    for text in corpus:
        for sentence in sent_tokenize(text):
            if not any(c in word for word in word_tokenize(sentence) for c in symbols):
                wordlist.append(sentence)
    return wordlist

wordlist=[]
rm_foreign_chars(...)
# use wordlist here
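If a global feels too blunt, a similar safety net can be had by letting the function append to a list that the caller owns, so partial results stay reachable if the run is interrupted. This is a minimal sketch under assumptions: collect_clean and the pre-split list of sentences are hypothetical stand-ins for the tokenizing loop, and the None entry only simulates a failure partway through.

```python
def collect_clean(sentences, symbols, out):
    """Append every sentence without a listed symbol to `out`.

    `out` is created by the caller, so whatever was appended before an
    exception or Ctrl+C is still available afterwards.
    """
    for sentence in sentences:
        if not any(c in sentence for c in symbols):
            out.append(sentence)
    return out


symbols = '฿†‡•™'
results = []  # lives outside the function
try:
    # The None at the end simulates a failure partway through the run.
    collect_clean(['ok one', 'bad †', 'ok two', None], symbols, results)
except TypeError:
    pass

print(results)
```

The effect is the same as with the global version, but the function no longer depends on a module-level name being defined before it is called.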
Upvotes: 4
Reputation: 4670
There is no way to do this without returning the list. The alternative would be to create a class that contains the function and stores the list as an attribute of self.
class Characters:
    def __init__(self, corpus):
        self.corpus = corpus
        self.wordlist = []

    def foreign_chars(self):
        pass
        # Function code goes here
        # Be sure to replace corpus and wordlist
        # With their respective self attributes

chars = Characters(corpus)
chars.foreign_chars()
words = chars.wordlist
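For illustration, a filled-in version of this pattern might look like the sketch below. It assumes the corpus is already a list of sentences and checks characters directly, so the NLTK tokenizing calls from the question are left out. SentenceFilter, remove_foreign, and the short symbols string are hypothetical names chosen for the example.

```python
class SentenceFilter:
    """Keeps results on the instance so they survive a missing return."""

    def __init__(self, corpus, symbols):
        self.corpus = corpus
        self.symbols = set(symbols)
        self.wordlist = []

    def remove_foreign(self):
        # No return statement on purpose: results accumulate on self.wordlist.
        for sentence in self.corpus:
            if self.symbols.isdisjoint(sentence):
                self.wordlist.append(sentence)


chars = SentenceFilter(['fine', 'not fine ™', 'also fine'], '฿†‡•™')
chars.remove_foreign()
words = chars.wordlist
print(words)
```

Because the list lives on the instance, it can still be read after the method finishes, or after it fails partway through.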
Do refer to the other answers and comments to optimize your code.
Upvotes: 2