I'm using a Jupyter notebook.
This code is from a YouTube video. It worked on the YouTuber's computer, but on mine it raises a StopIteration error.
Here I am trying to get all the titles (questions from the CSV) that are related to the Go language.
import pandas as pd
df = pd.read_csv("Questions.csv", encoding = "ISO-8859-1", usecols = ["Title", "Id"])
titles = [_ for _ in df.loc[lambda d: d['Title'].str.lower().str.contains(" go "," golang ")]['Title']]
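A side note on the filter in this first cell: the second positional parameter of pandas' Series.str.contains is case, not a second pattern, so " golang " is never searched for. Matching on word boundaries instead of literal spaces also catches titles that start or end with "Go" or have punctuation next to it. The sketch below uses only the standard re module on a few made-up titles (hypothetical sample data, not from Questions.csv); with pandas, the same pattern should work as df['Title'].str.contains(r"\b(?:go|golang)\b", case=False).

```python
import re

# Match "go" or "golang" as whole words, case-insensitively.
# \b is a word boundary, so "Go:" or "Go" at the start/end of a title still matches,
# while words such as "Google" do not.
GO_PATTERN = re.compile(r"\b(?:go|golang)\b", re.IGNORECASE)

def mentions_go(title: str) -> bool:
    """Return True if the title contains 'go' or 'golang' as a separate word."""
    return GO_PATTERN.search(title) is not None

# Hypothetical sample titles, for illustration only.
sample_titles = [
    "Go: how to read a file line by line",
    "Making a Simple FileServer with Go and Localhost Refused to Connect",
    "Golang struct embedding",
    "Is Google's search API free?",
    "How do I go about learning Python?",
]

for t in sample_titles:
    # The literal " go " test misses the first and third titles.
    print(mentions_go(t), " go " in t.lower(), t)
```

The last sample shows why the spaCy checks are still useful: "go" used as a verb matches the regex too, and has_golang is what filters it out.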
#new cell
import spacy
nlp = spacy.load("en_core_web_sm", disable=["ner"])
#new cell
def has_golang(text):
    doc = nlp(text)
    for t in doc:
        if t.lower_ in [' go ', 'golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False

g = (title for title in titles if has_golang(title))
[next(g) for i in range(10)]
This is the error:
StopIteration Traceback (most recent call last)
<ipython-input-56-862339d10dde> in <module>
9
10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]
<ipython-input-56-862339d10dde> in <listcomp>(.0)
9
10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]
StopIteration:
From the research I've done, I think it might be a bug.
All I want is to get the titles that satisfy the three 'if' conditions.
The StopIteration is the result of calling next() on an exhausted iterator, i.e. g produces fewer than 10 results. You can get this information from the help() function:
help(next)
Help on built-in function next in module builtins:

next(...)
    next(iterator[, default])

    Return the next item from the iterator. If default is given and the iterator
    is exhausted, it is returned instead of raising StopIteration.
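The behaviour described in that help text can be reproduced without spaCy. In the sketch below, matches is a hypothetical stand-in for a generator that yields only three titles: asking for ten items with next() in a list comprehension raises StopIteration, itertools.islice stops quietly at whatever is available, and next() with a default returns the default instead of raising. first_n is a hypothetical helper name.

```python
from itertools import islice

def first_n(iterable, n):
    """Return up to n items; unlike repeated next() calls, never raises StopIteration."""
    return list(islice(iterable, n))

# A generator that yields only three items (stand-in for the filtered titles).
matches = (t for t in ["a", "b", "c"])

try:
    [next(matches) for _ in range(10)]  # asks for 10 items, only 3 exist
except StopIteration:
    print("StopIteration: the generator ran out before 10 items")

# The failed attempt above already consumed the three items, so start again.
matches = (t for t in ["a", "b", "c"])
print(first_n(matches, 10))  # ['a', 'b', 'c']

# next() with a default returns the default once the iterator is exhausted.
print(next(matches, None))   # None
```

With the original code, [next(g) for i in range(10)] could be replaced by first_n(g, 10) to get at most ten matching titles without an exception.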
Your has_golang is incorrect. Its first test can never match ' go ', because nlp splits the text into tokens and a token's text does not include the surrounding spaces, so only 'golang' can ever pass. Try this:
def has_golang(text):
    doc = nlp(text)
    for t in doc:
        if t.lower_ in ['go', 'golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False
I figured this out by finding a title that should make has_golang return True. I then ran the following code:
doc = nlp("Making a Simple FileServer with Go and Localhost Refused to Connect")
print("\n".join(str((t.lower_, t.pos_, t.dep_)) for t in doc))
('making', 'VERB', 'csubj')
('a', 'DET', 'det')
('simple', 'PROPN', 'compound')
('fileserver', 'PROPN', 'dobj')
('with', 'ADP', 'prep')
('go', 'PROPN', 'pobj')
('and', 'CCONJ', 'cc')
('localhost', 'PROPN', 'conj')
('refused', 'VERB', 'ROOT')
('to', 'PART', 'aux')
('connect', 'VERB', 'xcomp')
Looking at ('go', 'PROPN', 'pobj'), PROPN is not VERB and the dependency is pobj, so the second and third tests pass. The problem has to be the token text itself: it is "go", not " go ".
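The same reasoning can be checked in plain Python on the (lower_, pos_, dep_) triples printed above, without loading a spaCy model. passes_checks is a hypothetical helper that applies the three if conditions from has_golang to such triples; it shows that the list containing " go " never matches, because the token text carries no surrounding spaces.

```python
# (lower_, pos_, dep_) triples copied from the spaCy output above.
TOKENS = [
    ("making", "VERB", "csubj"),
    ("a", "DET", "det"),
    ("simple", "PROPN", "compound"),
    ("fileserver", "PROPN", "dobj"),
    ("with", "ADP", "prep"),
    ("go", "PROPN", "pobj"),
    ("and", "CCONJ", "cc"),
    ("localhost", "PROPN", "conj"),
    ("refused", "VERB", "ROOT"),
    ("to", "PART", "aux"),
    ("connect", "VERB", "xcomp"),
]

def passes_checks(tokens, names):
    """Apply the three conditions from has_golang to (lower_, pos_, dep_) triples."""
    for lower, pos, dep in tokens:
        if lower in names and pos != "VERB" and dep == "pobj":
            return True
    return False

print(passes_checks(TOKENS, [" go ", "golang"]))  # False: no token is " go "
print(passes_checks(TOKENS, ["go", "golang"]))    # True: matches ("go", "PROPN", "pobj")
```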
If you just want the titles that satisfy the three if conditions, skip the generator:
g = list(filter(has_golang, titles))
If you need the generator but also want a list:
g = (title for title in titles if has_golang(title))
list(g)
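One thing to keep in mind with the generator version: a generator can be consumed only once, so after list(g) it is empty, and a later next(g) raises StopIteration again. The sketch below uses a hypothetical keyword-based predicate, looks_like_go, in place of the spaCy-based has_golang so that it runs without a model; the titles are made up.

```python
def looks_like_go(title: str) -> bool:
    """Hypothetical stand-in for has_golang: a simple word check, no spaCy."""
    return any(word in ("go", "golang") for word in title.lower().split())

titles = ["Golang maps", "Python lists", "Deploying Go on Linux"]

# Option 1: build the list directly with filter().
as_list = list(filter(looks_like_go, titles))

# Option 2: a lazy generator; list() consumes it completely.
g = (title for title in titles if looks_like_go(title))
first_pass = list(g)
second_pass = list(g)  # the generator is already exhausted

print(as_list)      # ['Golang maps', 'Deploying Go on Linux']
print(first_pass)   # ['Golang maps', 'Deploying Go on Linux']
print(second_pass)  # []
```

If the matches are needed more than once, keeping the list from filter() avoids re-running the (slow) spaCy pipeline on every title.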