In the following code, I am trying to build elements that can be used to train a spaCy NER model (in the 9th line of code).
from ast import literal_eval
import re
train_data_list = []
for i in range(len(train_data)):
    a = re.search(train_data.subtext[i], train_data.text[i])
    if a is not None:
        element = '("' +train_data.text[i] + '"' + ', {"entities": [(' +
            str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'
        train_data_list.append(literal_eval(element))
But I am encountering the following error:
SyntaxError: EOL while scanning string literal
Thanks in advance.
One (or more) of the element strings supplied to literal_eval cannot be parsed by literal_eval. That is, the program syntax is valid (or else the program would fail without running anything!), and it is one or more of the element values supplied to literal_eval that is not valid Python!
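To check this hypothesis without the real data, here is a small sketch that builds the string the same way the question does for a few made-up texts (the helper names build_element and parses are hypothetical). A text containing a real newline character, or an embedded double quote, produces a string that literal_eval rejects with SyntaxError. On Python 3.10 and later the newline case is reported as "unterminated string literal" instead of "EOL while scanning string literal".

```python
from ast import literal_eval


def build_element(text, start, end):
    # Builds the string exactly like the question's code (hypothetical helper).
    return ('("' + text + '"' + ', {"entities": [(' +
            str(start) + ',' + str(end) + ',"SKILL")]})')


def parses(text):
    # True if literal_eval accepts the generated string, False otherwise.
    try:
        literal_eval(build_element(text, 6, 12))
        return True
    except (ValueError, SyntaxError):
        return False


samples = [
    "knows python",            # plain text
    "knows python\nand sql",   # contains a real newline character
    'says "python" is great',  # contains double quotes
]

for text in samples:
    print(repr(text), "->", "ok" if parses(text) else "SyntaxError/ValueError")
```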
The first step is to identify some 'invalid' values, e.g.:
from ast import literal_eval
import re
train_data_list = []
for i in range(len(train_data)):
    a = re.search(train_data.subtext[i], train_data.text[i])
    if a is not None:
        element = '("' +train_data.text[i] + '"' + ', {"entities": [(' + str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'
        try:
            data = literal_eval(element)
            train_data_list.append(data)
        except (ValueError, SyntaxError):
            # Show exactly which generated string is not a valid Python literal
            print("Failed to parse element as a Python literal!")
            print(">>")
            print(repr(element))
            print("<<")
If the above "runs" (for some value of "runs") and prints some failing elements, then the hypothesis holds and the non-relevant answers can be ignored ;-)
Anyway, the solution is to not use literal_eval at all. Instead, create the object directly:
train_data_list = []
for i in range(len(train_data)):
    a = re.search(train_data.subtext[i], train_data.text[i])
    if a is not None:
        # Build the (text, annotations) tuple directly; entity offsets stay ints
        data = (train_data.text[i],
                {"entities": [(a.start(), a.end(), "SKILL")]})
        train_data_list.append(data)
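A slightly fuller sketch of the direct approach, using a hypothetical list of (text, subtext) pairs in place of the train_data columns. One change that is not in the code above: the skill is passed through re.escape, because re.search treats subtext as a regular expression, so a skill such as "C++" would otherwise raise re.error (or silently match the wrong thing). Quotes and newlines in the text no longer matter, since nothing is parsed.

```python
import re

# Hypothetical stand-in for the question's train_data columns:
# each pair is (text, subtext), where subtext is the skill to label.
rows = [
    ("Experienced in C++ and Python", "C++"),
    ('Writes "clean" SQL\nevery day', "SQL"),
    ("No matching skill here", "Java"),
]


def build_training_data(rows, label="SKILL"):
    examples = []
    for text, subtext in rows:
        # re.escape makes characters such as "+" match literally.
        match = re.search(re.escape(subtext), text)
        if match is None:
            continue  # skill not found in this text; skip the row
        # Plain Python objects: no string building, so quotes/newlines are harmless.
        examples.append((text, {"entities": [(match.start(), match.end(), label)]}))
    return examples


train_data_list = build_training_data(rows)
for example in train_data_list:
    print(example)
```

Note that re.search only labels the first occurrence of each skill; re.finditer would be needed to label every occurrence.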
Now, if values of train_data.text[i] contain a \n (that is, the literal two-character '\' and 'n' escape sequence), there may be additional work required to turn those into newline characters... but one step at a time. And no step should be backward! :D
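If the texts do contain the literal backslash-n sequence, a minimal sketch of the conversion is a plain str.replace (assuming only \n needs handling; the helper name unescape_newlines is hypothetical). Do it before running re.search, since each replacement shortens the text by one character and would otherwise shift the entity offsets.

```python
def unescape_newlines(text):
    # Replace the two-character sequence backslash + "n" with a real newline.
    return text.replace("\\n", "\n")


raw = "line one\\nline two"     # contains a literal backslash followed by "n"
fixed = unescape_newlines(raw)
print(repr(raw))    # 'line one\\nline two'
print(repr(fixed))  # 'line one\nline two'
```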
You cannot split a long line into multiple lines just by hitting Enter. Either change your element = line to a single line like this:
element = '("' +train_data.text[i] + '"' + ', {"entities": [(' + str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'
or add a \ at the end of the line:
element = '("' +train_data.text[i] + '"' + ', {"entities": [(' + \
    str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'
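For completeness, Python also joins lines implicitly inside parentheses, brackets and braces, which is often preferred over a trailing backslash. The sketch below compiles three versions of a split assignment (the variant names are just labels) to show which are accepted; the bare split is rejected at compile time, before any line of the script runs.

```python
# Three ways to write the same two-line expression, as source code strings.
variants = {
    "bare newline": "element = 'a' +\n    'b'\n",
    "backslash": "element = 'a' + \\\n    'b'\n",
    "parentheses": "element = ('a' +\n    'b')\n",
}


def compiles(source):
    # True if the source compiles, False if Python rejects it as a SyntaxError.
    try:
        compile(source, "<variant>", "exec")
        return True
    except SyntaxError:
        return False


for name, source in variants.items():
    print(f"{name}: {'ok' if compiles(source) else 'SyntaxError'}")
```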