Reputation: 245
I have a CSV with multilingual text. All I want is to append a column containing the detected language. So I coded it as below:
from langdetect import detect
import csv

with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvinput:
    with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)
        all = []
        row = next(reader)
        row.append('Lang')
        all.append(row)
        for row in reader:
            row.append(detect(row[0]))
            all.append(row)
        writer.writerows(all)
But I am getting the error LangDetectException: No features in text.
The traceback is as follows:
runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')
Traceback (most recent call last):
  File "<ipython-input-25-5f98f4f8be50>", line 1, in <module>
    runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')
  File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)
  File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/dell/.spyder2-py3/temp.py", line 21, in <module>
    row.append(detect(row[0]))
  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector_factory.py", line 130, in detect
    return detector.detect()
  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 136, in detect
    probabilities = self.get_probabilities()
  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 143, in get_probabilities
    self._detect_block()
  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 150, in _detect_block
    raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
LangDetectException: No features in text.
This is what my CSV looks like:
1) skunkiest smokiest yummiest strain pain killer and mood lifter
2) Relaxation, euphorique, surélevée, somnolence, concentré, picotement, une augmentation de l’appétit, soulager la douleur Giggly, physique, esprit sédation
3) Reduzierte Angst, Ruhe, gehobener Stimmung, zerebrale Energie, Körper Sedierung
4) Calmante, relajante muscular, Relajación Mental, disminución de náuseas
5) 重いフルーティーな幸せ非常に強力な頭石のバースト
Please help me with this.
Upvotes: 18
Views: 33559
Reputation: 4313
It is a bad practice to catch all possible exceptions. Let me propose something more complete, more readable and safer:
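import re
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

rx_letters = re.compile("[a-z]+", re.I)
for row in reader:
    try:
        if rx_letters.search(row[0]) is not None:
            row.append(detect(row[0]))
        else:
            # No Latin letters found: skip detection but keep the column count consistent.
            row.append("?")
    except LangDetectException:
        row.append("?")
        print(f"Lang detect failed for: '{row[0]}'")
The rx_letters check can be skipped, but I find it more elegant to check for the most basic condition first. Note that "[a-z]" only matches Latin letters, so rows written entirely in another script are marked "?" without being passed to detect.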
Upvotes: 0
Reputation: 1
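The error occurs when the string has no letters. If you want to ignore such rows and continue processing:
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

for i in range(len(df)):
    text = df.iloc[i, 1]  # second column holds the text
    try:
        lang = detect(text)
    except LangDetectException:
        continue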
Upvotes: 0
Reputation: 1964
The error occurs when the text passed to detect contains no letters. At least one letter must be present.
To reproduce, run any of the commands below:
detect('.')
detect(' ')
detect('5')
detect('/')
So you may want to apply some text pre-processing first to drop records in which the row[0] value is an empty string, a null value, whitespace, a number, a special character, or simply doesn't include any letters.
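A minimal sketch of such a pre-check, assuming "letter" means anything Python's str.isalpha accepts (which includes non-Latin scripts such as Japanese). The helper name has_letters is hypothetical and not part of langdetect, and passing this check does not guarantee that detect will succeed; it only filters out the obvious cases listed above:

```python
def has_letters(text):
    """Return True if text is a string containing at least one letter."""
    # Non-strings (None, NaN floats, numbers) have nothing to detect.
    if not isinstance(text, str):
        return False
    # str.isalpha is Unicode-aware, so kana, kanji, and accented letters count.
    return any(ch.isalpha() for ch in text)


samples = ['.', ' ', '5', '/', '', None, 'Ruhe', '重い']
# Keep only the values worth passing on to detect().
detectable = [s for s in samples if has_letters(s)]
```

Here detectable ends up holding only 'Ruhe' and '重い'; the rest would be dropped or given a placeholder language before calling detect.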
Upvotes: 9
Reputation: 1612
The problem is null text or something like ' ' with no value. Check for this in a condition and loop over your reader in a list comprehension, either dropping short texts or marking them as 'no':
from langdetect import detect
# Option 1: skip short texts entirely
textlang = [detect(elem) for elem in textlist if len(elem) > 50]
# Option 2: keep one entry per text, using 'no' for short ones
textlang = [detect(elem) if len(elem) > 50 else 'no' for elem in textlist]
or with a loop:
texl70 = df5['Titletext']
langdet = []
for i in range(len(df5)):
    try:
        lang = detect(texl70[i])
    except Exception:
        lang = 'no'
        print("This row throws error:", texl70[i])
    langdet.append(lang)
Upvotes: 5
Reputation: 2854
You can use something like this to detect which line in your file is throwing the error:
for row in reader:
    try:
        language = detect(row[0])
    except Exception:
        language = "error"
        print("This row throws an error:", row[0])
    row.append(language)
    all.append(row)
What you're going to see is that it probably fails at "重いフルーティーな幸せ非常に強力な頭石のバースト". My guess is that detect() isn't able to 'identify' any characters to analyze in that row, which is what the error implies.
Other things, like when the input is only a URL, also cause this error.
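Putting the pieces together, here is a rough sketch of the whole "append a Lang column" step. The function name add_language_column, its parameters, and the fake_detect stand-in are all made up for illustration; the stand-in only exists so the sketch runs without langdetect installed, and in-memory buffers are used in place of real files:

```python
import csv
import io


def add_language_column(rows, detect_fn, errors=(Exception,), fallback="unknown"):
    """Yield each row with a language code appended; the header gets 'Lang'."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to yield
    yield header + ["Lang"]
    for row in rows:
        text = row[0] if row else ""
        try:
            lang = detect_fn(text)
        except errors:
            # Detection failed for this row; record a placeholder and keep going.
            lang = fallback
        yield row + [lang]


def fake_detect(text):
    # Hypothetical stand-in for langdetect.detect, used only for this demo.
    if not any(ch.isalpha() for ch in text):
        raise ValueError("No features in text.")
    return "xx"


source = io.StringIO("Text\nReduzierte Angst\n5\n")
target = io.StringIO()
writer = csv.writer(target, lineterminator="\n")
writer.writerows(
    add_language_column(csv.reader(source), fake_detect, errors=(ValueError,))
)
result = target.getvalue()
```

With the real library you would presumably pass detect_fn=detect and errors=(LangDetectException,), importing the exception from langdetect.lang_detect_exception, and write to a separate output file opened in 'w' mode rather than reopening the input file.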
Upvotes: 18