Reputation: 953
I'm trying to remove all the sentences that are not in French. I tried the langdetect library (without pandas, unfortunately).
CSV file
message
Je suis fatiguée
The book is on the table
Il fait chaud aujourd'hui!
They are sicks
La vie est belle
Script:
import csv
from langdetect import detect
with open('ddd.csv', 'r') as file:
    fichier = csv.reader(file)
    for line in fichier:
        if line[0] != '':
            message = line[0]
            def detecteur_FR(message):
                # We need to turn the column into a list of lists.
                message_list = [comments for comments in message.split('\n')]
                for text in message_list:
                    if detect(text) == 'fr':
                        message_FR = text
                        return message_FR
            print(detecteur_FR(message))
My output:
None
Je suis fatiguée
None
Il fait chaud aujourd hui!
None
La vie est belle
I want:
Je suis fatiguée
Il fait chaud aujourd hui!
La vie est belle
How could I remove 'None'?
Upvotes: 2
Views: 1842
Reputation: 71
Can you do the comparison before printing the message?
convt_message = detecteur_FR(message)
if convt_message:
    print(convt_message)
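For context, a minimal sketch of where the None comes from: a Python function that reaches its end without hitting a return statement returns None, and print then shows it. Here `first_french` and `fake_detect` are hypothetical names; `fake_detect` only stands in for langdetect's `detect` so the snippet runs without the library.

```python
def first_french(message, detect):
    """Return the first line of `message` that `detect` labels 'fr'.

    If no line matches, the function falls off the end and returns None.
    """
    for text in message.split("\n"):
        if detect(text) == "fr":
            return text


def fake_detect(text):
    # Hypothetical stand-in for langdetect.detect, for illustration only:
    # it labels the French sample sentences from the question as 'fr'.
    french = {"Je suis fatiguée", "Il fait chaud aujourd'hui!", "La vie est belle"}
    return "fr" if text in french else "en"


rows = ["message", "Je suis fatiguée", "The book is on the table"]
results = [first_french(row, fake_detect) for row in rows]

# The check from this answer: keep only the results worth printing.
printed = [r for r in results if r]
for r in printed:
    print(r)
```

The header row and the English sentence both yield None, which is why the check before printing is enough to hide them.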
Upvotes: 2
Reputation: 1514
I think you are getting the Nones because the function returns nothing when the text is not French, and you then print that return value.
Try printing inside the function instead:
import csv
from langdetect import detect

def detecteur_FR(message):
    # Print every line of the message that is detected as French.
    for text in message.split('\n'):
        if detect(text) == 'fr':
            print(text)

with open('ddd.csv', 'r') as file:
    fichier = csv.reader(file)
    for line in fichier:
        # csv.reader yields lists, so strip the first cell rather than the row.
        if line and line[0].strip() != '':
            detecteur_FR(line[0])
Upvotes: 1
Reputation: 27273
You're redefining the function in every iteration step of the loop.
Instead, define it once (globally) and only call it inside the loop:
import csv
from langdetect import detect

def detecteur_FR(message):
    # Return the first line detected as French (implicitly None if there is none).
    for text in message.split('\n'):
        if detect(text) == 'fr':
            return text

with open('ddd.csv', 'r') as file:
    for line in csv.reader(file):
        if line[0] != '':
            result = detecteur_FR(line[0])
            if result:
                print(result)
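If the goal is to keep the French rows rather than just print them, the same idea can be sketched as a generator. This is only an illustration under assumptions: `keep_french`, `fake_detect` and `FRENCH_SAMPLES` are hypothetical names, the detector is passed in as a parameter, and the data is read from an in-memory copy of the CSV shown in the question.

```python
import csv
import io


def keep_french(lines, detect):
    """Yield the first cell of each CSV row that `detect` labels 'fr'."""
    for row in csv.reader(lines):
        # Skip blank rows and empty cells before calling the detector.
        if not row or not row[0].strip():
            continue
        if detect(row[0]) == "fr":
            yield row[0]


# Hypothetical stand-in for langdetect.detect so the sketch runs without
# the library; in the real script, pass langdetect's detect instead.
FRENCH_SAMPLES = {"Je suis fatiguée", "Il fait chaud aujourd'hui!", "La vie est belle"}


def fake_detect(text):
    return "fr" if text in FRENCH_SAMPLES else "en"


sample = io.StringIO(
    "message\n"
    "Je suis fatiguée\n"
    "The book is on the table\n"
    "Il fait chaud aujourd'hui!\n"
    "They are sicks\n"
    "La vie est belle\n"
)
french_rows = list(keep_french(sample, fake_detect))
for text in french_rows:
    print(text)
```

With the real file, the open file object from `open('ddd.csv', 'r')` could be passed in place of `sample`, and `detect` in place of `fake_detect`.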
Upvotes: 2
Reputation: 37217
Just add a check before printing:
result = detecteur_FR(message)
if result is not None:
    print(result)
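A small sketch of how this differs from a plain truthiness check (`if result:`): the two only disagree on values that are falsy but not None, such as an empty string. The helper names and the sample list are made up for illustration.

```python
def keep_truthy(results):
    # Drops None and also any other falsy value, such as ''.
    return [r for r in results if r]


def keep_not_none(results):
    # Drops only None; an empty string is kept.
    return [r for r in results if r is not None]


results = [None, "Je suis fatiguée", "", None]
truthy = keep_truthy(results)
not_none = keep_not_none(results)
```

In the question's script empty cells are already skipped before the function is called, so either check should behave the same there.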
Upvotes: 5