Reputation: 53
I want to create a new column in my dataframe review giving the language of the column text, which is of type object.
I tried converting the column to string and then using the detect function from langdetect, but I still get a type error when I run the code.
I do not understand the problem lol
My code :
from langdetect import detect
review['langue'] = detect((review['text']).astype(str))
Actual result :
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: expected string or bytes-like object
Upvotes: 4
Views: 5316
Reputation: 96
Adding to the answer provided by kvorobieb, you can wrap detect in a function so that it still works when detect does not find any alphabetic characters in the given text:
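from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def detect_my(text):
    try:
        return detect(text)
    # detect raises LangDetectException when the text has no usable
    # features, and TypeError when the value is not a string (e.g. NaN)
    except (LangDetectException, TypeError):
        return 'unknown'

review['langue'] = review['text'].apply(detect_my)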
Upvotes: 2
Reputation:
You can use the following code to detect the language of each row:
from langdetect import detect
for index, text in df['text'].items():  # iteritems() was removed in pandas 2.0
    df.loc[index, 'language'] = detect(text)  # detect one row at a time
Upvotes: 0
Reputation: 5070
If I understood your question correctly, you need:
from langdetect import detect
review['langue'] = review['text'].apply(detect)
The detect function expects a str as its argument, not a pd.Series. Instead, you should apply detect to each element of the review['text'] pd.Series.
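To see why the original call fails even after .astype(str), here is a minimal sketch. It does not call langdetect at all: needs_str is a hypothetical stand-in for detect, on the assumption (suggested by the error message) that detect hands its input to the re module. The sample data is also made up for illustration. The point is that .astype(str) converts each element to a string, but the result is still a pd.Series.

```python
import re
import pandas as pd

def needs_str(text):
    # Hypothetical stand-in for detect: like any re-based function,
    # it accepts a single str, not a container of strings.
    return re.sub(r"\s+", " ", text).strip().lower()

# Made-up sample data: an object column holding mixed values.
review = pd.DataFrame({"text": ["Hello  world", 42]})

# .astype(str) converts each element, but the result is still a Series.
as_str = review["text"].astype(str)

try:
    needs_str(as_str)  # the whole Series is passed in one call
    whole_series_error = None
except TypeError as exc:
    whole_series_error = exc  # same kind of error as in the question

# .apply calls the function once per element, so each call receives a str.
review["cleaned"] = as_str.apply(needs_str)
```

Swapping needs_str for detect in the last line gives the element-wise call from this answer.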
Upvotes: 3