Nadège

Reputation: 53

How to detect language of a dataframe object?

I want to create a new column in my dataframe review giving the language of the column text, which has dtype object.

I tried converting the column to string and then calling the detect function from langdetect, but I still get a type error when I run the code.

I do not understand what the problem is.

My code :

from langdetect import detect


review['langue'] = detect((review['text']).astype(str))

Actual result :

--------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)


TypeError: expected string or bytes-like object

Upvotes: 4

Views: 5316

Answers (3)

Amita Kapoor

Reputation: 96

Adding to the answer provided by kvorobiev, you can wrap detect in a function so that it still works when detect finds no alphabetic characters in the given text:

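from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def detect_my(text):
    # Return 'unknown' when the text has no usable features (e.g. digits only)
    # or is not a string at all (e.g. NaN).
    try:
        return detect(text)
    except (LangDetectException, TypeError):
        return 'unknown'

review['langue'] = review['text'].apply(detect_my)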
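
If the column also contains missing values or blank strings, the same idea can be pushed a little further. The following is only a sketch under stated assumptions: add_language_column and toy_detector are hypothetical names, not part of langdetect or pandas, and the toy detector exists only so the example runs without langdetect installed.

```python
import pandas as pd


def add_language_column(frame, detector, source="text", target="langue",
                        fallback="unknown"):
    """Return a copy of frame with a per-row language code in `target`.

    `detector` is any callable that takes one str and returns a language code.
    """
    def safe(value):
        # Missing values (None/NaN) and blank strings cannot be classified.
        if not isinstance(value, str) or not value.strip():
            return fallback
        try:
            return detector(value)
        except Exception:
            # A failure on one row falls back instead of aborting the column.
            return fallback

    result = frame.copy()
    result[target] = result[source].apply(safe)
    return result


def toy_detector(text):
    # Hypothetical stand-in for a real detector, for illustration only.
    return "fr" if "bonjour" in text.lower() else "en"


review = pd.DataFrame(
    {"text": ["Bonjour tout le monde", "Hello everyone", None, "   "]}
)
review = add_language_column(review, toy_detector)
```

With langdetect installed, you would pass its detect function in place of toy_detector.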

Upvotes: 2

user2110417

Reputation:

You can use the following code to detect the language of each row:

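from langdetect import detect

# Series.iteritems() was removed in pandas 2.0; Series.items() replaces it.
for index, text in df['text'].items():
    lang = detect(text)  # detect the language of this row's text
    df.loc[index, 'language'] = lang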

Upvotes: 0

kvorobiev

Reputation: 5070

If I understood your question correctly, you need:

from langdetect import detect
review['langue'] = review['text'].apply(detect)

The detect function expects a single str as its argument, not a pd.Series. Instead of passing the whole Series, apply detect to each element of review['text'].
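
To see the difference without installing langdetect, here is a small sketch using a stand-in function. squeeze_spaces is a hypothetical helper, not part of any library; like detect, it only accepts a single string, and it is assumed here that the error in the question comes from a similar string-only call inside detect.

```python
import re

import pandas as pd


def squeeze_spaces(text):
    # Hypothetical stand-in for detect: works on one str, not on a Series.
    return re.sub(r"\s+", " ", text)


texts = pd.Series(["hello   world", "bonjour  le   monde"])

# Passing the whole Series fails, even after .astype(str), because the
# argument is still a Series object rather than a str.
try:
    squeeze_spaces(texts.astype(str))
    series_call_failed = False
except TypeError:
    series_call_failed = True

# .apply calls the function once per element, so each call receives a str.
per_row = texts.apply(squeeze_spaces)
```

Note that astype(str) changes the type of the elements, not the type of the container, which is why the original code still raised a TypeError.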

Upvotes: 3
