Jonas Palačionis

Reputation: 4842

Determining what language a string contains in a pandas DataFrame

I am new to Pandas and Python.

My dataframe:

df

Text
Best tv in 2020
utilizar un servicio sms gratuito
utiliser un tv pour netflix

My desired output

Text                                    Language
Best tv in 2020                         en
utilizar un servicio sms gratuito       es
utiliser un tv pour netflix             fr

What I am using:

from textblob import TextBlob

b = TextBlob("utilizar un servicio sms gratuito")
print(b.detect_language())

>>es

I am not sure how to apply this method to every row of my pandas DataFrame.

I have tried:

df['Language'] = TextBlob(df['Text']).detect_language()

But I am getting an error:

TypeError: The `text` argument passed to `__init__(text)` must be a string, not <class 'pandas.core.series.Series'>

I understand what it means: I need to pass a string rather than a pandas Series. So my question is, how would I loop over the entire Series to detect the language of each row in the Text column?

Thank you for your suggestions.

Upvotes: 1

Views: 1402

Answers (1)

jezrael

Reputation: 862761

Use Series.apply with a lambda function:

df['Language'] = df['Text'].apply(lambda x: TextBlob(x).detect_language())

Or Series.map:

df['Language'] = df['Text'].map(lambda x: TextBlob(x).detect_language())

print(df)
                                Text Language
0                    Best tv in 2020       en
1  utilizar un servicio sms gratuito       es
2        utiliser un tv pour netflix       fr
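
One thing to watch with the row-wise call: if a single row is empty, missing, or makes the detector raise, the whole apply stops with an exception. Below is a minimal sketch of a guarded version, not part of the original answer. The names safe_detect and detect_language_stub are hypothetical, and the stub is only a keyword-based stand-in so the example runs offline; in practice you would pass your real detector, for example lambda x: TextBlob(x).detect_language(), assuming your installed TextBlob version still provides that method.

```python
import pandas as pd


def detect_language_stub(text: str) -> str:
    """Hypothetical stand-in for a real detector (illustration only)."""
    words = set(text.lower().split())
    if words & {"utilizar", "servicio", "gratuito"}:
        return "es"
    if words & {"utiliser", "pour"}:
        return "fr"
    return "en"


def safe_detect(value, detector=detect_language_stub):
    """Return a language code, or None for missing, empty, or failing rows."""
    # Skip NaN/None and blank strings instead of passing them to the detector.
    if not isinstance(value, str) or not value.strip():
        return None
    try:
        return detector(value)
    except Exception:
        # A real detector may raise on very short text or network errors.
        return None


df = pd.DataFrame(
    {
        "Text": [
            "Best tv in 2020",
            "utilizar un servicio sms gratuito",
            "utiliser un tv pour netflix",
            None,  # a missing row, to show the guard
        ]
    }
)

# Same pattern as above: one call per row, but a bad row no longer aborts the run.
df["Language"] = df["Text"].apply(safe_detect)
print(df)
```

Rows that could not be detected end up as missing values in Language, so you can filter them afterwards with df[df["Language"].isna()].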

Upvotes: 3
