Dumb ML

Reputation: 367

How can I analyse a text from a pandas column?

I'm used to analysing text files in Python. I usually do something like:

from nltk.tokenize import word_tokenize

# read the whole file into a single string
with open('filename.txt', 'r') as f:
    text = f.read()

# tokenize
tokenized_word = word_tokenize(text)
.
.
.

However, now I'm working with a pandas DataFrame instead of a text file. How can I build the same 'text' string from a DataFrame column?

I looked at the post "Text mining with Python and pandas", but it's not exactly what I'm looking for.

Upvotes: 0

Views: 418

Answers (2)

Code Pope

Reputation: 5449

Let's imagine this is your DataFrame:

import pandas as pd 
df = pd.DataFrame({ "Text": ['bla bla bla', 'Hello', 'Other sentence', 'Lets see']})

You can get the equivalent of your code by using the agg function, which here joins all non-missing rows into one string:

text = df['Text'].agg(lambda x: ' '.join(x.dropna())) 
text

Result:

'bla bla bla Hello Other sentence Lets see'

Then you can tokenize:

tokenized_word=word_tokenize(text)
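
As a variation on the same idea, here is a minimal sketch that builds the string without agg, either with a plain str.join or with Series.str.cat. The sample data (including the missing value) is made up for illustration, and tokenizing is left out so the snippet runs without NLTK:

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing row.
df = pd.DataFrame({"Text": ["bla bla bla", "Hello", None, "Lets see"]})

# Option 1: drop missing values, coerce to str, join with single spaces.
text = " ".join(df["Text"].dropna().astype(str))

# Option 2: Series.str.cat skips missing values when na_rep is not given.
text_cat = df["Text"].str.cat(sep=" ")

print(text)
```

Either string can then be passed to word_tokenize exactly as in the file-based version.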

Upvotes: 1

gtomer

Reputation: 6564

You can iterate through the rows:

tokens_per_row = []
for idx, row in df.iterrows():
    # tokenize each row separately and keep the result
    tokens_per_row.append(word_tokenize(row['text']))
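
If you want one token list per row, Series.apply is a common alternative to looping with iterrows. The sketch below is only an illustration: tokenize_column is a hypothetical helper, the sample data is made up, and str.split is a stand-in tokenizer so the snippet runs without NLTK. You could pass word_tokenize as the tokenizer argument instead.

```python
import pandas as pd

def tokenize_column(df, column, tokenizer=str.split):
    """Return a Series of token lists, one per row (hypothetical helper).

    Non-string values such as None or NaN produce an empty list.
    """
    return df[column].apply(
        lambda value: tokenizer(value) if isinstance(value, str) else []
    )

# Hypothetical sample data with one missing row.
df = pd.DataFrame({"text": ["bla bla bla", "Hello", None]})

# Store the per-row tokens in a new column next to the original text.
df["tokens"] = tokenize_column(df, "text")
print(df)
```

This keeps each row's tokens alongside the text they came from, which is usually easier to work with than a variable that is reassigned on every loop iteration.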

Upvotes: 0
