OmZPrime

Reputation: 25

Getting the length of text in a DataFrame in Python

So I have this DataFrame:

    Text                                             target
    #Coronavirus is a cover for something else. #5...   D
    Crush the One Belt One Road !! \r\n#onebeltonf...   B
    RT @nickmyer: It seems to be, #COVID-19 aka #c...   B
    @Jerusalem_Post All he knows is how to destroy...   B
    @newscomauHQ Its gonna show us all. We will al...   B

where Text holds tweets. I am trying to get the length of each string in the Text column and store it in a new column of the DataFrame. I have tried this:

import pandas as pd

d = pd.read_csv('5gCoronaFinal.csv')
d['textlength'] = [len(t) for t in d['Text']]

But it keeps giving me this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-dabcab1de7b2> in <module>
----> 1 d['textlength'] = [len(t) for t in d['Text']]

<ipython-input-42-dabcab1de7b2> in <listcomp>(.0)
----> 1 d['textlength'] = [len(t) for t in d['Text']]

TypeError: object of type 'float' has no len()

I've tried converting t to an integer like so:

d['textlength'] = [len(int(t)) for t in d['Text']]

but then it gives me this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-9ae56e5f7912> in <module>
----> 1 d['textlength'] = [len(int(t)) for t in d['Text']]

<ipython-input-43-9ae56e5f7912> in <listcomp>(.0)
----> 1 d['textlength'] = [len(int(t)) for t in d['Text']]

ValueError: invalid literal for int() with base 10: '#Coronavirus is a cover for something else. #5g is being rolled out and they are expecting lots to...what? Die from #60ghz +. They look like they are to keep the cold in? #socialdistancing #covid19 #

I need some help, thanks!

Upvotes: 1

Views: 2018

Answers (1)

yatu

Reputation: 88236

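You can use the str accessor for vectorised string operations. It also passes missing values through as NaN instead of raising (NaN is a float, which is why len(t) fails with the TypeError). To count the words in each tweet, combine str.split and str.len: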

df['Text_length'] = df.Text.str.split().str.len()

print(df)

                                                Text target  Text_length
0  #Coronavirus is a cover for something else. #5...      D            8
1  Crush the One Belt One Road !! \r\n#onebeltonf...      B            8
2      RT @nickmyer: It seems to be, #COVID-19 aka #      B            9
3     @Jerusalem_Post All he knows is how to destroy      B            8
4     @newscomauHQ Its gonna show us all. We will al      B            9
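
If the goal is the number of characters rather than the number of words, str.len can be applied to the column directly. Below is a minimal sketch on a small made-up frame (the rows and the extra column names word_count, char_count and char_count_filled are assumptions for illustration, not taken from the CSV in the question); the NaN row stands in for an empty cell like the one that triggers the TypeError:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV; the NaN mimics an empty cell.
df = pd.DataFrame({
    "Text": [
        "Crush the One Belt One Road !!",
        np.nan,
        "RT @nickmyer: It seems to be",
    ],
    "target": ["B", "D", "B"],
})

# Word count: split on whitespace, then count the pieces. Missing rows stay NaN.
df["word_count"] = df["Text"].str.split().str.len()

# Character count: length of the raw string. Missing rows stay NaN.
df["char_count"] = df["Text"].str.len()

# Optional: treat missing text as an empty string so every row gets an integer.
df["char_count_filled"] = df["Text"].fillna("").str.len()

print(df)
```

Whether missing tweets should be kept as NaN or counted as 0 depends on what the length column is used for afterwards, so the fillna step is only one possible choice.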

Upvotes: 3
