marou95thebest

Reputation: 199

How to count words in a dataframe using pandas?

Let's say I have this dataframe df:

index      col1      col2
1           48     alpha bravo charlie
2           52     alpha bravo 
3           49     alpha bravo charlie delta echo
4           12     alpha bravo
5           6      alpha
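For a reproducible example, the sample frame above can be built directly. This sketch also counts the words in each cell with Series.str.split().str.len(); calling split with no separator splits on any run of whitespace. The index values 1 to 5 are taken from the table; everything else is just the table's data.

```python
import pandas as pd

# Rebuild the sample data from the table (index 1..5 as shown)
df = pd.DataFrame(
    {
        "col1": [48, 52, 49, 12, 6],
        "col2": [
            "alpha bravo charlie",
            "alpha bravo",
            "alpha bravo charlie delta echo",
            "alpha bravo",
            "alpha",
        ],
    },
    index=[1, 2, 3, 4, 5],
)

# Number of whitespace-separated words in each cell of col2
word_counts = df["col2"].str.split().str.len()
print(word_counts.tolist())  # [3, 2, 5, 2, 1]
```

The condition "more than 2 words" is then simply word_counts > 2.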

What I want is to delete the first word in col2 when there are more than 2 words in the cell.

So it should look like this:

index      col1      col2
1           48     bravo charlie
2           52     alpha bravo 
3           49     bravo charlie delta echo
4           12     alpha bravo
5           6      alpha

I have written this line, which removes the first word from every cell:

df['col2'] = df['col2'].apply(lambda x: ' '.join(x.split(' ')[1:]))

but I don't know how to apply it only to the cells that meet the condition.

Upvotes: 3

Views: 177

Answers (2)

falsetru

Reputation: 369054

Using a regular expression with re.Pattern.sub:

>>> import re
>>> pattern = re.compile(r'^\S+ (?=\S+ )')
>>> pattern.sub('', 'bravo charlie delta echo')
'charlie delta echo'
>>> pattern.sub('', 'alpha')
'alpha'
>>> from functools import partial
>>> df['col2'] = df['col2'].apply(partial(pattern.sub, ''))
>>> df
   col1                      col2
0    48             bravo charlie
1    52               alpha bravo
2    49  bravo charlie delta echo
3    12               alpha bravo
4     6                     alpha
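The same pattern can also be applied without apply, by passing it to Series.str.replace with regex=True. In the pattern, ^\S+ matches the first word and its trailing space, and the lookahead (?=\S+ ) only lets that match succeed when another word followed by a space comes next, so the first word is removed only when at least three words are present. A minimal sketch, assuming words are separated by single spaces and cells have no trailing space (with a trailing space, "alpha bravo " would also lose its first word):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [48, 52, 49, 12, 6],
    "col2": ["alpha bravo charlie", "alpha bravo",
             "alpha bravo charlie delta echo", "alpha bravo", "alpha"],
})

# ^\S+ matches the first word plus one space; the lookahead (?=\S+ ) requires
# another word followed by a space after it, i.e. a third word exists
df["col2"] = df["col2"].str.replace(r"^\S+ (?=\S+ )", "", regex=True)
print(df["col2"].tolist())
# ['bravo charlie', 'alpha bravo', 'bravo charlie delta echo', 'alpha bravo', 'alpha']
```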

Upvotes: 1

jezrael

Reputation: 862591

Add an if-else expression that counts the spaces:

df['col2'] = df['col2'].apply(lambda x: ' '.join(x.split()[1:]) if x.count(' ') > 1 else x)

Or:

df['col2'] = df['col2'].apply(lambda x: x.split(maxsplit=1)[1] if x.count(' ') > 1 else x)

print (df)
   index  col1                      col2
0      1    48             bravo charlie
1      2    52               alpha bravo
2      3    49  bravo charlie delta echo
3      4    12               alpha bravo
4      5     6                     alpha
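Counting spaces equals counting words minus one only when words are separated by exactly one space; a double or trailing space (such as "alpha bravo ") would push a two-word cell over the threshold. A sketch that counts words with str.split() instead, written as a plain function so it can be checked on its own; the helper name drop_first_word and its min_words parameter are hypothetical, not from either answer:

```python
def drop_first_word(text: str, min_words: int = 3) -> str:
    """Drop the first word when text has at least min_words words."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace, so len(words) is the real word count
    words = text.split()
    if len(words) >= min_words:
        # Rejoining with single spaces also normalizes the spacing
        return " ".join(words[1:])
    return text


print(drop_first_word("alpha bravo charlie"))  # bravo charlie
print(drop_first_word("alpha bravo "))         # unchanged: only two words
```

Applied to the frame it is used the same way as the lambdas above: df['col2'] = df['col2'].apply(drop_first_word).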

Vectorized pandas alternative with Series.mask:

df['col2'] = df['col2'].mask(df['col2'].str.count(' ') > 1, df['col2'].str.split(n=1).str[1])
print (df)
   index  col1                      col2
0      1    48             bravo charlie
1      2    52               alpha bravo
2      3    49  bravo charlie delta echo
3      4    12               alpha bravo
4      5     6                     alpha
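mask replaces a value only where the condition is True, so the NaN that .str[1] produces for a single-word cell such as "alpha" is never written back. An equivalent sketch that assigns only the selected rows through .loc and uses a word count, rather than a space count, as the condition (an alternative formulation, not the answer's own code):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [48, 52, 49, 12, 6],
    "col2": ["alpha bravo charlie", "alpha bravo",
             "alpha bravo charlie delta echo", "alpha bravo", "alpha"],
})

# Boolean mask: True for cells with more than two whitespace-separated words
many_words = df["col2"].str.split().str.len() > 2

# split(n=1) splits only at the first whitespace run; element [1] is the rest
df.loc[many_words, "col2"] = df.loc[many_words, "col2"].str.split(n=1).str[1]
print(df)
```

Because only the masked rows are selected on both sides, index alignment writes each shortened string back to its own row and the other rows are left untouched.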

Upvotes: 6
