Reputation:
I have this dataframe:
Word Frequency
0 : 79
1 , 60
2 look 26
3 e 26
4 a 25
... ... ...
95 trump 2
96 election 2
97 step 2
98 day 2
99 university 2
I would like to remove all words that have fewer than 3 characters. I tried the following:
df['Word']=df['Word'].str.findall('\w{3,}').str.join(' ')
but it does not remove them from my dataset. Can you please tell me how to remove them? My expected output would be:
Word Frequency
2 look 26
... ... ...
95 trump 2
96 election 2
97 step 2
98 day 2
99 university 2
Upvotes: 2
Views: 2694
Reputation: 13407
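Instead of using a regular expression, you can use .str.len() to get the length of each string in the column, then filter the rows on that length being >= 3. Your attempt only rewrites the values in the column (short words become empty strings), so the rows themselves are never dropped.
It should look like this (assign the result back to df if you want to keep it):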
df.loc[df["Word"].str.len() >= 3]
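As a minimal sketch of the difference between the two approaches, the snippet below rebuilds a small frame from the rows visible in the question (the elided rows are not reproduced, so the sample data is an assumption) and compares the regex attempt with the length filter. The names `attempt`, `mask`, and `filtered` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data modelled on the rows shown in the question.
df = pd.DataFrame(
    {
        "Word": [":", ",", "look", "e", "a", "trump", "election", "step", "day", "university"],
        "Frequency": [79, 60, 26, 26, 25, 2, 2, 2, 2, 2],
    }
)

# The attempt from the question: short words turn into empty strings,
# but every row is still present.
attempt = df["Word"].str.findall(r"\w{3,}").str.join(" ")

# Boolean mask: True where the word has at least 3 characters.
mask = df["Word"].str.len() >= 3

# Keep only the matching rows; the original index labels are preserved.
filtered = df.loc[mask]

print(len(attempt), len(filtered))
print(filtered)
```

If the gaps in the index are unwanted, calling `filtered.reset_index(drop=True)` afterwards renumbers the rows from 0.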
Upvotes: 2