Reputation: 1008

How to eliminate 3-letter or 4-letter words from a column of a DataFrame

I have a dataframe as below:

import pandas as pd
import dask.dataframe as dd
a = {'b':['category','categorical','cater pillar','coming and going','bat','No Data','calling','cal'],
     'c':['strd1','strd2','strd3', 'strd4','strd5','strd6','strd7', 'strd8']
    }
df11 = pd.DataFrame(a,index=['x1','x2','x3','x4','x5','x6','x7','x8'])

I want to replace every value in column b whose length is exactly three with NaN. I expect the result to look like:

   b                         c
category                   strd1    
categorical                strd2     
cater pillar               strd3
coming and going           strd4      
NaN                        strd5      
No Data                    strd6        
calling                    strd7         
NaN                        strd8

Upvotes: 1

Answers (5)

LaurinO

Reputation: 136

Something like:

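import numpy as np

# Iterate over (index label, value) pairs and assign through .loc to avoid chained assignment
for idx, ele in df11['b'].items():
    if len(ele) == 3:
        df11.loc[idx, 'b'] = np.nan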

Upvotes: 0

Sandalaphon

Reputation: 154

You could use a where conditional:

    df11['b'] = df11['b'].where(df11.b.map(len) != 3, np.nan)
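
One caveat: if the column can already contain missing values, `map(len)` raises a `TypeError` on them, whereas `.str.len()` returns NaN for those entries. A minimal sketch of the same `where` idea using `.str.len()`, assuming a hypothetical Series `s` that is not part of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one entry is already missing.
s = pd.Series(["category", "bat", np.nan, "cal"], dtype=object)

# .str.len() gives NaN for the missing entry instead of raising,
# so "!= 3" is True there and where() leaves that entry untouched.
cleaned = s.where(s.str.len() != 3)
```

Here `where` keeps values for which the condition is True and fills the rest with NaN by default, so the explicit `np.nan` argument is optional.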

Upvotes: 0

BENY

Reputation: 323226

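You could use mask. Assign the result back rather than using inplace=True on the column, and compare with == 3 to match only values of length three:

df11['b'] = df11['b'].mask(df11['b'].str.len() == 3)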
df11
Out[16]: 
                   b      c
x1          category  strd1
x2       categorical  strd2
x3      cater pillar  strd3
x4  coming and going  strd4
x5               NaN  strd5
x6           No Data  strd6
x7           calling  strd7
x8               NaN  strd8

Upvotes: 2

anky

Reputation: 75080

Use Series.str.len() to get the length of each string, compare it to 3 with Series.eq(), and then use df.loc[] to assign np.nan to b wherever the condition matches:

df11.loc[df11.b.str.len().eq(3),'b']=np.nan

                   b      c
x1          category  strd1
x2       categorical  strd2
x3      cater pillar  strd3
x4  coming and going  strd4
x5               NaN  strd5
x6           No Data  strd6
x7           calling  strd7
x8               NaN  strd8
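
If both 3- and 4-character values should go, as the title suggests, the same pattern could probably be extended with `isin`. A minimal sketch, assuming a hypothetical sample frame `df` and that length is counted over the whole cell value, spaces included:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on column 'b' from the question.
df = pd.DataFrame(
    {"b": ["category", "bat", "No Data", "cal", "call"],
     "c": ["strd1", "strd2", "strd3", "strd4", "strd5"]},
    index=["x1", "x2", "x3", "x4", "x5"],
)

# Boolean mask: True where the whole cell is 3 or 4 characters long.
short = df["b"].str.len().isin([3, 4])

# Overwrite only the matching rows of column 'b'.
df.loc[short, "b"] = np.nan
```

Changing the list passed to `isin` adjusts which lengths are removed.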

Upvotes: 4

Erfan

Reputation: 42896

Use str.len to get the length of each string, then use np.where to replace the value with NaN wherever the length equals 3:

df11['b'] = np.where(df11['b'].str.len().eq(3), np.nan, df11['b'])

                   b      c
x1          category  strd1
x2       categorical  strd2
x3      cater pillar  strd3
x4  coming and going  strd4
x5               NaN  strd5
x6           No Data  strd6
x7           calling  strd7
x8               NaN  strd8

Upvotes: 3
