Reputation: 1008
I have a dataframe as below:
import pandas as pd
import dask.dataframe as dd
a = {'b':['category','categorical','cater pillar','coming and going','bat','No Data','calling','cal'],
'c':['strd1','strd2','strd3', 'strd4','strd5','strd6','strd7', 'strd8']
}
df11 = pd.DataFrame(a,index=['x1','x2','x3','x4','x5','x6','x7','x8'])
I want to remove the values in column b that are exactly three characters long (replace them with NaN). I expect the result to look like:
b c
category strd1
categorical strd2
cater pillar strd3
coming and going strd4
NaN strd5
No Data strd6
calling strd7
NaN strd8
Upvotes: 1
Views: 350
Reputation: 136
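Something like this loop. It assigns through .loc with the row label, so the change is written to df11 itself rather than to a temporary copy:
import numpy as np

# Walk through column b and blank out values that are exactly 3 characters long
for idx, ele in df11['b'].items():
    if len(ele) == 3:
        df11.loc[idx, 'b'] = np.nan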
Upvotes: 0
Reputation: 154
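You could use Series.where, which keeps the values where the condition is True and replaces the others (here with np.nan):
import numpy as np
df11['b'] = df11['b'].where(df11['b'].map(len) != 3, np.nan)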
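One caveat with map(len): it calls len on every element, so it raises a TypeError if column b already contains a missing value (a float NaN has no length). A minimal sketch, assuming the column may hold missing values, uses str.len() instead, which returns NaN for missing entries rather than failing. The frame df below is a small hypothetical example, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; np.nan in column b stands in for a pre-existing missing value.
df = pd.DataFrame({'b': ['bat', 'calling', np.nan, 'cal'],
                   'c': ['strd1', 'strd2', 'strd3', 'strd4']})

# df['b'].map(len) would raise TypeError here (len of a float NaN).
# str.len() skips missing values and returns NaN for them instead.
lengths = df['b'].str.len()

# where keeps values where the condition is True; the rest become NaN (the default `other`).
# NaN != 3 evaluates to True, so the existing missing value is simply kept as missing.
df['b'] = df['b'].where(lengths != 3)

print(df)
```

Because `other` defaults to NaN, passing np.nan explicitly is optional.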
Upvotes: 0
Reputation: 323226
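Maybe check Series.mask, which replaces values where the condition is True (with NaN by default). Assigning the result back, instead of using inplace=True on the attribute df11.b, keeps it working in pandas versions with Copy-on-Write:
df11['b'] = df11['b'].mask(df11['b'].str.len() == 3)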
df11
b c
x1 category strd1
x2 categorical strd2
x3 cater pillar strd3
x4 coming and going strd4
x5 NaN strd5
x6 No Data strd6
x7 calling strd7
x8 NaN strd8
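mask is the mirror image of where: mask replaces values where the condition is True, while where replaces them where it is False. A small sketch on a hypothetical four-element Series shows that the two agree once the condition is negated, so either answer can be used depending on which way the condition reads more naturally.

```python
import pandas as pd

s = pd.Series(['category', 'bat', 'No Data', 'cal'], index=['x1', 'x5', 'x6', 'x8'])

# Boolean condition: True where the string is exactly three characters long.
is_three = s.str.len() == 3

# mask replaces values where the condition is True ...
masked = s.mask(is_three)
# ... which matches where with the condition negated.
kept = s.where(~is_three)

print(masked.equals(kept))  # True
print(masked)
```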
Upvotes: 2
Reputation: 75080
Use Series.str.len() to get the length of each string in the series, compare it with 3 using Series.eq(), and then use df.loc[] to assign np.nan to the values of b where the condition matches:
df11.loc[df11.b.str.len().eq(3),'b']=np.nan
b c
x1 category strd1
x2 categorical strd2
x3 cater pillar strd3
x4 coming and going strd4
x5 NaN strd5
x6 No Data strd6
x7 calling strd7
x8 NaN strd8
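If this is needed for other columns or lengths, the .loc pattern can be wrapped in a small function. The name blank_by_length and its parameters are hypothetical, chosen here for illustration; the sketch works on a copy so the original frame is left unchanged.

```python
import numpy as np
import pandas as pd


def blank_by_length(df, col, length):
    """Hypothetical helper: return a copy of df where values in `col`
    that are exactly `length` characters long are set to NaN."""
    out = df.copy()
    # Same str.len().eq() + .loc assignment as in the answer above.
    out.loc[out[col].str.len().eq(length), col] = np.nan
    return out


a = {'b': ['category', 'categorical', 'cater pillar', 'coming and going',
           'bat', 'No Data', 'calling', 'cal'],
     'c': ['strd1', 'strd2', 'strd3', 'strd4', 'strd5', 'strd6', 'strd7', 'strd8']}
df11 = pd.DataFrame(a, index=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8'])

result = blank_by_length(df11, 'b', 3)
print(result)
```

Returning a copy is a design choice for the sketch; dropping the copy() call would modify df11 directly, as the original one-liner does.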
Upvotes: 4
Reputation: 42896
Use str.len to get the length of each string, then use np.where to replace the values with NaN where the length is equal to 3 (note np.nan: the np.NaN alias was removed in NumPy 2.0):
df11['b'] = np.where(df11['b'].str.len().eq(3), np.nan, df11['b'])
b c
x1 category strd1
x2 categorical strd2
x3 cater pillar strd3
x4 coming and going strd4
x5 NaN strd5
x6 No Data strd6
x7 calling strd7
x8 NaN strd8
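Unlike Series.where and Series.mask, np.where works position by position and returns a plain NumPy array without the index. Assigning it back to df11['b'] works because the array has the same length and row order as the frame. A short sketch on a hypothetical three-element Series makes this visible and shows how to rewrap the result if the index is needed.

```python
import numpy as np
import pandas as pd

s = pd.Series(['bat', 'calling', 'cal'], index=['x5', 'x7', 'x8'])

# np.where returns a NumPy array, so the index (x5, x7, x8) is not carried along.
arr = np.where(s.str.len().eq(3), np.nan, s)
print(type(arr))  # <class 'numpy.ndarray'>

# Wrapping the array back into a Series restores the index explicitly.
restored = pd.Series(arr, index=s.index)
print(restored)
```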
Upvotes: 3