Reputation: 1030

filter pandas column only for string containing rows

I end up with a pandas dataframe that looks like this after I read and filter a excel file.

Col1    Col2
afaf    abc 1
1512        
asda    cdd 2
adsd

I am trying to end up with

Col1    Col2
afaf    abc1
asda    cdd2

I tried df['Col2'].replace('',np.nan,inplace=True) and to do a dropna afterwards but nothing gets replaced, So I think that it can't replace because column2 has more than one space in those empty rows.

I forgot to mention that I can't use strip because the Col2 string has spaces that I need to keep unchanged.

Any ideas?

Upvotes: 2

Answers (3)

Quickbeam2k1

Reputation: 5437

you can strip the column using pandas str.strip() function. this should remove all the whitespace.

It could look like this

df['Col2'].str.strip().replace('',np.nan,inplace=True)

So using pipe you can receive the non nan rows as

df.iloc[df.pipe(lambda x:x['Col2'].str.strip().replace('',np.nan)).dropna().index]

This latter, updated solution, also works under your additional whitespace constraints. However, note that I used pipe before that constraint was posted.

Now, I'd choose e.g. a solutions like Jezrael's, but formulated as

df[df['Col2'].str.strip() !='']

I think, this is a little bit clearer than using the len functions

Just performed some benchmarks on a really small dataframe. PirSquared solutions is fastest, followed by Jezrael's, followed by my solution using comparison to ''. Last place is the piping variant.

Upvotes: 2

piRSquared

Reputation: 294228

use str.match

df[~df.Col2.str.match(r'^\s*$')]

Upvotes: 2

jezrael

Reputation: 862551

I think you can use boolean indexing with condition for remove some possible whitespaces by strip and then check if length by len is not 0:

print (df[df.Col2.str.strip().str.len() != 0])
   Col1   Col2
0  afaf  abc 1
2  asda  cdd 2

If no whitespaces:

df[df.Col2.str.len() != 0]

Upvotes: 5

filter pandas column only for string containing rows

Answers (3)

Related Questions