Remove values based on character length separated by whitespace

Question

Assume this DataFrame:

import pandas as pd

df = pd.DataFrame({'Col1': ['1 123456 789012', '654321', '123 123457', '123458 123459']})


     Col1
0   1 123456 789012
1   654321
2   123 123457
3   123458 123459

I essentially want to remove every whitespace-separated value that is not exactly 6 characters long. I am looking for this output:

     Col1
0   123456 789012
1   654321
2   123457
3   123458 123459

Ultimately, I am looking for this output, but perhaps that is a different question:

I believe I can accomplish the latter with df.Col1.str.split(expand=True), but I have not tested it. Any advice is greatly appreciated. I am looking for any direction, as I do not know where to begin. I have tried str.replace(), but the set of values that would need to be replaced is unknown.

BENY · Accepted Answer

Use str.split to break each string into tokens, then stack to reshape the result from wide to long, and str.len to keep only the tokens that are 6 characters long:

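# split on whitespace into columns, then stack into one long Series of tokens
s = df.Col1.str.split(expand=True).stack()
# keep only the tokens that are exactly 6 characters long
s[s.str.len() == 6].to_frame('col1')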
Out[516]: 
       col1
0 1  123456
  2  789012
1 0  654321
2 1  123457
3 0  123458
  1  123459
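
The result above is in long form, one token per row. If you want the first output from the question instead (one space-separated string per original row), a minimal sketch could look like the following. It assumes every value in Col1 is a string with no missing values, and the helper name keep_six_char_tokens is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['1 123456 789012', '654321', '123 123457', '123458 123459']})


def keep_six_char_tokens(text):
    # Hypothetical helper: split on whitespace, keep tokens of length 6,
    # and join the survivors back together with single spaces.
    return ' '.join(tok for tok in text.split() if len(tok) == 6)


# Apply the helper row by row; the original index is preserved.
result = df['Col1'].apply(keep_six_char_tokens).to_frame('Col1')
print(result)
```

A row with no 6-character values would end up as an empty string here rather than being dropped, which may or may not be what you want.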
