AMIT BISHT

Reputation: 59

Remove rows from a column based on whitespace

I want to keep only the rows whose value in the column is a single word, and remove the rest of the rows, which contain whitespace (i.e. more than one word).

My dataframe df is:

df['drug']
gilenya
fingolimod
ocrevus
dont want in the column
remove this drug row
text mining for drug column

I want to create a new dataframe containing only the valid drug names, with the garbage rows removed. I have tried the solutions below, but they give me an empty drug column.

import pandas as pd

df_drug = pd.DataFrame(columns = ['drug'])
df_drug = df_drug[df_drug.drug.str.count(' ')==1]
# or:
df_drug = df_drug[df_drug.drug.str.contains('')]
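A quick sketch of why both attempts come back empty, assuming `df` holds the six example values shown above. `pd.DataFrame(columns=['drug'])` creates a frame with a `drug` column but zero rows, so any filter applied to it is empty. Separately, `str.count(' ') == 1` keeps values with exactly one space (two-word values), while a single word has zero spaces. `str.contains('')` matches every string, so on its own it would not remove anything.

```python
import pandas as pd

# Example data from the question (assumed to be the original df)
df = pd.DataFrame({'drug': ['gilenya', 'fingolimod', 'ocrevus',
                            'dont want in the column',
                            'remove this drug row',
                            'text mining for drug column']})

# A frame built from column names alone has no rows to filter
empty = pd.DataFrame(columns=['drug'])
print(len(empty))  # 0

# Number of spaces per value: single words have 0, not 1
print(df['drug'].str.count(' ').tolist())  # [0, 0, 0, 4, 3, 4]

# Filtering the original df on zero spaces keeps the single words
print(df[df['drug'].str.count(' ') == 0]['drug'].tolist())
# ['gilenya', 'fingolimod', 'ocrevus']
```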

Could you please help me find the correct solution? The expected result is:

df_drug.head()
drug
gilenya
fingolimod
ocrevus

Upvotes: 1

Views: 1173

Answers (1)

eva-vw

Reputation: 670

You can use apply with a lambda function to build a boolean Series that is True only where df_drug['drug'] is a single word, and then select rows from df_drug using that Series.

df_drug = df_drug[df_drug['drug'].apply(lambda x: len(x.split()) == 1)]
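A minimal runnable sketch of this filter, assuming `df_drug` already holds the example values from the question (rather than the empty frame created by `pd.DataFrame(columns=['drug'])`). The comparison `len(x.split()) == 1` already evaluates to a bool, so the `True if ... else False` wrapper is not needed.

```python
import pandas as pd

# Example data from the question (assumed to be the frame being filtered)
df_drug = pd.DataFrame({'drug': ['gilenya', 'fingolimod', 'ocrevus',
                                 'dont want in the column',
                                 'remove this drug row',
                                 'text mining for drug column']})

# split() with no argument splits on any run of whitespace,
# so a single-word value yields a one-element list
mask = df_drug['drug'].apply(lambda x: len(x.split()) == 1)
df_drug = df_drug[mask]

print(df_drug['drug'].tolist())  # ['gilenya', 'fingolimod', 'ocrevus']
```

The surviving rows keep their original index labels (0, 1, 2 here); call `reset_index(drop=True)` if a fresh index is wanted after removing rows from the middle of the frame.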

If the drug column contains NaNs, you may need to wrap x in str(x) inside the lambda, because NaN is a float and has no split method.
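A small sketch of the NaN case, using a hypothetical three-row frame. One caveat with `str(x)`: `str(np.nan)` is `'nan'`, which counts as a single word, so NaN rows are kept rather than removed. Checking `isinstance(x, str)` first drops them. The vectorized `.str.split().str.len()` alternative shown last is a swapped-in technique, not part of the answer above; it propagates NaN, and `NaN == 1` is False, so those rows are dropped as well.

```python
import numpy as np
import pandas as pd

# Hypothetical example with a missing value
df_drug = pd.DataFrame({'drug': ['gilenya', np.nan, 'remove this drug row']})

# str() avoids the AttributeError on NaN, but 'nan' is one "word",
# so the NaN row survives the filter
with_str = df_drug[df_drug['drug'].apply(lambda x: len(str(x).split()) == 1)]
print(len(with_str))  # 2

# Checking the type first drops NaN rows too
typed = df_drug[df_drug['drug'].apply(
    lambda x: isinstance(x, str) and len(x.split()) == 1)]
print(typed['drug'].tolist())  # ['gilenya']

# Vectorized alternative: word count via the .str accessor
vec = df_drug[df_drug['drug'].str.split().str.len() == 1]
print(vec['drug'].tolist())  # ['gilenya']
```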

Upvotes: 1
