How to filter multiple columns with the same condition?

Question

I'm trying to manipulate a DataFrame with 8 columns and 263,000 rows.

This is what my DataFrame looks like:

ID1          ID2         dN          dS          t           Label_ID1   Label_ID2   Group
QJY77946     NP_073551   0.0241      0.1402      0.1479      229E-CoV    229E-CoV    Intra
QJY77954     NP_073551   0.0119      0.0912      0.0870      229E-CoV    229E-CoV    Intra
QJY77954     QJY77946    0.0119      0.0439      0.0566      229E-CoV    229E-CoV    Intra
QJY77962     NP_073551   0.0119      0.0912      0.0870      229E-CoV    229E-CoV    Intra
QJY77962     QJY77946    0.0119      0.0439      0.0566      229E-CoV    229E-CoV    Intra

My goal is to keep only the values <= 6 in the columns "dN", "dS" and "t". To do this, I filter the rows where the selected columns (dN, dS and t) have a value <= 6.

df_1_S = pd.read_csv("S_YN00.csv", sep="\t", names=["ID1", "ID2", "dN", "dS", "t", "Label_ID1", "Label_ID2", "Group"])

S_greather_than = (df_1_S["dN"] < 6)

df_1_S.loc[S_greather_than]

This works, but when I try to add more columns (dS and t):

S_greather_than = (df_1_S["dN"] < 6) & (df_1_S["dS"] < 6) & (df_1_S["t"] < 6)

df_1_S.loc[S_greather_than] 

Or a different method, using or ( | ):

S_greather_than = ((df_1_S["dN"] < 6) | (df_1_S["dS"] < 5) | (df_1_S["t"] < 6))

df_1_S.loc[S_greather_than]

I get this error:

TypeError: '<' not supported between instances of 'str' and 'int'

I understand the problem, but I don't know how to filter the rows with values <= 6 in all three columns at the same time.

Any idea or help is welcome.

Thanks!

Ishwar Venugopal · Accepted Answer

Change the data type of column 'dS' to float as follows:

df_1_S['dS'] = df_1_S['dS'].astype(float)

The error you are getting is probably because the 'dS' column is of type 'object' (that is, it holds strings), as mentioned in the comments.
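
One way to confirm this is to inspect the dtypes and look for values that do not parse as numbers. The following is only a sketch: the small frame stands in for S_YN00.csv, and the stray "dN"/"dS"/"t" strings in its last row are an assumption about what might be in the file (for example, a header line read as data), not something the question confirms.

```python
import pandas as pd

# Hypothetical stand-in for the real file: one bad row forces string columns.
df_1_S = pd.DataFrame({
    "dN": ["0.0241", "0.0119", "dN"],
    "dS": ["0.1402", "0.0912", "dS"],
    "t":  ["0.1479", "0.0870", "t"],
})

cols = ["dN", "dS", "t"]

# A non-numeric dtype (object, or a string dtype in newer pandas versions)
# means the column holds text rather than floats.
print(df_1_S[cols].dtypes)

# Try converting every column; entries that cannot be parsed become NaN.
converted = df_1_S[cols].apply(pd.to_numeric, errors="coerce")

# Rows where at least one of the columns failed to convert.
bad_rows = df_1_S[converted.isna().any(axis=1)]
print(bad_rows)
```

If bad_rows is empty, the values are all numeric strings and astype(float) should be enough; otherwise astype(float) will raise a ValueError on those rows.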

Your code should work fine once the column has been converted to float.
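
Since the same condition applies to all three columns, the conversion and the comparison can also be done on the columns together instead of one by one. A minimal sketch, assuming that non-numeric entries can simply be discarded (errors="coerce" turns them into NaN, which never satisfies the comparison); the sample data below is made up for illustration:

```python
import pandas as pd

# Made-up sample; the last row mimics a stray header line read as data.
df_1_S = pd.DataFrame({
    "ID1": ["a", "b", "c", "d"],
    "dN": ["0.0241", "7.5", "0.0119", "dN"],
    "dS": ["0.1402", "0.0912", "0.0439", "dS"],
    "t":  ["0.1479", "0.0870", "9.0", "t"],
})

cols = ["dN", "dS", "t"]

# Convert all three columns at once; strings that are not numbers become NaN.
df_1_S[cols] = df_1_S[cols].apply(pd.to_numeric, errors="coerce")

# Apply the same condition to every column, then combine row-wise.
below = df_1_S[cols] <= 6
all_below = df_1_S[below.all(axis=1)]  # same as chaining the conditions with &
any_below = df_1_S[below.any(axis=1)]  # same as chaining the conditions with |
```

If the strings do turn out to come from a header line in the file, passing header=0 together with names=[...] to pd.read_csv should make pandas skip that line while still using your column names.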
