Someone_1313

Reputation: 432

How to filter multiple columns with the same condition?

I'm trying to manipulate a DataFrame with 8 columns and 263,000 rows.

This is what my DataFrame looks like:

ID1          ID2         dN          dS          t           Label_ID1   Label_ID2   Group
QJY77946     NP_073551   0.0241      0.1402      0.1479      229E-CoV    229E-CoV    Intra
QJY77954     NP_073551   0.0119      0.0912      0.0870      229E-CoV    229E-CoV    Intra
QJY77954     QJY77946    0.0119      0.0439      0.0566      229E-CoV    229E-CoV    Intra
QJY77962     NP_073551   0.0119      0.0912      0.0870      229E-CoV    229E-CoV    Intra
QJY77962     QJY77946    0.0119      0.0439      0.0566      229E-CoV    229E-CoV    Intra

My goal is to filter all the values <= 6 in the columns "dN", "dS" and "t". To do this, I filter the rows where any of the selected columns (dN, dS and t) has a value <= 6.

df_1_S = pd.read_csv("S_YN00.csv",sep="\t", names=['ID1',"ID2","dN","dS","t","Label_ID1","Label_ID2","Group"])

S_greather_than = (df_1_S["dN"] < 6)

df_1_S.loc[S_greather_than]

This works, but when I try to add more columns (dS and t):

S_greather_than = (df_1_S["dN"] < 6) & (df_1_S["dS"] < 6) & (df_1_S["t"] < 6)

df_1_S.loc[S_greather_than] 

Or with a different method, using or ( | ):

S_greather_than = ((df_1_S["dN"] < 6) | (df_1_S["dS"] < 5) | (df_1_S["t"] < 6))

df_1_S.loc[S_greather_than] 

I get this error:

TypeError: '<' not supported between instances of 'str' and 'int'

I understand the problem, but I don't know how to filter the rows with values <= 6 in all of these columns at the same time.

Any idea or help is welcome.

Thanks!

Upvotes: 1

Views: 38

Answers (1)

Ishwar Venugopal

Reputation: 882

Change the data type of the 'dS' column to float as follows:

df_1_S['dS'] = df_1_S['dS'].astype(float)

The error you are getting is probably because the 'dS' column is of type 'object', as mentioned in the comments.
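If you want to confirm that before converting, here is a rough sketch of how the column types could be checked. The in-memory text is a made-up stand-in for S_YN00.csv, and the idea that a header line ended up being read as data (because names= is passed without header=0) is only a guess at one possible cause, not something the question confirms.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: a header line followed by one data row.
raw = "ID1\tID2\tdN\tdS\tt\nQJY77946\tNP_073551\t0.0241\t0.1402\t0.1479\n"
cols = ["ID1", "ID2", "dN", "dS", "t"]

# With names= and no header=0, the first line is treated as data,
# so "dN", "dS" and "t" end up holding text instead of floats.
df = pd.read_csv(io.StringIO(raw), sep="\t", names=cols)
print(df.dtypes)

# Rows whose "dS" value cannot be parsed as a number (coerced to NaN).
bad = df[pd.to_numeric(df["dS"], errors="coerce").isna()]
print(bad)
```

If the rows printed at the end turn out to be a stray header line, passing header=0 to read_csv (or dropping that row) may be enough. Note that astype(float) raises an error when such text values are present.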

Your code should work fine with this change.
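Since the title asks about applying the same condition to several columns, here is a minimal sketch of one way to do it once the columns are numeric. The sample data and the names value_cols, all_below and any_below are made up for illustration. pd.to_numeric with errors="coerce" is used here instead of astype(float) so that unparseable entries become NaN rather than raising.

```python
import pandas as pd

# Hypothetical sample; the values are strings to mimic columns read as text.
df = pd.DataFrame({
    "ID1": ["a", "b", "c"],
    "dN": ["0.0241", "7.5", "0.0119"],
    "dS": ["0.1402", "0.0912", "dS"],
    "t": ["0.1479", "0.0870", "0.0566"],
})

value_cols = ["dN", "dS", "t"]

# Convert all three columns at once; unparseable values become NaN.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")

# Same condition on every column: keep rows where ALL values are < 6 ...
all_below = df[(df[value_cols] < 6).all(axis=1)]

# ... or rows where AT LEAST ONE value is < 6 (the "|" variant).
any_below = df[(df[value_cols] < 6).any(axis=1)]
```

Comparisons against NaN are False, so a row with an unparseable value is dropped by the .all() version. Use <= 6 instead of < 6 if the boundary value should be kept.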

Upvotes: 1
