Reputation: 432
I'm trying to manipulate a DataFrame with 8 columns and 263,000 rows.
This is what my DataFrame looks like:
ID1 ID2 dN dS t Label_ID1 Label_ID2 Group
QJY77946 NP_073551 0.0241 0.1402 0.1479 229E-CoV 229E-CoV Intra
QJY77954 NP_073551 0.0119 0.0912 0.0870 229E-CoV 229E-CoV Intra
QJY77954 QJY77946 0.0119 0.0439 0.0566 229E-CoV 229E-CoV Intra
QJY77962 NP_073551 0.0119 0.0912 0.0870 229E-CoV 229E-CoV Intra
QJY77962 QJY77946 0.0119 0.0439 0.0566 229E-CoV 229E-CoV Intra
My goal is to keep only the rows whose values in the columns "dN", "dS" and "t" are <= 6. To do this, I filter the rows by checking the selected columns (dN, dS and t) for values <= 6.
df_1_S = pd.read_csv("S_YN00.csv",sep="\t", names=['ID1',"ID2","dN","dS","t","Label_ID1","Label_ID2","Group"])
S_greather_than = (df_1_S["dN"] < 6)
df_1_S.loc[S_greather_than]
This works, but when I try to add more columns (dS and t):
S_greather_than = (df_1_S["dN"] < 6) & (df_1_S["dS"] < 6) & (df_1_S["t"] < 6)
df_1_S.loc[S_greather_than]
or with a different method, using or ( | ):
S_greather_than = ((df_1_S["dN"] < 6) | (df_1_S["dS"] < 5) | (df_1_S["t"] < 6))
df_1_S.loc[S_greather_than]
I get this error:
TypeError: '<' not supported between instances of 'str' and 'int'
I understand the problem, but I don't know how to filter the rows on all three columns (values <= 6) at the same time.
Any idea or help is welcome.
Thanks!
Upvotes: 1
Views: 38
Reputation: 882
Change the data type of the 'dS' column to float as follows:
df_1_S['dS'] = df_1_S['dS'].astype(float)
The error you are getting is probably because the 'dS' column has dtype 'object' (its values are stored as strings), as mentioned in the comments.
Your code should work fine with this change.
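If the column contains entries that cannot be parsed as numbers, astype(float) raises a ValueError. In that case a more tolerant conversion may help. The following is only a sketch under assumptions: the in-memory sample stands in for S_YN00.csv, and the "-" entry in dS is a hypothetical example of what might be making the column 'object' (the real cause in your file may be different). It checks the dtypes, converts the three columns with pd.to_numeric, and then filters on all of them at once:

```python
import io
import pandas as pd

# Hypothetical sample standing in for S_YN00.csv. The "-" in the dS column
# is an assumed non-numeric entry, used here to reproduce the 'object' dtype.
raw = (
    "QJY77946\tNP_073551\t0.0241\t0.1402\t0.1479\t229E-CoV\t229E-CoV\tIntra\n"
    "QJY77954\tNP_073551\t0.0119\t-\t0.0870\t229E-CoV\t229E-CoV\tIntra\n"
    "QJY77954\tQJY77946\t0.0119\t7.5\t0.0566\t229E-CoV\t229E-CoV\tIntra\n"
)
names = ["ID1", "ID2", "dN", "dS", "t", "Label_ID1", "Label_ID2", "Group"]
df_1_S = pd.read_csv(io.StringIO(raw), sep="\t", names=names)

cols = ["dN", "dS", "t"]

# Inspect the dtypes first: any column listed as 'object' holds strings.
print(df_1_S[cols].dtypes)

# Convert to numeric; values that cannot be parsed become NaN
# instead of raising an error.
df_1_S[cols] = df_1_S[cols].apply(pd.to_numeric, errors="coerce")

# Rows where ALL three columns are <= 6 (equivalent to chaining with &).
filtered_all = df_1_S.loc[(df_1_S[cols] <= 6).all(axis=1)]

# Rows where AT LEAST ONE of the three columns is <= 6 (equivalent to |).
filtered_any = df_1_S.loc[(df_1_S[cols] <= 6).any(axis=1)]

print(filtered_all)
print(filtered_any)
```

Note that comparisons with NaN evaluate to False, so a row with an unparseable value is dropped by the .all(axis=1) version. Use .all(axis=1) if every column must satisfy the condition and .any(axis=1) if one is enough.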
Upvotes: 1