Reputation: 65
Hi, I need to select all the rows based on a column value, store them in a new variable or a new DataFrame, and save it to a CSV with no headers, just the data.
import pandas as pd
import numpy as np
print(df)
# 0 1 2 3
# 0 Gm# one 0 0
# 1 922 one 1 2
# 2 933 two 2 4
# 3 952 three 3 6
# 4 Gm# two 4 8
# 5 960 two 5 10
# 6 963 one 6 12
# 7 999 three 7 14
So I want a new data frame based on a condition on the first column: I only want to grab the rows whose value is in the range >= 900 and <= 999. I want to store the result in a CSV with no indexes. Desired output:
print (df2)
922 one 1 2
933 two 2 4
952 three 3 6
960 two 5 10
963 one 6 12
999 three 7 14
I tried this. The problem I am getting is that I can't figure out how to convert a whole column into integers, or maybe there is an easier way to do it by just referencing the whole data frame. I checked various articles on Stack Overflow and YouTube videos but just can't get it right. Any ideas would be gladly appreciated.
#df[x]= data[x][(data[x]['0'].astype(np.int64))] # need to find a way to convert column [0] into integers so the condition can be evaluated
#df2 = data[i][(data['0'] >= 900) & (data['0'] <= 999)]
Upvotes: 2
Views: 985
Reputation: 862671
You can convert the first column (selected by position with iloc) with to_numeric, and then add the condition (data['0'].notnull()), because non-numeric values are converted to NaN. Last, use to_csv with the parameter index=False for removing the index and header=None for removing the header:
import pandas as pd
data = pd.DataFrame(
{'1': {0: 'one', 1: 'one', 2: 'two', 3: 'three', 4: 'two', 5: 'two', 6: 'one', 7: 'three'},
'0': {0: 'Gm', 1: '922', 2: '933', 3: '952', 4: 'Gm', 5: '960', 6: '963', 7: '999'},
'3': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14},
'2': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}})
print(data)
0 1 2 3
0 Gm one 0 0
1 922 one 1 2
2 933 two 2 4
3 952 three 3 6
4 Gm two 4 8
5 960 two 5 10
6 963 one 6 12
7 999 three 7 14
data.iloc[:, 0] = pd.to_numeric(data.iloc[:, 0], errors='coerce')
print(data)
0 1 2 3
0 NaN one 0 0
1 922.0 one 1 2
2 933.0 two 2 4
3 952.0 three 3 6
4 NaN two 4 8
5 960.0 two 5 10
6 963.0 one 6 12
7 999.0 three 7 14
df1 = data[(data['0'] >= 900) & (data['0'] <= 999) & (data['0'].notnull())]
print(df1)
0 1 2 3
1 922.0 one 1 2
2 933.0 two 2 4
3 952.0 three 3 6
5 960.0 two 5 10
6 963.0 one 6 12
7 999.0 three 7 14
df1.to_csv('file', index=False, header=None)
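Note that after to_numeric with errors='coerce' the column holds floats (because of the NaN rows), so the CSV would contain values like 922.0. If integer output is wanted, one option (a minimal sketch, with a hypothetical file name file.csv) is to cast the filtered column back to int before writing:

```python
import pandas as pd

data = pd.DataFrame({'0': ['Gm', '922', '933', '952', 'Gm', '960', '963', '999'],
                     '1': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                     '2': [0, 1, 2, 3, 4, 5, 6, 7],
                     '3': [0, 2, 4, 6, 8, 10, 12, 14]})

# non-numeric values like 'Gm' become NaN
data['0'] = pd.to_numeric(data['0'], errors='coerce')
df1 = data[(data['0'] >= 900) & (data['0'] <= 999)].copy()

# after filtering, no NaN remains in column '0', so it can safely be cast
# back to int to avoid '922.0' in the output file
df1['0'] = df1['0'].astype(int)
df1.to_csv('file.csv', index=False, header=False)
```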
EDIT by comment:
You can try:
for i in range(len(tables)):
    df = tables[i]
    df.replace(regex=True, inplace=True, to_replace='½', value='.5')
    df.iloc[:, 0] = pd.to_numeric(df.iloc[:, 0], errors='coerce')
    # use iloc consistently for the first column; the mixed df['0'] lookup
    # fails if the columns are not labeled '0'
    df1 = df[(df.iloc[:, 0] >= 900) & (df.iloc[:, 0] <= 999) & (df.iloc[:, 0].notnull())]
    print(df1)
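As a self-contained sketch of that loop, assuming tables is a list of DataFrames (e.g. as returned by pd.read_html) with a hypothetical '½' value in the first column:

```python
import pandas as pd

# hypothetical input: a list of DataFrames, as pd.read_html would return
tables = [pd.DataFrame({0: ['Gm', '922', '933½'], 1: ['one', 'two', 'three']})]

results = []
for df in tables:
    # '933½' -> '933.5' so to_numeric can parse it
    df = df.replace(regex=True, to_replace='½', value='.5')
    df.iloc[:, 0] = pd.to_numeric(df.iloc[:, 0], errors='coerce')
    # 'Gm' became NaN, so it is dropped by the range condition
    results.append(df[(df.iloc[:, 0] >= 900) & (df.iloc[:, 0] <= 999)])

print(results[0])
```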
Upvotes: 1