Select all the rows based on a column value pandas

Question

Hi I need to select all the rows based on a column value, either store it in a new variable or make a new dataframe and save it into csv with no headers just the info.

import pandas as pd
import numpy as np

print(df)
#      0      1  2   3
# 0  Gm#    one  0   0
# 1  922    one  1   2
# 2  933    two  2   4
# 3  952  three  3   6
# 4  Gm#    two  4   8
# 5  960    two  5  10
# 6  963    one  6  12
# 7  999  three  7  14

So I want a new data frame based on a condition of the first column. I only want to grab the rows in a range >= 900 & <=999. So desired output:

I want to store it in a csv with no indexes.

  print (df2)
  922    one  1   2
  933    two  2   4
  952  three  3   6
  960    two  5  10
  963    one  6  12
  999  three  7  14

I tried this: Problem I am getting I can't figure out how to convert a hole column into integers..or maybe there is a easier way to do it by just reference the hole data frame instead checked on various articles on stack overflow and YouTube videos but just can't get it right. Any ideas I will gladly appreciate it.

#df[x]= data[x][(data[x]['0'].astype(np.int64))] need to find a away to convert the column [0] into integer for it evaluate
#df2 = data[i]([(data['0'] >= 900) & (data['0'] <= 999)])

jezrael · Accepted Answer

You can convert to_numeric first column selected by position by iloc and then add condition (data['0'].notnull()), because not numeric values are converted to NaN. Last use to_csv with parameter index=False for removing index and header=None for removing header:

import pandas as pd

data = pd.DataFrame(
{'1': {0: 'one', 1: 'one', 2: 'two', 3: 'three', 4: 'two', 5: 'two', 6: 'one', 7: 'three'}, 
'0': {0: 'Gm', 1: '922', 2: '933', 3: '952', 4: 'Gm', 5: '960', 6: '963', 7: '999'}, 
'3': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14}, 
'2': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}})

print data

     0      1  2   3
0   Gm    one  0   0
1  922    one  1   2
2  933    two  2   4
3  952  three  3   6
4   Gm    two  4   8
5  960    two  5  10
6  963    one  6  12
7  999  three  7  14

data.iloc[:, 0] = pd.to_numeric(data.iloc[:, 0], errors='coerce')
print data
       0      1  2   3
0    NaN    one  0   0
1  922.0    one  1   2
2  933.0    two  2   4
3  952.0  three  3   6
4    NaN    two  4   8
5  960.0    two  5  10
6  963.0    one  6  12
7  999.0  three  7  14


df1 = data[(data['0'] >= 900) & (data['0'] <= 999) & (data['0'].notnull())]
print df1
       0      1  2   3
1  922.0    one  1   2
2  933.0    two  2   4
3  952.0  three  3   6
5  960.0    two  5  10
6  963.0    one  6  12
7  999.0  three  7  14


df1.to_csv('file', index=False, header=None)

EDIT by comment:

You can try:

for i in range(0, len(tables)): 
    df = tables[i]
    df.replace(regex=True,inplace=True,to_replace='½',value='.5') 
    df.iloc[:, 0] = pd.to_numeric(df.iloc[:, 0], errors='coerce') 
    df1 = df[(df.iloc[:, 0] >= 900) & (df['0'] <= 999) & (df['0'].notnull())]
    print (df1)

Select all the rows based on a column value pandas

Answers (1)

Related Questions