Justin
Justin

Reputation: 87

Removing unwanted row in a CSV file

I'm new in Python and I have a problem of removing unwanted rows in a csv file. For instance I have 3 columns and a lot of rows:

A      B     C
hi     1.0   5
hello  2.0   6
ok     3.0   7

I loaded the data using numpy (instead of csv)

import numpy as np
a= np.loadtxt('data.csv' , delimiter= ',' , skiprows= 1)

I want to introduce a range for the 2nd column

b=np.arange(0, 2.1,0.1) 

I don't have any idea how I should use that piece of code.

What I want as a final output is the following:

A      B     C
hi     1.0   5
hello  2.0   6

The last row would be remove since I chose a range for the 2nd column up to 2.0 only. I don't have any idea how can I accomplish this.

Upvotes: 1

Views: 258

Answers (3)

mhawke
mhawke

Reputation: 87074

You can just use the csv module. N.B the following expects that the CSV fields are comma separated, not tab separated (as your sample suggests).

import csv

with open('data.csv') as data:
    reader = csv.reader(data)    # or csv.reader(data, delimiter='\t') for tabs
    field_names = next(reader)
    filtered_rows = [row for row in reader if 0 <= float(row[1]) <= 2.0]

>>> field_names
['A', 'B', 'C']
>>> filtered_rows
[['hi', '1.0', '5'], ['hello', '2.0', '6']]
>>> filtered_rows.insert(0, field_names)
>>> filtered_rows
[['A', 'B', 'C'], ['hi', '1.0', '5'], ['hello', '2.0', '6']]

If you require that values be exact tenths within the required range, then you can do this:

import csv
import numpy as np

allowed_values = np.arange(0, 2.1, 0.1)
with open('data.csv') as data:
    reader = csv.reader(data)
    field_names = next(reader)
    filtered_rows = [row for row in reader if float(row[1]) in allowed_values]

Edit after updated requirements

With extra constraints on column "C", e.g. value must be >= 6.

import csv
import numpy as np

allowed_values_B = np.arange(0, 2.1, 0.1)
def accept_row(row):
    return (float(row[1]) in allowed_values_B) and (int(row[2]) >= 6)

with open('data.csv') as data:
    reader = csv.reader(data)
    field_names = next(reader)
    filtered_rows = [row for row in reader if accept_row(row)]

>>> filtered_rows
[['hello', '2.0', '6']]

Upvotes: 1

jrjc
jrjc

Reputation: 21873

Try with Pandas:

import pandas as pd

a = pd.read_csv('data.csv', index_col=0) # column A will be the index.
a

       B  C
A          
hi     1  5
hello  2  6
ok     3  7

For every value of B up to 2 :

a[a.B <= 2]

       B  C
A          
hi     1  5
hello  2  6

Details :

a.B

A
hi       1
hello    2
ok       3
Name: B, dtype: float64

a.B <= 2

A
hi        True
hello     True
ok       False
Name: B, dtype: bool

Upvotes: 2

blue note
blue note

Reputation: 29081

You can do it using logical indexing

 index = (x[:, 1] <= 2.0)

Then

 x = x[index]

selecting only the lines that satisfy this condition

Upvotes: 1

Related Questions