Aartika Sethi
Aartika Sethi

Reputation: 39

Selectively delete rows in a csv file (Python)

I am reading in a .csv file of space-delimited data, which contains some unwanted words. I need to find if unwanted words are present in any of columns of a given row, and delete that row.

For example, if unwanted_list = ['one', 'on'], then for an input .csv file with columns name class label test;

Input:

ne two 1 five,
on one 2 we.
as we 20 on
cast as none vote

Representative output:

ne two 1 five,
cast as none vote

Upvotes: 0

Views: 3838

Answers (2)

cgte
cgte

Reputation: 450

you may have a look at the csv module documentation, https://docs.python.org/2/library/csv.html

Here is an example code in ipython.

In [1]: import csv

In [2]: f = open('plop.csv')

In [3]: exclude = set(('on', 'one'))

In [4]: reader = csv.reader(f, delimiter=' ')

In [5]: for row in reader:
   ...:     if any(val in exclude for val in row):
   ...:         continue
   ...:     else:
   ...:         print row
   ...:         
['name', 'class', 'label', 'test']
['ne', 'two', '1', 'five,']
['cast', 'as', 'none', 'vote']

Feel free to adapt the script to your needs.

Take care that i did not provide special processinng for the header which can be handled this way. This is not how you should process for very large files since the whole file is read an put into ram.

In [9]: f=open('plop.csv')

In [10]: reader = csv.reader(f.readlines()[1:], delimiter=' ') #skip headers

In [11]: for row in reader:
    ...:     if any(val in exclude for val in row):
    ...:         continue
    ...:     else:
    ...:         print row
    ...:         
['ne', 'two', '1', 'five,']
['cast', 'as', 'none', 'vote']

Upvotes: 0

Tom Wyllie
Tom Wyllie

Reputation: 2085

A simple script using the python set object should do the trick. This checks that there are no words common to both the set of unwanted words, and the set of words in the line of the input file;

def filter_unwanted_words():
    unwanted_words = {'one', 'on'}
    with open('input.csv', 'r') as f:
        for line in f:
            if set(line.split()).isdisjoint(unwanted_words):
                yield line


def write_output():
    with open('output.csv', 'w') as f:
        f.writelines((line for line in filter_unwanted_words()))

if __name__ == '__main__':
    write_output()

The output in output.csv for this is;

ne two 1 five,
cast as none vote

Upvotes: 1

Related Questions