ocean800

Reputation: 3727

How to remove a line from a csv if it contains a certain word?

I have a CSV file that looks something like this:

    2014-6-06 08:03:19, 439105, 1053224, Front Entrance
    2014-6-06 09:43:21, 439105, 1696241, Main Exit
    2014-6-06 10:01:54, 1836139, 1593258, Back Archway
    2014-6-06 11:34:26, 845646, external, Exit 
    2014-6-06 04:45:13, 1464748, 439105, Side Exit

I was wondering how to delete a line if it includes the word "external"?

I saw another post on SO that addressed a very similar issue, but I don't understand completely...

I tried to use something like this (as explained in the linked post):

TXT_file = 'whatYouWantRemoved.txt'
CSV_file = 'comm-data-Fri.csv'
OUT_file = 'OUTPUT.csv'

## From the TXT, create a list of domains you do not want to include in output
with open(TXT_file, 'r') as txt:
    domain_to_be_removed_list = []

    ## for each domain in the TXT
    ## remove the return character at the end of line
    ## and add the domain to list domains-to-be-removed list
    for domain in txt:
        domain = domain.rstrip()
        domain_to_be_removed_list.append(domain)


with open(OUT_file, 'w') as outfile:
    with open(CSV_file, 'r') as csv:

        ## for each line in csv
        ## extract the csv domain
        for line in csv:
            csv_domain = line.split(',')[0]

            ## if csv domain is not in domains-to-be-removed list,
            ## then write that to outfile
            if (csv_domain not in domain_to_be_removed_list):
                outfile.write(line)

The text file just held the one word "external" but it didn't work.... and I don't understand why.

What happens is that the program runs and the output file (OUTPUT.csv) is generated, but nothing changes: none of the lines with "external" are taken out.

I'm using Windows and Python 3.4, if it makes a difference.

Sorry if this seems like a really simple question, but I'm new to python and any help in this area would be greatly appreciated, thanks!!

Upvotes: 2

Views: 3220

Answers (3)

David

Reputation: 6571

Redirect the output to a new file. This prints every line except those that contain "external":

import sys
import re

f = open('sum.csv', "r")
lines = f.readlines()

p = re.compile('external')

for line in lines:
    # skip any line that contains "external"
    if p.search(line):
        continue
    else:
        sys.stdout.write(line)

Upvotes: 3

rmalchow

Reputation: 2769

if you can use something other than Python, grep would work like this:

grep "some regex" file.csv > newfile.csv

would give you ONLY the lines that match the regex, while:

grep -v "some regex" file.csv > newfile.csv

gives everything BUT the lines matching the regex.

Upvotes: 2

Matt Dodge

Reputation: 11142

It looks like you are grabbing the first element after you split the line. That is going to give you the date, according to your example CSV file.

What you probably want instead (again, assuming the example is the way it will always work) is to grab the 3rd element and strip the whitespace around it, since your fields have a space after each comma. So something like this:

csv_domain = line.split(',')[2].strip()

But, like one of the comments said, this isn't necessarily foolproof. You are assuming none of the individual cells will contain commas. Based on your example that might be a safe assumption, but in general when working with CSV files I recommend using the Python csv module.
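
For reference, here is a rough sketch of what the csv module version could look like. The function name `drop_rows_containing` is just something made up for illustration, and it assumes you want to drop a row when any field equals the word exactly (ignoring surrounding whitespace), not when the word appears as part of a longer field.

```python
import csv


def drop_rows_containing(in_path, out_path, word):
    """Copy in_path to out_path, skipping rows where any field equals word.

    Hypothetical helper for illustration; the name and the exact-match
    rule are assumptions, not part of the original question.
    """
    # newline='' lets the csv module handle line endings itself
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        # skipinitialspace=True drops the space that follows each comma
        reader = csv.reader(src, skipinitialspace=True)
        writer = csv.writer(dst)
        for row in reader:
            # compare against stripped fields so "external " also matches
            if word in (field.strip() for field in row):
                continue
            writer.writerow(row)
```

With the file names from the question you would call it as `drop_rows_containing('comm-data-Fri.csv', 'OUTPUT.csv', 'external')`. Note that the output is written with plain commas (no space after each one), since the leading spaces are dropped on reading.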

Upvotes: 2
