user3782816

Reputation: 171

How to delete rows from .csv files

I need to remove rows from my .csv files so that I can compare the files for changes day over day, ideally using Python. I need to delete the first 3 rows as well as the line that begins with "Not Classified". I wrote an Excel macro that does exactly that, but I have close to 1000 files to modify, and the fairly simple script was taking over 1 hour to complete (mostly due to saving each file). I am looking for something more efficient, or at the bare minimum something that doesn't prevent me from using Excel while the script runs.

Here is basically my file,

Date MM/DD/YYYY,,,,,,,
Start Time XX:XX,,,,,,,
Completed YY:YY,,,,,,,
A,b,c,d,e,f
g,h,i,j,k,l
1,2,3,4,5,6
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,
Not Classified,,,,,,,
,,,,,,,,,,,

My output should simply look like

A,b,c,d,e,f
g,h,i,j,k,l
1,2,3,4,5,6
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,

Thanks in advance

Upvotes: 3

Views: 694

Answers (3)

Pedro Lobito

Reputation: 98921

You can use something like:

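import glob
from os.path import basename, dirname, join

# "**" only descends into subdirectories when recursive=True is passed
for file in glob.glob("/path/to/csvs/**/*.csv", recursive=True):
    d = dirname(file)   # directory containing the file
    fn = basename(file) # filename
    with open(file) as f, open(join(d, f"new_{fn}"), "w") as f2:
        # Keep only lines whose second character is a comma; this relies on
        # the first field of every data row being at most one character long
        f2.writelines(x for x in f if x[1:2] == ",")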

Output from your example:

A,b,c,d,e,f
g,h,i,j,k,l
1,2,3,4,5,6
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,

The above code will generate new filtered csv files - prefixed by new_ - of every csv file in /path/to/csvs/ and subdirs.
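
Because the generated files are themselves .csv files in the same directories, a second run of the loop would pick up the new_ files as inputs. A minimal sketch of one way to skip them, assuming the new_ prefix is used only for generated files; `pending_csvs` and its `prefix` parameter are hypothetical names, not part of the code above:

```python
from pathlib import Path

def pending_csvs(root, prefix="new_"):
    """Yield CSV paths under root (recursively), skipping generated files."""
    # rglob("*.csv") walks root and all of its subdirectories
    for p in sorted(Path(root).rglob("*.csv")):
        # Assumption: anything starting with the prefix is our own output
        if not p.name.startswith(prefix):
            yield p
```

The loop over glob.glob(...) could then iterate over pending_csvs("/path/to/csvs") instead.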

Upvotes: 1

modesitt

Reputation: 7210

This should not be difficult to do in Python and should be faster than your macro [and probably simpler ;)]. See the following: we remove the first 3 lines, drop every line that starts with "Not Classified", and then write the result to a new file.

FILENAME = './the.csv'

def your_operation(path):

    with open(path) as f:
        lines = f.readlines()

    # Drop the three header lines, then every "Not Classified" line
    lines = [x for x in lines[3:] if not x.startswith('Not Classified')]

    with open(f'{path.replace(".csv", "")}-modified.csv', 'w') as f:
        f.writelines(lines)

your_operation(FILENAME)

Note that this uses f-strings, which are available in Python 3.6 and up. You can replace that line with

new_path = path.replace('.csv', '') + '-modified.csv'
with open(new_path, 'w') as f:
    ...

if you are using an older version. You can extend this to run on all files in a directory, which seems to be your goal. You can also write the result back to the same file if you do not need the old contents and trust me enough. I am not sure how Excel handles writes by other applications, but writing to new files will definitely let you use Excel in the meantime.

import glob

root = "path/to/dir/**/*.csv"  # "**" matches subdirectories when recursive=True

for path in glob.glob(root, recursive=True):
    your_operation(path)
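
For the option of writing back to the same file, here is a minimal sketch that goes through a temporary file, so an interrupted run should not leave a half-written CSV behind. The function name `strip_in_place` and its parameters are hypothetical, and on Windows the final replace step may fail if another program such as Excel has the file open:

```python
import os
import tempfile

def strip_in_place(path, skip=3, marker="Not Classified"):
    """Rewrite path without its first `skip` lines and without marker lines."""
    # newline="" leaves the original line endings untouched
    with open(path, newline="") as src:
        kept = [line for i, line in enumerate(src)
                if i >= skip and not line.startswith(marker)]
    # Write to a temporary file in the same directory, then swap it in
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst:
            dst.writelines(kept)
        os.replace(tmp, path)  # replaces the original in one step
    except BaseException:
        os.remove(tmp)  # clean up the temporary file on any failure
        raise
```

Calling strip_in_place(path) in place of your_operation(path) in the loop above would modify each file directly, so keep a backup until you trust the result.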

Upvotes: 4

olivaw

Reputation: 373

For a CSV file named "file.csv", you can run these few lines of Python:

with open("file.csv", "r") as f:
    lines = [line for line in f.readlines()[3:] if not line.startswith("Not Classified")]
with open("new-file.csv", "w") as f:
    f.writelines(lines)
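
With close to 1000 files, it may be convenient to wrap the same idea in a function and stream each file line by line instead of loading it fully with readlines(). A minimal sketch, assuming the header is always exactly three lines; `clean_csv` and its parameters are hypothetical names:

```python
from itertools import islice

def clean_csv(src_path, dst_path, skip=3, marker="Not Classified"):
    """Copy src_path to dst_path, dropping the first `skip` lines and marker lines."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        # islice(src, skip, None) starts iterating at line index `skip`
        dst.writelines(line for line in islice(src, skip, None)
                       if not line.startswith(marker))
```

For example, clean_csv("file.csv", "new-file.csv") should write the same content as the snippet above.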

Upvotes: 1
