Separate csv files based on column values

Question

I have a few CSV files spread across different folders and subfolders. I need to split each CSV file into incoming and outgoing traffic.

If Source == ac:37:43:9b:92:24 and Receiver address == 8c:15:c7:3a:d0:1a, those rows need to be written to .out.csv files.

If Transmitter address == 8c:15:c7:3a:d0:1a and Destination == ac:37:43:9b:92:24, those rows need to be written to .in.csv files.

The output files (the incoming and outgoing parts) should keep the name of the input file (e.g. if the input file is aaa.csv, the output files will be aaa.in.csv and aaa.out.csv).

The output files also need to be written into the same folders and subfolders as the input files. I tried the code below, but it is not working. I am new to programming, so I am not sure whether this code is correct. Any help is greatly appreciated. Thanks

import csv
import os
import subprocess

startdir = '.'   
outdir = '.'
suffix = '.csv'

def decode_to_file(cmd, in_file, new_suffix):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fileName = outdir + '/' + in_file[len(startdir):-len(suffix)] + new_suffix
    os.makedirs(os.path.dirname(fileName), exist_ok=True)
    csv_writer = csv.writer(open(fileName, 'w'))
    for line_bytes in proc.stdout:
        line_str = line_bytes.decode('utf-8')
        csv_writer.writerow(line_str.strip().split(','))

for root, dirs, files in os.walk(startdir):
    for name in files:
        if not name.endswith(suffix):
            continue
        in_file = os.path.join(root, name)

        decode_to_file(
            cmd= [if source== ac:37:43:9b:92:24 && Receiver address== 8c:15:c7:3a:d0:1a],
            in_file=in_file,
            new_suffix='.out.csv'
        )
        decode_to_file(
            cmd= [if Transmitter address == 8c:15:c7:3a:d0:1a && Destination== ac:37:43:9b:92:24],
            in_file=in_file,
            new_suffix='.in.csv'
        )

Martin Evans · Accepted Answer

You could make use of Python's CSV library to process the rows and glob.glob could be used to walk over the files. os.path.splitext() can be used to help with changing the file extension. For example:

import csv
import glob
import os

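for filename in glob.glob('**/*.csv', recursive=True):
    # Skip files written by an earlier run, otherwise aaa.in.csv
    # would itself be split into aaa.in.in.csv and aaa.in.out.csv.
    if filename.endswith(('.in.csv', '.out.csv')):
        continue

    basename, extension = os.path.splitext(filename)
    print(f"Processing - {filename}")

    with open(filename, newline='', encoding='utf-8') as f_input, \
         open(basename + '.in.csv', 'w', newline='', encoding='utf-8') as f_in, \
         open(basename + '.out.csv', 'w', newline='', encoding='utf-8') as f_out:

        csv_input = csv.reader(f_input)
        csv_in = csv.writer(f_in)
        csv_out = csv.writer(f_out)

        for row in csv_input:
            # Assumed column positions: 3 = source, 4 = receiver address,
            # 5 = transmitter address, 6 = destination.
            if len(row) < 7:
                continue    # too short to hold all four address columns
            if row[3] == 'ac:37:43:9b:92:24' and row[4] == '8c:15:c7:3a:d0:1a':
                csv_out.writerow(row)
            if row[5] == '8c:15:c7:3a:d0:1a' and row[6] == 'ac:37:43:9b:92:24':
                csv_in.writerow(row)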

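This assumes that your CSV files are in a standard comma-separated format, e.g. aaa,bbb,ccc,ddd. csv.reader() reads each line of the file and converts it into a list of values, split on the commas, so the first value in each row is row[0]. The indices row[3] to row[6] must match the positions of the source, receiver address, transmitter address and destination columns in your files, and because glob.glob() matches files relative to the current directory, the script should be run from the top-level folder. Each output file is written next to its input file.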
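If the files start with a header row, the columns could be looked up by name instead of by position. The following is only a sketch under that assumption: the header names Source, Receiver address, Transmitter address and Destination are guessed from the wording of the question and may be spelled differently in the real files, and split_by_direction() and split_tree() are hypothetical helper names, not part of any library.

```python
import csv
from pathlib import Path

# Addresses taken from the question.
ADDR_A = 'ac:37:43:9b:92:24'
ADDR_B = '8c:15:c7:3a:d0:1a'

# Assumed header names; adjust them to match the real files.
SOURCE = 'Source'
RECEIVER = 'Receiver address'
TRANSMITTER = 'Transmitter address'
DESTINATION = 'Destination'


def split_by_direction(csv_path):
    """Write <name>.in.csv and <name>.out.csv next to <name>.csv."""
    csv_path = Path(csv_path)
    in_path = csv_path.with_name(csv_path.stem + '.in.csv')
    out_path = csv_path.with_name(csv_path.stem + '.out.csv')

    with open(csv_path, newline='', encoding='utf-8') as f_input, \
         open(in_path, 'w', newline='', encoding='utf-8') as f_in, \
         open(out_path, 'w', newline='', encoding='utf-8') as f_out:

        reader = csv.reader(f_input)
        writer_in = csv.writer(f_in)
        writer_out = csv.writer(f_out)

        header = next(reader, None)
        if header is None:
            return  # empty input file, leave both outputs empty

        # header.index() raises ValueError if a column name is missing.
        src, rcv, tx, dst = (header.index(name) for name in
                             (SOURCE, RECEIVER, TRANSMITTER, DESTINATION))
        last_needed = max(src, rcv, tx, dst)

        # Copy the header so both output files stay self-describing.
        writer_in.writerow(header)
        writer_out.writerow(header)

        for row in reader:
            if len(row) <= last_needed:
                continue  # skip rows too short to hold every column
            if row[src] == ADDR_A and row[rcv] == ADDR_B:
                writer_out.writerow(row)
            if row[tx] == ADDR_B and row[dst] == ADDR_A:
                writer_in.writerow(row)


def split_tree(startdir='.'):
    """Split every .csv file below startdir, skipping earlier outputs."""
    for path in sorted(Path(startdir).rglob('*.csv')):
        if path.name.endswith(('.in.csv', '.out.csv')):
            continue
        split_by_direction(path)
```

Calling split_tree('.') from the top-level folder would then process every subfolder. One trade-off of this sketch is that an input file whose own name already ends in .in.csv or .out.csv is skipped.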
