Reputation: 43
Could someone please review this code for me? I am confused by the paths in it. The code splits a CSV file into several files based on a number of rows. I found it on GitHub and have been trying to use it to split a CSV file, but I find it too confusing.
The code is also available at the link.
Assume that the CSV to be split is named Dominant.csv, source_filepath is C:\\Users\James\\Desktop\\Work, dest_path is C:\\Users\James\\Desktop\\Work\\Processed, and result_filename_prefix is split.
My confusion is: does target_filename in the code mean my CSV file Dominant.csv? And what exactly is target_filepath?
Could someone please adapt the code to the paths and file names given above? I would be really thankful.
import csv
import os
import sys

if len(sys.argv) != 5:
    raise Exception('Wrong number of arguments!')

SOURCE_FILEPATH = sys.argv[1]
DEST_PATH = sys.argv[2]
FILENAME_PREFIX = sys.argv[3]
ROW_LIMIT = int(sys.argv[4])


def split_csv(source_filepath, dest_path, result_filename_prefix, row_limit):
    """
    Split a source CSV into multiple CSVs of equal numbers of records,
    except the last file.

    The initial file's header row will be included as a header row in each split
    file.

    Split files follow a zero-index sequential naming convention like so:

        `{result_filename_prefix}_0.csv`

    :param source_filepath {str}:
        File name (including full path) for the file to be split.
    :param dest_path {str}:
        Full path to the directory where the split files should be saved.
    :param result_filename_prefix {str}:
        File name to be used for the generated files.
        Example: If `my_split_file` is provided as the prefix, then a resulting
                 file might be named: `my_split_file_0.csv`
    :param row_limit {int}:
        Number of rows per file (header row is excluded from the row count).
    :return {NoneType}:
    """
    if row_limit <= 0:
        raise Exception('row_limit must be > 0')

    # newline='' lets the csv module handle line endings itself
    # (avoids blank rows between records on Windows)
    with open(source_filepath, 'r', newline='') as source:
        reader = csv.reader(source)
        headers = next(reader)

        file_number = 0
        records_exist = True

        while records_exist:
            i = 0
            # Name of the output file, e.g. split_0.csv
            target_filename = f'{result_filename_prefix}_{file_number}.csv'
            # Destination folder plus the output file name
            target_filepath = os.path.join(dest_path, target_filename)

            with open(target_filepath, 'w', newline='') as target:
                writer = csv.writer(target)

                while i < row_limit:
                    if i == 0:
                        writer.writerow(headers)

                    try:
                        writer.writerow(next(reader))
                        i += 1
                    except StopIteration:
                        # the source file has no more rows
                        records_exist = False
                        break

            if i == 0:
                # we only wrote the header, so delete that file
                os.remove(target_filepath)

            file_number += 1


split_csv(SOURCE_FILEPATH, DEST_PATH, FILENAME_PREFIX, ROW_LIMIT)
Upvotes: 0
Views: 93
Reputation: 463
target_filename is the name each output file will have; it is built from the prefix and the file number, so it is not your input file Dominant.csv. target_filepath is the full path to that output file, including its name.
In the split_csv function call:
SOURCE_FILEPATH is the path to the source file, including the file name.
DEST_PATH is the path to the folder you want the output files in.
FILENAME_PREFIX is what you want the output file name(s) to start with.
ROW_LIMIT is the maximum number of rows written to each output file (the header row is not counted).
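To make the naming concrete, here is a minimal sketch of the paths the script would build from the values in the question. It assumes that Dominant.csv sits inside the Work folder (the question does not say so explicitly), and target_paths is a hypothetical helper written only for this illustration. It uses ntpath, the Windows flavour of os.path, so the result is the same on any operating system.

```python
import ntpath  # Windows version of os.path; behaves the same on any OS

# Values from the question, written as raw strings
work_folder = r"C:\Users\James\Desktop\Work"
dest_path = r"C:\Users\James\Desktop\Work\Processed"
result_filename_prefix = "split"

# Assumption: Dominant.csv is stored in the Work folder
source_filepath = ntpath.join(work_folder, "Dominant.csv")


def target_paths(dest_path, prefix, count):
    """Hypothetical helper: the first `count` output paths, built the same
    way the script builds target_filename and target_filepath."""
    paths = []
    for file_number in range(count):
        target_filename = f"{prefix}_{file_number}.csv"
        target_filepath = ntpath.join(dest_path, target_filename)
        paths.append(target_filepath)
    return paths


print(source_filepath)
for path in target_paths(dest_path, result_filename_prefix, 3):
    print(path)
```

With these values the script does not need to be changed at all; it only needs the right arguments. Assuming it is saved as split_csv.py (a made-up name) and you want 1000 rows per file (also just an example), the call would look like: python split_csv.py "C:\Users\James\Desktop\Work\Dominant.csv" "C:\Users\James\Desktop\Work\Processed" split 1000. The Processed folder has to exist beforehand, because open() in write mode does not create missing folders.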
Upvotes: 1