Reputation: 15
I have a CSV file with, let's say, 16000 rows. I need to split it up in two separate files, but also need an overlap in the files of about 360 rows, so row 1-8360 in one file and row 8000-16000 in the other. Or 1-8000 and 7640-16000.
CSV file look like this:
Value X Y Z
4.5234 -46.29753186 -440.4915915 -6291.285393
4.5261 -30.89639381 -441.8390165 -6291.285393
4.5289 -15.45761327 -442.6481287 -6291.285393
4.5318 0 -442.9179423 -6291.285393
I have used this code in Python 3 to split the file, but I'm unable to get the overlap I want:
with open('myfile.csv', 'r') as f:
csvfile = f.readlines()
linesPerFile = 8000
filename = 1
for i in range(0,len(csvfile),linesPerFile+):
with open(str(filename) + '.csv', 'w+') as f:
if filename > 1: # this is the second or later file, we need to write the
f.write(csvfile[0]) # header again if 2nd.... file
f.writelines(csvfile[i:i+linesPerFile])
filename += 1
And tried to modify it like this:
for i in range(0,len(csvfile),linesPerFile+360):
and
f.writelines(csvfile[360-i:i+linesPerFile])
but I haven't been able to make it work.
Upvotes: 0
Views: 323
Reputation: 413
It's very easy with Pandas CSV and iloc.
import pandas as pd
# df = pd.read_csv('source_file.csv')
df = pd.DataFrame(data=pd.np.random.randn(16000, 5))
df.iloc[:8360].to_csv('file_1.csv')
df.iloc[8000:].to_csv('file_2.csv')
Upvotes: 1
Reputation: 2200
Hope you have got a more elegant answer using Pandas. You could consider below if don't like to install modules.
def write_files(input_file, file1, file2, file1_end_line_no, file2_end_line_no):
# Open all 3 file handles
with open(input_file) as csv_in, open(file1, 'w') as ff, open(file2, 'w') as sf:
# Process headers
header = next(csv_in)
header = ','.join(header.split())
ff.write(header + '\n')
sf.write(header + '\n')
for index, line in enumerate(csv_in):
line_content = ','.join(line.split()) # 4.5234 -46.29753186 -440.4915915 -6291.285393 => 4.5234,-46.29753186,-440.4915915,-6291.285393
if index <= file1_end_line_no: # Check if index is less than or equals first file's max index
ff.write(line_content + '\n')
if index >= file2_end_line_no: # Check if index is greater than or equals second file's max index
sf.write(line_content + '\n')
Sample Run:
if __name__ == '__main__':
in_file = 'csvfile.csv'
write_files(
in_file,
'1.txt',
'2.txt',
2,
2
)
Upvotes: 1
Reputation: 714
What about this?
for i in range(0,len(csvfile),linesPerFile+):
init = i
with open(str(filename) + '.csv', 'w+') as f:
if filename > 1: # this is the second or later file, we need to write the
f.write(csvfile[0]) # header again if 2nd.... file
init = i - 360
f.writelines(csvfile[init:i+linesPerFile+1])
filename += 1
Is this what you are looking for? Please upload a test file if it doesn't so we can provide a better answer :-)
Upvotes: 0