CaptClyt10

Reputation: 13

Python 3.9: For loop is not producing output files even though no errors are displayed

Hi everyone, I am fairly new to using Python for data analysis, so apologies for silly questions:

IDE : PyCharm

What I have : A massive .xyz file (4 columns) that combines several datasets. Each dataset is identified by the third column, which runs from 10,000 to -10,000 in steps of 100 (passing through 0) and then repeats, so every 201 rows form one dataset.

What I want to do : Split the massive file into its individual datasets (201 rows each) and save each one under a different file name.

What I have done so far :

# Import packages

import os
import pandas as pd
import numpy as np #For next steps
import math #For next steps

#Check and Change directory

path = 'C:/Clayton/lines/profiles_aufmod'
os.chdir(path)
print(os.getcwd()) #Correct path is printed

# split the xyz file into different files for each profile

main_xyz = 'bathy_SPO_1984_50x50_profile.xyz'

number_lines = sum(1 for row in (open(main_xyz)))
print(number_lines) # 10854 is the output
rowsize = 201

for i in range(number_lines, rowsize):
    profile_raw_df = pd.read_csv(main_xyz, delimiter=',', header=None, nrows=rowsize,
                                 skiprows=i)
    out_xyz = 'Profile' + str(i) + '.xyz'
    profile_raw_df.to_csv(out_xyz, index=False,
                          header=False, mode='a')

Problems I am facing :

What I tried to fix the issue :

Upvotes: 1

Views: 216

Answers (1)

Dirk Roorda

Reputation: 111

While counting the number of rows in

number_lines = sum(1 for row in (open(main_xyz)))

you exhaust the iterator that loops over the lines of the file, but you never close the file. This should not prevent pandas from reading the same file, though.

A better idiom would be

with open(main_xyz) as fh:
    number_lines = sum(1 for row in fh)

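Your for loop as it stands does not do what you probably want. range(number_lines, rowsize) starts at 10854 and stops at 201, so it is empty: the loop body never runs, which is why no files are written and no error is raised. I guess you want: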

for i in range(0, number_lines, rowsize):

so that rowsize is the step size rather than the end value of the range.
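To see the difference between the two range calls without touching the data file, here is a small check. It only assumes the line count of 10854 reported in the question and the row size of 201:

```python
number_lines = 10854  # line count printed in the question
rowsize = 201

# range(start, stop): start is already past stop, so nothing is produced
broken_starts = list(range(number_lines, rowsize))

# range(start, stop, step): one start offset per 201-row dataset
fixed_starts = list(range(0, number_lines, rowsize))

print(len(broken_starts))  # 0
print(fixed_starts[:3])    # [0, 201, 402]
print(len(fixed_starts))   # 54
```

Since 54 * 201 = 10854, the corrected range gives exactly one start offset per dataset.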

If you want to number the output files by dataset, keep a count of the datasets, like this:


data_set = 0
for i in range(0, number_lines, rowsize):
    data_set += 1
    ...
    out_xyz = f"Profile{data_set}.xyz"
    ...
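Putting the pieces together, here is one possible sketch of the whole split. Note that it swaps pandas for plain line-based splitting with itertools.islice, so the file is read only once and the rows are copied unchanged. The function name split_profiles and the out_dir parameter are made up for illustration, and it assumes the file has no header line and that each dataset occupies exactly rowsize consecutive lines:

```python
from itertools import islice
from pathlib import Path


def split_profiles(main_xyz, rowsize=201, out_dir="."):
    """Write each consecutive block of `rowsize` lines to its own file.

    Hypothetical helper: returns the list of paths it wrote.
    """
    out_dir = Path(out_dir)
    written = []
    data_set = 0
    with open(main_xyz) as fh:
        while True:
            # Take the next `rowsize` lines; an empty list means end of file
            block = list(islice(fh, rowsize))
            if not block:
                break
            data_set += 1
            out_xyz = out_dir / f"Profile{data_set}.xyz"
            # Mode "w" so that re-running does not append duplicate rows
            with open(out_xyz, "w") as out:
                out.writelines(block)
            written.append(out_xyz)
    return written
```

With the file from the question, calling split_profiles('bathy_SPO_1984_50x50_profile.xyz') should write Profile1.xyz through Profile54.xyz into the current directory. If the last block is shorter than rowsize, it is still written as its own file.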
    

Upvotes: 1
