Python 3.9: For loop is not producing output files even though no errors are displayed

Question

Hi everyone, I am fairly new to using Python for data analysis, so apologies for any silly questions:

IDE: PyCharm

What I have: A massive .xyz file with 4 columns that combines several datasets. Each dataset can be identified by its third column, which runs from 10,000 down to -10,000 in steps of 100 (passing through 0) and then repeats, so every 201 rows form one dataset.
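The 201 follows from the spacing: 10,000 down to -10,000 in steps of 100 gives (10,000 - (-10,000)) / 100 + 1 = 201 values. Because every dataset restarts at 10,000 in the third column, the boundaries can also be read from the data itself. A minimal sketch, using a small synthetic DataFrame as a hypothetical stand-in for the real file and assuming each dataset starts exactly where the third column is back at 10,000:

```python
import pandas as pd

# Values of the third column within one dataset: 10000, 9900, ..., 0, ..., -10000
z_values = list(range(10000, -10001, -100))
print(len(z_values))  # 201

# Hypothetical stand-in for the real file: 3 datasets stacked on top of each other,
# with 4 unnamed columns (0..3), as read_csv produces with header=None
n_sets = 3
df = pd.DataFrame({
    0: range(n_sets * len(z_values)),
    1: 0.0,
    2: z_values * n_sets,
    3: 1.0,
})

# A new dataset starts every time the third column (label 2) is back at 10000
df["dataset"] = (df[2] == 10000).cumsum()
print(df.groupby("dataset").size().tolist())  # [201, 201, 201]
```

Labelling by the column value rather than by a fixed row count keeps working even if one dataset turns out to be shorter than 201 rows.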

What I want to do: Split the massive file into its individual datasets (201 rows each) and save each one under a different file name.
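One way to do this without re-reading the file for every dataset is to let pandas deliver it in fixed-size pieces through the chunksize argument of pd.read_csv, which returns an iterator of DataFrames. A minimal sketch, assuming a comma-separated file with no header row (matching the read_csv call in the code below) and exactly 201 rows per dataset; the function name split_xyz and the demo file are hypothetical:

```python
import os
import tempfile

import pandas as pd


def split_xyz(src, rowsize=201, out_dir=".", prefix="Profile"):
    """Write every block of `rowsize` rows of `src` to its own file; return the paths."""
    written = []
    # chunksize makes read_csv yield DataFrames of (at most) `rowsize` rows each
    with pd.read_csv(src, delimiter=",", header=None, chunksize=rowsize) as reader:
        for n, chunk in enumerate(reader, start=1):
            out_path = os.path.join(out_dir, f"{prefix}{n}.xyz")
            # mode="w" replaces old output instead of appending to it on re-runs
            chunk.to_csv(out_path, index=False, header=False, mode="w")
            written.append(out_path)
    return written


# Usage with a small synthetic file (402 rows = 2 datasets) in a temporary folder
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "combined.xyz")
with open(src, "w") as fh:
    for k in range(402):
        fh.write(f"{k},0.0,{10000 - 100 * (k % 201)},1.0\n")

paths = split_xyz(src, rowsize=201, out_dir=tmp_dir)
print([os.path.basename(p) for p in paths])  # ['Profile1.xyz', 'Profile2.xyz']
```

For the real data this would be called as split_xyz('bathy_SPO_1984_50x50_profile.xyz') from the working directory.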

What I have done so far:

# Import packages

import os
import pandas as pd
import numpy as np #For next steps
import math #For next steps

#Check and Change directory

path = 'C:/Clayton/lines/profiles_aufmod'
os.chdir(path)
print(os.getcwd()) #Correct path is printed

# split the xyz file into different files for each profile

main_xyz = 'bathy_SPO_1984_50x50_profile.xyz'

number_lines = sum(1 for row in (open(main_xyz)))
print(number_lines) # 10854 is the output
rowsize = 201

for i in range(number_lines, rowsize):
    profile_raw_df = pd.read_csv(main_xyz, delimiter=',', header=None, nrows=rowsize,
                                 skiprows=i)
    out_xyz = 'Profile' + str(i) + '.xyz'
    profile_raw_df.to_csv(out_xyz, index=False,
                          header=False, mode='a')

Problems I am facing:

At first the for loop produced output files (see the screenshot "Proof of output"), but now it produces no output at all and does not overwrite the previous files either. The other mystery is that no error is shown either (see the screenshot "Code executed without error").

What I tried to fix the issue:

I updated all the packages and restarted PyCharm.
I ran each line of code one by one, and everything works until the for loop.

Dirk Roorda · Accepted Answer

While counting the number of rows in

number_lines = sum(1 for row in (open(main_xyz)))

you have exhausted the iterator that loops over the lines of the file, and you never close the file. However, this should not prevent pandas from reading the same file.

A better idiom would be:

with open(main_xyz) as fh:
    number_lines = sum(1 for row in fh)
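To see what "exhausting the iterator" means, and what the with block adds, here is a small sketch on a throwaway three-line file (the file name demo.xyz is hypothetical):

```python
import os
import tempfile

# Create a small comma-separated file to experiment with
path = os.path.join(tempfile.mkdtemp(), "demo.xyz")
with open(path, "w") as fh:
    fh.write("1,2,3,4\n5,6,7,8\n9,10,11,12\n")

with open(path) as fh:
    first_pass = sum(1 for row in fh)   # reads every line
    second_pass = sum(1 for row in fh)  # iterator is already at the end of the file

print(first_pass, second_pass)  # 3 0
print(fh.closed)                # True: leaving the with block closed the file
```

A second pass over the same file object yields nothing, but a fresh open() (which is what pd.read_csv does with a file name) starts again from the top.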

Your for loop as it stands does not do what you probably want. I guess you want:

for i in range(0, number_lines, rowsize):

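so rowsize is the step size instead of the end value of the for loop. As written, range(number_lines, rowsize) is range(10854, 201): its start is already past its end, so the range is empty and the loop body never runs. That is why you get neither output files nor an error.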
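Checking both calls with the numbers from the question (10854 lines, 201 rows per dataset) makes the difference concrete:

```python
number_lines = 10854
rowsize = 201

# Original call: start=10854, stop=201, step=1 -> no values, so the loop body never runs
original = list(range(number_lines, rowsize))
print(original)  # []

# Corrected call: start=0, stop=10854, step=201 -> the first row of each dataset
starts = list(range(0, number_lines, rowsize))
print(len(starts), starts[:3], starts[-1])  # 54 [0, 201, 402] 10653
```

Since 54 x 201 = 10854, the file splits into exactly 54 datasets with no leftover rows.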

If you want to number the output files by dataset, keep a count of the datasets, like this:


data_set = 0
for i in range(0, number_lines, rowsize):
    data_set += 1
    ...
    out_xyz = f"Profile{data_set}.xyz"
    ...
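Combining both fixes with the rest of the loop from the question gives roughly the following. This is a sketch, not the answerer's exact code: it keeps read_csv with skiprows and nrows as in the question, wraps everything in a hypothetical function split_profiles, and writes with mode="w" instead of mode="a" so that re-running the script replaces earlier output rather than appending duplicate rows to it.

```python
import os
import tempfile

import pandas as pd


def split_profiles(main_xyz, rowsize=201, out_dir="."):
    """Split `main_xyz` into files of `rowsize` rows named Profile1.xyz, Profile2.xyz, ..."""
    # Count lines with a context manager so the file is closed afterwards
    with open(main_xyz) as fh:
        number_lines = sum(1 for row in fh)

    data_set = 0
    for i in range(0, number_lines, rowsize):  # rowsize is the step, not the stop
        data_set += 1
        # Skip the i rows of earlier datasets, then read the next block of rows
        profile_raw_df = pd.read_csv(main_xyz, delimiter=",", header=None,
                                     nrows=rowsize, skiprows=i)
        out_xyz = os.path.join(out_dir, f"Profile{data_set}.xyz")
        profile_raw_df.to_csv(out_xyz, index=False, header=False, mode="w")
    return data_set


# Usage with a small synthetic file of 3 datasets in a temporary folder
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "combined.xyz")
with open(src, "w") as fh:
    for k in range(3 * 201):
        fh.write(f"{k},0.0,{10000 - 100 * (k % 201)},1.0\n")

n_files = split_profiles(src, out_dir=tmp_dir)
print(n_files)  # 3
```

Each iteration re-opens the file and skips i rows, which is simple but re-reads the start of the file 54 times for the full dataset; for a file of about 10,000 lines that cost is negligible.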


