Why is for loop and if statement not working correctly for two lists in python?

Question

I have some data to read and I don't know why my code is not working. I want to read multiple netcdf files and store them in a ds. Here is my simplified input (I have ~400 entries in my lists names and file_paths, not 4):

import xarray as xr

names = ['a','b','c','d']
file_paths = ['file_path_a','file_path_b','file_path_c','file_path_d']
number_of_measurements = []

for a, b in zip(names, file_paths):
    a = xr.open_dataset(b, group='data')
    number_of_measurements.append(len(a.datetime))

I want to remove the entries in file_paths and in number_of_measurements when len(a.datetime) == 0 (this means that no measurements were done). I need number_of_measurements to create a counter for another function.

I tried this:

for i, j in zip(number_of_measurements, file_paths):
    if i == 0:
       file_paths.remove(j)
       number_of_measurements.remove(i)

This seems to work, but not always: before this for loop I had number_of_measurements.count(0) = 33, and after the for loop I had number_of_measurements.count(0) = 4. When I ran the for loop twice I got the wanted number_of_measurements.count(0) = 0. Why is it not working correctly the first time?

I also tried this within the first for loop, but it is not working either and I don't know why:

for a, b in zip(names, file_paths):
    a = xr.open_dataset(b, group='data')
    if len(a.datetime) == 0:
        file_paths.remove(b)
    else:
        number_of_measurements.append(len(a.datetime))

Jacobi · Accepted Answer

It is generally not advisable to modify a sequence while iterating over it. For example:

li = [1, 2, 3]
for x in li:
    li.remove(x)

One might expect li to be an empty list now. However:

>>> li
[2]

This is due to how the list iterator works: it keeps track of a position (index) in the list, not of the elements themselves. In the first iteration of the for loop, x is assigned the first element of the list, 1. Then 1 is removed from the list, so li now holds [2, 3]. When the for loop moves on to the second iteration, x is assigned the second element of the list, which is now 3, and 3 is removed. Hence 2 is skipped and never removed, and the loop ends because the list no longer has a third element.
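To make the skipping visible, here is a minimal sketch that records which elements the loop actually visits, and then repeats the same pattern on two paired lists like the ones in the question. The helper name trace_removal and the sample values in counts and paths are made up for illustration.

```python
def trace_removal(items):
    """Remove each visited element in place and record what the loop actually saw."""
    visited = []
    for x in items:          # the iterator advances by index, not by element
        visited.append(x)
        items.remove(x)      # shifts every later element one slot to the left
    return visited

li = [1, 2, 3]
visited = trace_removal(li)
print(visited)  # [1, 3]
print(li)       # [2]

# Same pattern with two paired lists (made-up sample values)
counts = [0, 0, 5, 0, 0]
paths = ['a', 'b', 'c', 'd', 'e']
for i, j in zip(counts, paths):
    if i == 0:
        paths.remove(j)
        counts.remove(i)     # removes the first 0 in the list, not necessarily the current one
print(counts)  # [5, 0, 0]
print(paths)   # ['b', 'c', 'e']
```

In the second part, zeros are left over after one pass, which matches what the question describes. Note also that the two lists no longer line up: 'c' originally belonged to the count 5, but now shares an index with a 0.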

For your purpose, you can build new lists instead of modifying the original one while looping over it:

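import xarray as xr

number_of_measurements = []
file_paths_new = []
for b in file_paths:
    # number of time steps stored in this file
    x = len(xr.open_dataset(b, group='data').datetime)
    if x != 0:
        # keep the count and its file path together
        number_of_measurements.append(x)
        file_paths_new.append(b)
# replace the original list only after the loop has finished
file_paths = file_paths_new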
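If number_of_measurements has already been computed, the same idea can be applied to the two existing lists without opening the files again. The following is a sketch under that assumption; the helper name drop_empty and the sample counts are hypothetical.

```python
def drop_empty(counts, paths):
    """Return new (counts, paths) lists keeping only pairs whose count is non-zero."""
    # Pair each count with its path, then keep the pairs with measurements.
    kept = [(n, p) for n, p in zip(counts, paths) if n != 0]
    new_counts = [n for n, _ in kept]
    new_paths = [p for _, p in kept]
    return new_counts, new_paths

# Made-up counts for the four example paths from the question
number_of_measurements = [3, 0, 7, 0]
file_paths = ['file_path_a', 'file_path_b', 'file_path_c', 'file_path_d']

number_of_measurements, file_paths = drop_empty(number_of_measurements, file_paths)
print(number_of_measurements)  # [3, 7]
print(file_paths)              # ['file_path_a', 'file_path_c']
```

Because the original lists are only read, never modified, during the comprehension, no element is skipped and each count stays paired with its own path.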
