glob Multiple CSV and np.arange

Question

I am a beginner in Python. I have a problem combining a glob.glob loop with an np.arange loop.

I have about a hundred CSV files named like this:

13oct_speed_1kmh.csv
13oct_speed_2kmh.csv
...and so on

All files have the same structure:

Distance ID
2.14     A
82.12    B
12.45    A
21.07    B
11.42    A

I want to drop the rows whose Distance falls outside a buffer zone, once for each of these thresholds:

np.arange(10,100,30)
array([10, 40, 70])
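A minimal sketch of the filtering step on the sample rows above, assuming every file has exactly the Distance and ID columns shown: comparing the column to a threshold yields a boolean mask, and indexing the DataFrame with that mask keeps only the rows inside the buffer. Note that np.arange excludes the stop value, so 100 is not one of the thresholds.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question
df = pd.DataFrame({
    "Distance": [2.14, 82.12, 12.45, 21.07, 11.42],
    "ID": ["A", "B", "A", "B", "A"],
})

thresholds = np.arange(10, 100, 30)  # array([10, 40, 70]); stop value 100 is excluded

# For each threshold, keep only rows whose Distance is strictly below it
kept = {int(t): df[df["Distance"] < t] for t in thresholds}

for t, sub in kept.items():
    print(t, len(sub))  # prints: 10 1, then 40 4, then 70 4
```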

I used this code:

import glob

import numpy as np
import pandas as pd

def buffer(value, threshold):
    return value < threshold

files = glob.glob("13oct_speed_*.csv")
for f in files:
    df = pd.read_csv(f)
    for i in np.arange(10, 100, 30):
        threshold = i
        result_df = df[buffer(df["Distance"], threshold)]
        csvFileName = f + 'Buffer_' + str(threshold) + ".csv"
        result_df.to_csv(csvFileName, sep=",")

but the result is very strange: the loop never seems to stop and keeps saving new files.
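One plausible explanation, assuming the script is run more than once in the same folder: glob.glob returns an ordinary list built once per call, so a single run does finish, but the names the question's code writes also match the pattern 13oct_speed_*.csv. Each rerun then treats every earlier output as a new input, and the folder keeps growing. glob uses fnmatch-style wildcards, so a quick check with fnmatch.fnmatch shows the overlap:

```python
import fnmatch

pattern = "13oct_speed_*.csv"
f = "13oct_speed_1kmh.csv"

# Output name built exactly as in the question's code
threshold = 10
csvFileName = f + 'Buffer_' + str(threshold) + ".csv"
print(csvFileName)  # 13oct_speed_1kmh.csvBuffer_10.csv

# The generated name still starts with "13oct_speed_" and ends with ".csv",
# so the same pattern matches it again on the next run
print(fnmatch.fnmatch(csvFileName, pattern))  # True
```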

What I want is, for every file, one output file per buffer threshold that contains only the rows whose Distance is below that threshold.

My expected output looks like this:

13oct_speed_1kmh_buffer10.csv
13oct_speed_1kmh_buffer40.csv
13oct_speed_1kmh_buffer70.csv
13oct_speed_2kmh_buffer10.csv
13oct_speed_2kmh_buffer40.csv
13oct_speed_2kmh_buffer70.csv

How can I fix this? Thank you.

jezrael · Accepted Answer

You can omit the helper function and build csvFileName with str.format to get the expected names; os.path.splitext splits the path into the file name and its extension:
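To illustrate what os.path.splitext and str.format produce here, a small sketch using one sample path (the csv/ folder prefix is taken from the code below; the lowercase "buffer" matches the expected output in the question). The thresholds come back from np.arange as NumPy integers, which format as plain integers such as 10:

```python
import os

import numpy as np

f = "csv/13oct_speed_1kmh.csv"
name, extension = os.path.splitext(f)  # ('csv/13oct_speed_1kmh', '.csv')

# One output name per threshold
names = ["{}_buffer{}{}".format(name, threshold, extension)
         for threshold in np.arange(10, 100, 30)]
for n in names:
    print(n)
# csv/13oct_speed_1kmh_buffer10.csv
# csv/13oct_speed_1kmh_buffer40.csv
# csv/13oct_speed_1kmh_buffer70.csv
```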

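import glob
import os

import numpy as np
import pandas as pd

files = glob.glob("csv/13oct_speed_*.csv")
for f in files:
    df = pd.read_csv(f)
    # e.g. "csv/13oct_speed_1kmh.csv" -> ("csv/13oct_speed_1kmh", ".csv")
    name, extension = os.path.splitext(f)
    for threshold in np.arange(10, 100, 30):
        # keep only the rows inside the buffer zone
        result_df = df[df["Distance"] < threshold]
        csvFileName = "{}_buffer{}{}".format(name, threshold, extension)
        print(csvFileName)
        result_df.to_csv(csvFileName)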
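Names such as 13oct_speed_1kmh_buffer10.csv still match 13oct_speed_*.csv, so rerunning in the same folder would again pick up earlier outputs as inputs. A minimal sketch of one way around that, assuming the input names always end in kmh.csv: narrow the pattern so generated files are excluded, and optionally write results to a separate folder. The write_buffers helper and its src_dir/out_dir parameters are hypothetical names chosen for this illustration; index=False is a choice to omit pandas' row-number column.

```python
import glob
import os

import numpy as np
import pandas as pd


def write_buffers(src_dir, out_dir, thresholds=None):
    """Hypothetical helper: write one filtered CSV per (input file, threshold)."""
    if thresholds is None:
        thresholds = np.arange(10, 100, 30)
    os.makedirs(out_dir, exist_ok=True)
    # Assumption: inputs end in "kmh.csv", so "13oct_speed_1kmh.csv" matches but
    # a generated "13oct_speed_1kmh_buffer10.csv" does not.
    # glob.escape protects any wildcard characters in the folder name itself.
    pattern = os.path.join(glob.escape(src_dir), "13oct_speed_*kmh.csv")
    files = sorted(glob.glob(pattern))  # list is built once, before any file is written
    written = []
    for f in files:
        df = pd.read_csv(f)
        # Split the base name once per file rather than once per threshold
        name, extension = os.path.splitext(os.path.basename(f))
        for threshold in thresholds:
            result_df = df[df["Distance"] < threshold]
            out_path = os.path.join(out_dir, "{}_buffer{}{}".format(name, threshold, extension))
            result_df.to_csv(out_path, index=False)  # omit the row-number column
            written.append(out_path)
    return written
```

With this pattern, passing the same folder as src_dir and out_dir still processes only the original inputs, because the outputs no longer end in kmh.csv.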
