Reputation: 967
I am a beginner in Python. I have a problem with a nested loop over glob.glob and np.arange.
I have a hundred CSV files with names like this:
13oct_speed_1kmh.csv
13oct_speed_2kmh.csv
and others
The data in every file is structured like this:
Distance ID
2.14 A
82.12 B
12.45 A
21.07 B
11.42 A
I want to filter the rows by distance, keeping only those below each buffer-zone threshold:
np.arange(10,100,30)
array([10, 40, 70])
I used this code:
def buffer(value, threshold):
    return (value < threshold)

files = glob.glob("13oct_speed_*.csv")
for f in files:
    df = pd.read_csv(f)
    for i in np.arange(10,100,30):
        threshold = i
        result_df = df[buffer(df["Distance"], threshold)]
        csvFileName = f + 'Buffer_' + str(threshold) + ".csv"
        result_df.to_csv(csvFileName, sep=",")
But the result is strange: the loop never seems to stop and keeps saving new files.
My desired output is one file per input file and threshold, containing only the rows whose Distance is below that buffer threshold.
My expected output looks like this:
13oct_speed_1kmh_buffer10.csv
13oct_speed_1kmh_buffer40.csv
13oct_speed_1kmh_buffer70.csv
13oct_speed_2kmh_buffer10.csv
13oct_speed_2kmh_buffer40.csv
13oct_speed_2kmh_buffer70.csv
How can I fix it? Thank you.
Upvotes: 2
Views: 35
Reputation: 863146
You can omit the helper function and build csvFileName with format to get the expected output. os.path.splitext splits the path into the name and the extension:
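import glob
import os

import numpy as np
import pandas as pd

files = glob.glob("csv/13oct_speed_*.csv")
for f in files:
    df = pd.read_csv(f)
    for threshold in np.arange(10,100,30):
        # keep only the rows whose Distance is below the current threshold
        result_df = df[df["Distance"] < threshold]
        # split e.g. "csv/13oct_speed_1kmh.csv" into ("csv/13oct_speed_1kmh", ".csv")
        name, extension = os.path.splitext(f)
        csvFileName = "{}_Buffer{}{}".format(name, threshold, extension)
        print(csvFileName)
        result_df.to_csv(csvFileName)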
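One likely reason the files keep multiplying: glob.glob builds its list once per run, so a single run does finish, but the saved files still match the pattern 13oct_speed_*.csv and are picked up as inputs on the next run. A minimal sketch of the naming step plus a filter for earlier results follows; buffer_filename and is_buffer_output are hypothetical helper names, and it assumes the "_Buffer" marker never occurs in an original file name.

```python
import os

def buffer_filename(path, threshold):
    # Insert "_Buffer<threshold>" between the name and the extension.
    name, extension = os.path.splitext(path)
    return "{}_Buffer{}{}".format(name, threshold, extension)

def is_buffer_output(path):
    # Hypothetical check: a stem containing "_Buffer" marks a previous result.
    stem, _ = os.path.splitext(os.path.basename(path))
    return "_Buffer" in stem

# Example paths standing in for what glob.glob might return on a second run.
paths = ["csv/13oct_speed_1kmh.csv", "csv/13oct_speed_1kmh_Buffer10.csv"]
inputs = [p for p in paths if not is_buffer_output(p)]
print(inputs)                          # ['csv/13oct_speed_1kmh.csv']
print(buffer_filename(inputs[0], 40))  # csv/13oct_speed_1kmh_Buffer40.csv
```

In the loop above, the same filter could be applied to the list returned by glob.glob before reading any file. Writing the results to a separate folder would avoid the problem as well.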
Upvotes: 2