nbn

Reputation: 195

Python run script on several csv files

I'm trying to run my script on several .csv files and output the result for each file. A snippet of my code is as follows:

import sys
import os
import logging
import subprocess
import argparse
import pandas as pd
import glob


files = glob.glob('/scratch/*/*.csv')

for file in files:
        df = pd.read_csv(file,delimiter = ',',skiprows=range(1,11))

#do some calculation on each file


#calculate the final value
metric = (max(max(dif_r1a),max(dif_r1c),max(dif_r1g),max(dif_r1t),max(dif_r2a),max(dif_r2c),max(dif_r2g),max(dif_r2t)))

#output the final value for each csv file
print(os.path.basename(file) + ' ' + str(metric))

The output I get is only for a single csv file

file1.csv 0.25

How do I iterate over the files so that the value is output for every csv file?

Thank you

Upvotes: 0

Views: 53

Answers (1)

Mitchnoff

Reputation: 515

From the code above, it looks like you create a DataFrame for each .csv file inside the for loop, but you calculate and print the final value only once, after the loop has finished. That is why you see output only for the last file that was read. To get a value for every file, the calculation and the print would need to be inside the for loop:

import sys
import os
import logging
import subprocess
import argparse
import pandas as pd
import glob


files = glob.glob('/scratch/*/*.csv')

for file in files:
        df = pd.read_csv(file,delimiter = ',',skiprows=range(1,11))

#do some calculation on each file


#calculate the final value
metric = (max(max(dif_r1a),max(dif_r1c),max(dif_r1g),max(dif_r1t),max(dif_r2a),max(dif_r2c),max(dif_r2g),max(dif_r2t)))

#output the final value for each csv file
print(os.path.basename(file) + ' ' + str(metric))

This is what you have at the moment, but you would want to change it to:

import sys
import os
import logging
import subprocess
import argparse
import pandas as pd
import glob


files = glob.glob('/scratch/*/*.csv')

for file in files:
    df = pd.read_csv(file,delimiter = ',',skiprows=range(1,11))

    #do some calculation on each file


    #calculate the final value
    metric = max(max(dif_r1a), max(dif_r1c), max(dif_r1g), max(dif_r1t),
                 max(dif_r2a), max(dif_r2c), max(dif_r2g), max(dif_r2t))

    #output the final value for each csv file
    print(os.path.basename(file) + ' ' + str(metric))

However, the missing indentation could also just be an artifact of how the code was formatted in the question.
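
To make the effect of indentation concrete, here is a minimal sketch that runs without any CSV files. The `fake_files` dictionary and the `final_value` helper are hypothetical stand-ins (the real `dif_r1a` ... `dif_r2t` calculation is not shown in the question), so treat this as an illustration of the loop structure only, not of the actual metric:

```python
import os

# Hypothetical stand-in for the per-file calculation; the real dif_r1a ... dif_r2t
# values from the question are not shown, so plain lists of numbers are used here.
def final_value(columns):
    """Return the largest value found across all columns (lists of numbers)."""
    return max(max(col) for col in columns)

# Hypothetical data keyed by path, standing in for the contents of each CSV file.
fake_files = {
    "/scratch/a/file1.csv": [[0.10, 0.25], [0.05, 0.20]],
    "/scratch/b/file2.csv": [[0.40, 0.15], [0.30, 0.35]],
}

results = []
for path, columns in fake_files.items():
    # Everything indented under the for statement runs once per file.
    metric = final_value(columns)
    results.append((os.path.basename(path), metric))
    print(os.path.basename(path) + ' ' + str(metric))

# Code at this indentation level runs only once, after the loop has finished,
# so it sees only the values left over from the last iteration.
last_only = os.path.basename(path) + ' ' + str(metric)
```

Collecting the per-file values in a list such as `results` is optional, but it can be handy if you later want to sort them or write them out in one go.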

Upvotes: 1
