Sam Russo
Sam Russo

Reputation: 145

Running code through a folder of files using Pandas

So I've managed to write a code in pandas to do the data analysis I need and export to a new .xlsx file. Which is awesome, except it's for one file and I typically have 40+ files I want to run this on.

Through research I managed to at least get it to read the file names in the folder, but I am at a loss on how to implement into my existing code.

Goal: To run code over each .xlsx file in folder and spit out the analyzed data as new .xlsx files.

For now here is the code I came up with to read the folder:

import os
import glob

os.chdir('C:/Users/PCTR261010/Desktop/PartReviewExport')
FileList = glob.glob('*.xlsx')
print(FileList)

Here is a snippet of the import section of my larger code file:

import os
import glob
import pandas as pd

# Prints header information in Part Scorecard
df = pd.read_excel('GAT_US_PartReview_2017-06-23.xlsx', header=None, 
skipinitialspace=True, skiprows=1)
header = df.head(5).filter([0,2], axis=1)

# Begins Data Analysis of Part Scorecard
fields = ['Appl Req', 'Appl Count ', 'Intr Req', 'Intr Count ', 'OE Intr 
Req', 'Has OE Intr', 'Has Attr Editor',
      'Part IMG Req', 'Has Part IMG', 'Has MPCC', 'Warr Req', 'Has Warr 
TXT', 'Has Warr PDF', 'MSDS Req',
      'Has MSDS', 'UPC Req', 'Has UPC', 'Has UNSPSC', 'Valid Part']

df = pd.read_excel('GAT_US_PartReview_2017-06-23.xlsx', 
skipinitialspace=True, skiprows=7, usecols=fields,
               dtype=str)

Any help is appreciated!!

Upvotes: 0

Views: 224

Answers (2)

Sam Russo
Sam Russo

Reputation: 145

OMG! When you spend all day staring at this and finally realize it's an indention issue. FML. Thanks guys!

Upvotes: 0

James
James

Reputation: 36598

You can iterate over each of the file names, passing them to pandas

import os
import glob
import pandas as pd

os.chdir('C:/Users/PCTR261010/Desktop/PartReviewExport')
FileList = glob.glob('*.xlsx')
print(FileList)


for fname in FileList:  
    # Prints header information in Part Scorecard
    df = pd.read_excel(fname, header=None, 
    skipinitialspace=True, skiprows=1)
    header = df.head(5).filter([0,2], axis=1)

    # Begins Data Analysis of Part Scorecard
    fields = ['Appl Req', 'Appl Count ', 'Intr Req', 'Intr Count ', 'OE Intr 
    Req', 'Has OE Intr', 'Has Attr Editor',
          'Part IMG Req', 'Has Part IMG', 'Has MPCC', 'Warr Req', 'Has Warr 
    TXT', 'Has Warr PDF', 'MSDS Req',
          'Has MSDS', 'UPC Req', 'Has UPC', 'Has UNSPSC', 'Valid Part']

    df = pd.read_excel(fname, skipinitialspace=True, 
                       skiprows=7, usecols=fields, dtype=str)

    # ... analysis here ...
    df.to_excel('out_' + fname)

Upvotes: 1

Related Questions