Mike Steve

Reputation: 21

Create a loop to process multiple files

I have written the code below, but currently I have to retype the same steps for each file. With over 100 files, this is not practical.

I couldn't come up with a loop that reads all of these files and extracts the unique values of the MP column from each one. I also need to add two new columns to each filtered result, as in the code below, which is the only method I know so far. The goal is a single combined data frame containing all the filtered files with their added columns.

Please suggest ways of implementing this using a loop:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal

df1 = pd.read_csv(r'E:\Unmanned Cars\Unmanned Cars\2017040810_052.csv')
df2 = pd.read_csv(r'E:\Unmanned Cars\Unmanned Cars\2017040901_052.csv')
df3 = pd.read_csv(r'E:\Unmanned Cars\Unmanned Cars\2017040902_052.csv')

df1 =df1["MP"].unique()
df1=pd.DataFrame(df1, columns=['MP'])
df1["Dates"] = "2017-04-08"
df1["Inspection"] = "10"
##
df2 =df2["MP"].unique()
df2=pd.DataFrame(df2, columns=['MP'])
df2["Dates"] = "2017-04-09"
df2["Inspection"] = "01"
##
df3 =df3["MP"].unique()
df3=pd.DataFrame(df3, columns=['MP'])
df3["Dates"] = "2017-04-09"
df3["Inspection"] = "02"
Final = pd.concat([df1,df2,df3],axis = 0, sort = False)

Upvotes: 2

Views: 271

Answers (1)

Ictus

Reputation: 1557

Maybe this sample code will help you.

#!/usr/bin/env python3

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
from os import path
import glob
import re

def process_file(file_path):
    result = None
    file_path = file_path.replace("\\","/")
    filename = path.basename(file_path)
    regex = re.compile("^(\\d{4})(\\d{2})(\\d{2})(\\d{2})")
    match = regex.match(filename)
    if match:
        date = "%s-%s-%s" % (match[1] , match[2] , match[3])
        inspection = match[4]

        df1 = pd.read_csv(file_path)
        df1 =df1["MP"].unique()
        df1=pd.DataFrame(df1, columns=['MP'])
        df1["Dates"] = date
        df1["Inspection"] = inspection
        result = df1
    return result


def main():
#    files_list = [
#        r'E:\Unmanned Cars\Unmanned Cars\2017040810_052.csv',
#        r'E:\Unmanned Cars\Unmanned Cars\2017040901_052.csv',
#        r'E:\Unmanned Cars\Unmanned Cars\2017040902_052.csv'
#    ]
    directory = 'E:\\Unmanned Cars\\Unmanned Cars\\'
    files_list =  [f for f in glob.glob(directory + "*_052.csv")]

    result_list = [ process_file(filename) for filename in files_list ]

    Final = pd.concat(result_list, axis = 0, sort = False)
    # Return the combined frame so the caller can use it
    return Final

if __name__ == "__main__":
    main()

I've created a process_file function that processes each file. It uses a regular expression to extract the date and inspection number from the filename. The glob module is used to list the files in the directory by pattern matching.
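To see the filename parsing and the pattern matching in isolation, here is a minimal sketch that uses only the standard library. It assumes every file is named YYYYMMDDNN_052.csv, where NN is the inspection number. That layout is inferred from the three example paths in the question. The names parse_filename and list_inspection_files are hypothetical helpers, not part of the code above, and pathlib's glob is swapped in for the glob module.

```python
import re
from pathlib import Path

# Assumed filename layout: YYYYMMDDNN_052.csv, where NN is the inspection number.
FILENAME_PATTERN = re.compile(r"^(\d{4})(\d{2})(\d{2})(\d{2})_052\.csv$")


def parse_filename(filename):
    """Return (date, inspection) for a matching filename, or None otherwise."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    year, month, day, inspection = match.groups()
    return f"{year}-{month}-{day}", inspection


def list_inspection_files(directory):
    """Return (path, date, inspection) tuples for matching files, sorted by name."""
    found = []
    for file_path in sorted(Path(directory).glob("*_052.csv")):
        parsed = parse_filename(file_path.name)
        if parsed is not None:  # skip files that do not follow the assumed layout
            found.append((file_path, *parsed))
    return found


print(parse_filename("2017040810_052.csv"))  # ('2017-04-08', '10')
```

Each tuple returned by list_inspection_files could then be passed to pd.read_csv and tagged with its date and inspection value, in the same way process_file does.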

Upvotes: 2
