P.Harrison

Reputation: 189

MS Access: need to loop through 10k CSV files, but the database fills up too quickly

So I have 10k CSV files I need to loop through. I have a loop which goes through the list of reports: it imports the CSV from a particular file, runs a query, exports the result back out, and moves on to the next CSV. However, because there are 10k CSV files, the database grows past its 2GB maximum. Is there a way to refresh the database mid-loop? Something like Application.SetOption "Auto compact", True — except that option only compacts when the database closes; I need something that works while the loop is running.

Set rs = CurrentDb.OpenRecordset("Select * From NormalReports") 'Table of reports
If Not (rs.EOF And rs.BOF) Then 'This loop goes through each normal directory and creates the winners list for directory.
    rs.MoveFirst
    Do Until rs.EOF = True
        Directory = rs!Directory
        ReportName = rs!Name

        NUMBDATASTr = Directory & "NUMBDATM.CSV"
        NICHDATMSTr = Directory & "NICHDATM.CSV"
        PRNTDATMSTr = Directory & "PRNTDATM.CSV"

        If Directory Like "E:*" Then
            CTRY = "UK"
        ElseIf Directory Like "F:*" Then
            CTRY = "FR"
        ElseIf Directory Like "G:*" Then
            CTRY = "PW"
        ElseIf Directory Like "H:*" Then
            CTRY = "ES"
        ElseIf Directory Like "I:*" Then
            CTRY = "IT"
        ElseIf Directory Like "J:*" Then
            CTRY = "AT"
        ElseIf Directory Like "K:*" Then
            CTRY = "DE"
        ElseIf Directory Like "R:*" Then
            CTRY = "RU"
        ElseIf Directory Like "N:*" Then
            CTRY = "NO"
        ElseIf Directory Like "C:*" Then
            CTRY = "UK"
        Else
            MsgBox "Invalid directory Found"
            Exit Sub
        End If


        DoCmd.SetWarnings False
        DoCmd.OpenQuery "ResetNumbDatM"
        DoCmd.OpenQuery "ResetNICHDATM"
        DoCmd.OpenQuery "ResetPRNTDATM"
        DoCmd.SetWarnings True

        'Current issue: data types of the tables conflict, make sure to change that. Issue noted: 06/07/2018. Resolved: NOT
        Dim CombLoop As Integer
        Dim LotusCn As Object
        Dim rsLotus As Object
        Dim strSql, CombFileName, GotoRange As String
        Dim rsLotusFiles As DAO.Recordset

        Set LotusCn = CreateObject("ADODB.Connection")
        Set rsLotus = CreateObject("ADODB.Recordset")

        DoCmd.SetWarnings False
        DoCmd.TransferText TransferType:=acImportDelim, TableName:="NUMBDATM", FileName:=NUMBDATASTr, HasFieldNames:=True
        DoCmd.DeleteObject acTable, "NUMBDATM_ImportErrors"
        DoCmd.TransferText TransferType:=acImportDelim, TableName:="PRNTDATM", FileName:=PRNTDATMSTr, HasFieldNames:=True
        DoCmd.DeleteObject acTable, "PRNTDATM_ImportErrors"
        DoCmd.TransferText TransferType:=acImportDelim, TableName:="NICHDATM", FileName:=NICHDATMSTr, HasFieldNames:=True
        DoCmd.DeleteObject acTable, "NICHDATM_ImportErrors"
        DoCmd.SetWarnings True

        'Save Path for First Export
        SaveFile = Directory & "AWD_" & MTH & ".csv"
        'End of Save Path First Export
        'Display Winners and create the table
        DoCmd.SetWarnings False
        DoCmd.OpenQuery "AWDWINNERSQRY"
        DoCmd.SetWarnings True
        'End Display

        'Export Winners to their Directory to their individual Directories
        db.TableDefs.Refresh
        DoCmd.TransferText acExportDelim, , "AWDWinners", SaveFile, True
        db.TableDefs.Refresh
        'Export to Directory Finished

        SaveFile = "Q:\CCNMACS\AWD" & CTRY & "\AWD_" & MTH & ReportName & ".csv"

        'Export Winners to their Directory to their individual Directories
        db.TableDefs.Refresh
        DoCmd.Rename "AWDWinners" & ReportName, acTable, "AWDWinners"
        DoCmd.TransferText acExportDelim, , "AWDWinners" & ReportName, SaveFile, True
        db.TableDefs.Refresh
        'Export to Directory Finished

        DoCmd.SetWarnings False
        DoCmd.DeleteObject acTable, "AWDWinners" & ReportName
        DoCmd.SetWarnings True

        Application.SetOption "Auto compact", True

        rs.MoveNext
    Loop
Else
    MsgBox "There are no Records in the RecordSet."
End If
rs.Close
Set rs = Nothing

Upvotes: 0

Views: 473

Answers (4)

ASH

Reputation: 20322

How about linking to all 10k files instead of importing them? Linked tables leave the data in the CSV files on disk, so the database barely grows. The VBA script below will loop through all files in a folder and link to each one.

''''  LINK TO ALL CSV FILES OR ALL TEXT FILES IN A FOLDER...
Private Sub Command0_Click()

     'Macro loops through the specified directory (strPath)
     'and attaches ALL CSV files as linked tables in the Access
     'Database.

    Const strPath As String = "C:\your_path_here\" 'Directory Path
    Dim strFile As String 'Filename
    Dim strFileList() As String 'File Array
    Dim intFile As Integer 'File Number

     'Loop through the folder & build file list
    strFile = Dir(strPath & "*.csv")
    While strFile <> ""
         'add files to the list
        intFile = intFile + 1
        ReDim Preserve strFileList(1 To intFile)
        strFileList(intFile) = strFile
        strFile = Dir()
    Wend
     'see if any files were found
    If intFile = 0 Then
        MsgBox "No files found"
        Exit Sub
    End If
     'cycle through the list of files & link to Access
    For intFile = 1 To UBound(strFileList)
        'Access table names cannot contain a period, so drop the ".csv"
        'extension when naming the linked table
        DoCmd.TransferText acLinkDelim, , _
            Left(strFileList(intFile), InStrRev(strFileList(intFile), ".") - 1), _
            strPath & strFileList(intFile), True, ""
         'Check out the TransferText options in the Access
         'Visual Basic Help file for a full description & list of
         'optional settings
    Next
    MsgBox UBound(strFileList) & " Files were Linked"

End Sub

As an aside (and this will probably get down-voted just because I like to propose alternate solutions), consider using R or Python to manipulate the files. It seems the process of importing 10k text files is what is bloating Access up to 2GB, which makes complete sense. Consider merging all the files into one file, and then importing that one file into Access. I have no idea how large each file is, but it will certainly be easier to import one file than 10k files.

# R:
setwd("C:/Users/Excel/Desktop/TEST") 
txt_files <- list.files()
# read each file into a character vector, one element per line
list_of_reads <- lapply(txt_files, readLines)
# note: rbind assumes every file has the same number of lines
df_of_reads <- data.frame(file_name = txt_files, contents = do.call(rbind, list_of_reads))
write.csv(df_of_reads, "one_big_CSV.csv", row.names = F)

Or...

# Python
import glob2

filenames = glob2.glob('C:/Users/Excel/Desktop/test/*.txt')  # list of all .txt files in the directory

with open('C:/Users/Excel/Desktop/test/outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read()+'\n')

Or, finally, use SQL Server and BULK INSERT all the files in a loop. If you want more info on how to do this, post back and let me know.
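A rough sketch of that last idea, driven from VBA via ADO — the server, database, table, and path names here are all placeholders, and the staging table is assumed to already match the CSV layout:

```vba
' Hypothetical sketch: run a T-SQL BULK INSERT per file through ADO.
' "YourServer", "YourDb", "dbo.YourStagingTable" and the path are placeholders.
Sub BulkInsertAllCsvFiles()
    Const strPath As String = "C:\your_path_here\"
    Dim cn As Object, strFile As String

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=YourServer;" & _
            "Initial Catalog=YourDb;Integrated Security=SSPI;"

    strFile = Dir(strPath & "*.csv")
    While strFile <> ""
        ' FIRSTROW = 2 skips the header row in each file
        cn.Execute "BULK INSERT dbo.YourStagingTable " & _
                   "FROM '" & strPath & strFile & "' " & _
                   "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);"
        strFile = Dir()
    Wend
    cn.Close
End Sub
```

Note that BULK INSERT reads the file path on the SQL Server machine, so the files would need to be visible from the server.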

Upvotes: 0

Albert D. Kallal

Reputation: 49049

As noted, one can consider creating an external accDB file and using that for the processing. That way, after processing "x" number of files, you can either create a new blank db or even compact that external accDB.

You should also consider turning off row locking, as this can be a major source of bloat. I have seen a process expand a 6 MB file to 126 MB, and turning off row locking resulted in the 6 MB file still being about 6 MB after processing.

So row locking can have a "massive" effect on the amount of bloat (and you get a good deal better performance also!!).

So you can try turning off row locking, but really, just creating a blank external accDB file (and linking to it) would also solve this issue.
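If the row-locking setting is to be toggled from code rather than from the Options dialog, a minimal sketch might look like this (assuming the "Use Row Level Locking" option string, which corresponds to the "Open databases by using record-level locking" checkbox):

```vba
' Sketch: turn off row-level locking before the import loop to reduce bloat.
' The setting applies to databases opened after it is changed.
Sub DisableRowLocking()
    Application.SetOption "Use Row Level Locking", False
End Sub
```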

Example how to use a temporary mdb/accdb in your application here:

http://www.granite.ab.ca/access/temptables.htm
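The pattern from that link can be sketched roughly as follows — the paths and the CREATE TABLE columns here are placeholder examples, not the questioner's real schema:

```vba
' Sketch (paths and columns are placeholders): create a blank scratch accdb
' for the bulky import tables, then link one of its tables into the front end.
Sub CreateAndLinkScratchDb()
    Const strTemp As String = "C:\Temp\scratch.accdb"
    Dim dbTemp As DAO.Database
    Dim tdf As DAO.TableDef

    ' Start fresh each run
    If Dir(strTemp) <> "" Then Kill strTemp
    Set dbTemp = DBEngine.Workspaces(0).CreateDatabase(strTemp, dbLangGeneral)

    ' Create the working table inside the scratch file (example columns)
    dbTemp.Execute "CREATE TABLE NUMBDATM (ID COUNTER PRIMARY KEY, RawLine TEXT(255))"
    dbTemp.Close

    ' Link the scratch table into the front end under the name the queries expect
    Set tdf = CurrentDb.CreateTableDef("NUMBDATM")
    tdf.Connect = ";DATABASE=" & strTemp
    tdf.SourceTableName = "NUMBDATM"
    CurrentDb.TableDefs.Append tdf
End Sub
```

Because the bloat then accumulates in scratch.accdb rather than the front end, the loop can simply delete and recreate that file whenever it grows too large.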

Upvotes: 2

Wolfgang Kais

Reputation: 4100

How about linking the files instead of importing them? Try TransferType:=acLinkDelim...
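In the question's loop that would mean swapping each import for a link — a sketch using the variable names from the question:

```vba
' Link instead of import: the CSV stays on disk, so the database doesn't grow
DoCmd.TransferText TransferType:=acLinkDelim, TableName:="NUMBDATM", _
    FileName:=NUMBDATASTr, HasFieldNames:=True

' ...run the queries against the linked table as before, then drop the link
' (deleting a linked table removes only the link, not the CSV file)
DoCmd.DeleteObject acTable, "NUMBDATM"
```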

Upvotes: 0

Minty

Reputation: 1626

You can't easily compact and repair the database you are in, mid-process; however, you can easily do that to another database.

Consider using a separate "side" database that holds the imported data. Link to it and perform all the importing into that. You can then record, in the main database, the position your loop has reached, and compact and repair the side database as often as required.
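A minimal sketch of that mid-loop compact, assuming the side database lives at a placeholder path and no one has it open at the time (DBEngine.CompactDatabase requires exclusive access):

```vba
' Sketch (paths are placeholders): compact the side database from the main
' database, e.g. every N files. CompactDatabase writes a compacted copy,
' so the copy is then swapped into place.
Sub CompactSideDb()
    Const strSide As String = "C:\Temp\SideImport.accdb"
    Const strTemp As String = "C:\Temp\SideImport_compact.accdb"

    If Dir(strTemp) <> "" Then Kill strTemp
    DBEngine.CompactDatabase strSide, strTemp   ' compacted copy
    Kill strSide                                ' remove the bloated original
    Name strTemp As strSide                     ' rename the copy into place
End Sub
```

Any linked tables pointing at the side database keep working after the swap, since the path is unchanged.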

Upvotes: 3
