IT-geek
IT-geek

Reputation: 9

Appending CSV files with same type of data but different column names

Hope you are doing well. I need some help. Just for example, I have 3 csv files:

1) File1.csv with 2 records / rows

firstname | lastname | city | country | emailaddress
-----------------------------------------------------
alexf     sdfsd    mumbai india [email protected]
----------
asfd      sdfsdf   toronto canada [email protected]

2) secondfile.csv with 2 records / rows

first-name | last-name | currentcity | currentcountry | email-address
----------------------------------------------------------------------
asdf        sdfkjwl  sydney      australia      [email protected]
----------
lskjdf     sdlfkjlkj delhi       india           [email protected]

3) userfile.csv with 2 records / rows

fname | lname | usercity | usercountry | email
-----------------------------------------------
sdf   sdflj auckland new zealand [email protected]
----------
sdfsdf sdf  venice   italy       [email protected]

Now I want to create a single csv or excel or mysql or any database table in which i want all of these records from all the different csv files with different columns / header names but having same type of data. Like this:

singlecsvfile.csv

first_name | last_name | city | country |     email_address
--------------------------------------------------------
alexf        sdfsd       mumbai   india       [email protected]
asfd         sdfsdf      toronto canada       [email protected]
asdf        sdfkjwl       sydney  australia   [email protected]
lskjdf      sdlfkjlkj     delhi    india      [email protected]
sdf         sdflj         auckland new zealand [email protected]
sdfsdf      sdf           venice   italy       [email protected]

In reality I have 50+ files like that with different column names but same type of data due to different type of data-sources. What would you suggest me to do, what strategy or way would you suggest and how should i implement this. Please suggest me easy way (excel / powerquery / powerBI) or code(php / sql) with some description if possible. I am in need of quick or automated solution something like data mapping. I have searched alot but could not find any solution. Suggestions would be appreciated. Thanks

Upvotes: 0

Views: 1081

Answers (2)

Mike Honey
Mike Honey

Reputation: 15017

I would use Power Query for this. You will need a separate Query for each input file layout. These will just rename the columns to match your singlecsvfile.csv column names. I would set each to Load To / Only Create a Connection.

Then a final singlecsvfile Query will use Append Queries to add all the data from the input queries. Power Query uses column names to combine data in an Append - the sequence of the columns (from left to right) doesn't matter.

If there is any common layouts amongst your 50+ files, I would separate them into subfolders. You can then rip through all the files in a subfolder using a single input Query - start using From File / From Folder

Power Query will deliver the output to an Excel Table. If you actually need a CSV output, just record a Macro to refresh the Power Query and Save As CSV.

Upvotes: 2

gtwebb
gtwebb

Reputation: 3011

SuperUser isn't really a code writing service. That being said I've got a piece of code that should basically do what you want in vba. Its got some commenting so should be managable. Might need some tweaking depending on your files.

Option Explicit
Global first_sheet As Boolean
Global append As Boolean


Sub process_folder()
Dim book_counter As Integer
Dim folder_path As String
Dim pWB As Workbook, sWB As Workbook, sWB_name As String
Dim pWS As Worksheet

    book_counter = 0
    first_sheet = True

    'Flag between appending in one sheet and copying into individual sheets
    append = True

    Set pWB = ActiveWorkbook
    Set pWS = pWB.ActiveSheet

    folder_path = "O:\Active\_2010\1193\10-1193-0015 Kennecott eagle\Phase 8500 - DFN Modelling\4. Analysis & Modelling\Phase 2 - DFN building\Export\fracture_properties\20140205"

    folder_path = verify_folder(folder_path)
    If folder_path = "NULL" Then
        Exit Sub
    End If

    'Get first file to open
    sWB_name = Dir(folder_path, vbNormal)

    'Loop through files
    Do While sWB_name <> ""

        'Open each file
        Workbooks.Open Filename:=folder_path & sWB_name
        Set sWB = Workbooks(sWB_name)

        Call process_workbook(pWB, sWB)

        'close file increment counter
        sWB_name = Dir()
        book_counter = book_counter + 1
    Loop

    'Number of files processed
    MsgBox ("Number of Fragment Files processed: " & book_counter)


End Sub

Sub process_workbook(pWB As Workbook, sWB As Workbook)

       If append Then
        Call append_all_sheets(pWB, sWB, 1)
       Else
        Call copy_all_sheets(pWB, sWB)
       End If

End Sub

Sub copy_all_sheets(pWB As Workbook, sWB As Workbook)
Dim ws As Worksheet

    For Each ws In sWB.Worksheets
        ws.Move After:=pWB.Sheets(pWB.Sheets.count)
    Next ws

End Sub

Sub append_all_sheets(pWB As Workbook, sWB As Workbook, headerRows As Long)

Dim lastCol As Long, lastRow As Long, pasteRow As Long
Dim count As Integer
Dim ws As Worksheet

    For Each ws In sWB.Worksheets
        lastCol = find_last_col(ws)
        lastRow = find_last_row(ws)
        pasteRow = find_last_row(pWB.Sheets(1))


        'Copy entire data range if its the first sheet otherwise leave of the header row
        If first_sheet Then
         '   ws.Range("A1").Resize(lastRow, lastCol).Copy

            pWB.Sheets(1).Range("A" & pasteRow).Resize(lastRow, lastCol).Formula = ws.Range("A1").Resize(lastRow, lastCol).Formula
                'Destination:=pWB.Sheets(1).pasteRow
        Else
            'pWB.Sheets(1).Formula = ws.Range("A1").Offset(headerRows, 0).Resize(lastRow - headerRows, lastCol).Formula
            pWB.Sheets(1).Range("A" & pasteRow).Resize(lastRow - headerRows, lastCol).Formula = ws.Range("A1").Offset(headerRows, 0).Resize(lastRow - headerRows, lastCol).Formula
        End If

        first_sheet = False
    Next ws

    sWB.Close (False)

End Sub

Function find_last_row(ws As Worksheet) As Long

    With ws
        If Application.WorksheetFunction.CountA(.Cells) <> 0 Then
            find_last_row = .Cells.Find(What:="*", _
                          After:=.Range("A1"), _
                          Lookat:=xlPart, _
                          LookIn:=xlFormulas, _
                        SearchOrder:=xlByRows, _
                        SearchDirection:=xlPrevious, _
                        MatchCase:=False).Row
        Else
          find_last_row = 1
        End If
    End With


End Function

Function find_last_col(ws As Worksheet) As Long

    With ws
        If Application.WorksheetFunction.CountA(.Cells) <> 0 Then
            find_last_col = .Cells.Find(What:="*", _
                          After:=.Range("A1"), _
                          Lookat:=xlPart, _
                          LookIn:=xlFormulas, _
                        SearchOrder:=xlByColumns, _
                        SearchDirection:=xlPrevious, _
                        MatchCase:=False).Column
        Else
          find_last_col = 1
        End If
    End With


End Function

Function verify_folder(path As String) As String

    If path = "" Then
        MsgBox ("Enter the Directory of the Fragment simulation files to process")
        verify_folder = "NULL"
        Exit Function
    End If

    If Not PathExists(path) Then
        MsgBox ("Directory does not exist")
        verify_folder = "NULL"
        Exit Function

    End If

    If Right(path, 1) <> "\" Then
            verify_folder = path & "\"
    End If

End Function

Function PathExists(pName) As Boolean
On Error Resume Next
    PathExists = (GetAttr(pName) And vbDirectory) = vbDirectory
End Function

Upvotes: 0

Related Questions