Reputation: 9
Hope you are doing well. I need some help. Just for example, I have 3 csv files:
1) File1.csv with 2 records / rows
firstname | lastname | city | country | emailaddress
-----------------------------------------------------
alexf sdfsd mumbai india [email protected]
----------
asfd sdfsdf toronto canada [email protected]
2) secondfile.csv with 2 records / rows
first-name | last-name | currentcity | currentcountry | email-address
----------------------------------------------------------------------
asdf sdfkjwl sydney australia [email protected]
----------
lskjdf sdlfkjlkj delhi india [email protected]
3) userfile.csv with 2 records / rows
fname | lname | usercity | usercountry | email
-----------------------------------------------
sdf sdflj auckland new zealand [email protected]
----------
sdfsdf sdf venice italy [email protected]
Now I want to create a single csv or excel or mysql or any database table in which i want all of these records from all the different csv files with different columns / header names but having same type of data. Like this:
singlecsvfile.csv
first_name | last_name | city | country | email_address
--------------------------------------------------------
alexf sdfsd mumbai india [email protected]
asfd sdfsdf toronto canada [email protected]
asdf sdfkjwl sydney australia [email protected]
lskjdf sdlfkjlkj delhi india [email protected]
sdf sdflj auckland new zealand [email protected]
sdfsdf sdf venice italy [email protected]
In reality I have 50+ files like that with different column names but same type of data due to different type of data-sources. What would you suggest me to do, what strategy or way would you suggest and how should i implement this. Please suggest me easy way (excel / powerquery / powerBI) or code(php / sql) with some description if possible. I am in need of quick or automated solution something like data mapping. I have searched alot but could not find any solution. Suggestions would be appreciated. Thanks
Upvotes: 0
Views: 1081
Reputation: 15017
I would use Power Query for this. You will need a separate Query for each input file layout. These will just rename the columns to match your singlecsvfile.csv column names. I would set each to Load To / Only Create a Connection.
Then a final singlecsvfile Query will use Append Queries to add all the data from the input queries. Power Query uses column names to combine data in an Append - the sequence of the columns (from left to right) doesn't matter.
If there is any common layouts amongst your 50+ files, I would separate them into subfolders. You can then rip through all the files in a subfolder using a single input Query - start using From File / From Folder
Power Query will deliver the output to an Excel Table. If you actually need a CSV output, just record a Macro to refresh the Power Query and Save As CSV.
Upvotes: 2
Reputation: 3011
SuperUser isn't really a code writing service. That being said I've got a piece of code that should basically do what you want in vba. Its got some commenting so should be managable. Might need some tweaking depending on your files.
Option Explicit
Global first_sheet As Boolean
Global append As Boolean
Sub process_folder()
Dim book_counter As Integer
Dim folder_path As String
Dim pWB As Workbook, sWB As Workbook, sWB_name As String
Dim pWS As Worksheet
book_counter = 0
first_sheet = True
'Flag between appending in one sheet and copying into individual sheets
append = True
Set pWB = ActiveWorkbook
Set pWS = pWB.ActiveSheet
folder_path = "O:\Active\_2010\1193\10-1193-0015 Kennecott eagle\Phase 8500 - DFN Modelling\4. Analysis & Modelling\Phase 2 - DFN building\Export\fracture_properties\20140205"
folder_path = verify_folder(folder_path)
If folder_path = "NULL" Then
Exit Sub
End If
'Get first file to open
sWB_name = Dir(folder_path, vbNormal)
'Loop through files
Do While sWB_name <> ""
'Open each file
Workbooks.Open Filename:=folder_path & sWB_name
Set sWB = Workbooks(sWB_name)
Call process_workbook(pWB, sWB)
'close file increment counter
sWB_name = Dir()
book_counter = book_counter + 1
Loop
'Number of files processed
MsgBox ("Number of Fragment Files processed: " & book_counter)
End Sub
Sub process_workbook(pWB As Workbook, sWB As Workbook)
If append Then
Call append_all_sheets(pWB, sWB, 1)
Else
Call copy_all_sheets(pWB, sWB)
End If
End Sub
Sub copy_all_sheets(pWB As Workbook, sWB As Workbook)
Dim ws As Worksheet
For Each ws In sWB.Worksheets
ws.Move After:=pWB.Sheets(pWB.Sheets.count)
Next ws
End Sub
Sub append_all_sheets(pWB As Workbook, sWB As Workbook, headerRows As Long)
Dim lastCol As Long, lastRow As Long, pasteRow As Long
Dim count As Integer
Dim ws As Worksheet
For Each ws In sWB.Worksheets
lastCol = find_last_col(ws)
lastRow = find_last_row(ws)
pasteRow = find_last_row(pWB.Sheets(1))
'Copy entire data range if its the first sheet otherwise leave of the header row
If first_sheet Then
' ws.Range("A1").Resize(lastRow, lastCol).Copy
pWB.Sheets(1).Range("A" & pasteRow).Resize(lastRow, lastCol).Formula = ws.Range("A1").Resize(lastRow, lastCol).Formula
'Destination:=pWB.Sheets(1).pasteRow
Else
'pWB.Sheets(1).Formula = ws.Range("A1").Offset(headerRows, 0).Resize(lastRow - headerRows, lastCol).Formula
pWB.Sheets(1).Range("A" & pasteRow).Resize(lastRow - headerRows, lastCol).Formula = ws.Range("A1").Offset(headerRows, 0).Resize(lastRow - headerRows, lastCol).Formula
End If
first_sheet = False
Next ws
sWB.Close (False)
End Sub
Function find_last_row(ws As Worksheet) As Long
With ws
If Application.WorksheetFunction.CountA(.Cells) <> 0 Then
find_last_row = .Cells.Find(What:="*", _
After:=.Range("A1"), _
Lookat:=xlPart, _
LookIn:=xlFormulas, _
SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious, _
MatchCase:=False).Row
Else
find_last_row = 1
End If
End With
End Function
Function find_last_col(ws As Worksheet) As Long
With ws
If Application.WorksheetFunction.CountA(.Cells) <> 0 Then
find_last_col = .Cells.Find(What:="*", _
After:=.Range("A1"), _
Lookat:=xlPart, _
LookIn:=xlFormulas, _
SearchOrder:=xlByColumns, _
SearchDirection:=xlPrevious, _
MatchCase:=False).Column
Else
find_last_col = 1
End If
End With
End Function
Function verify_folder(path As String) As String
If path = "" Then
MsgBox ("Enter the Directory of the Fragment simulation files to process")
verify_folder = "NULL"
Exit Function
End If
If Not PathExists(path) Then
MsgBox ("Directory does not exist")
verify_folder = "NULL"
Exit Function
End If
If Right(path, 1) <> "\" Then
verify_folder = path & "\"
End If
End Function
Function PathExists(pName) As Boolean
On Error Resume Next
PathExists = (GetAttr(pName) And vbDirectory) = vbDirectory
End Function
Upvotes: 0