eton blue
eton blue

Reputation: 137

Building Excel spreadsheet from multiple CSV files

I have 1 Excel workbook with a single sheet and 3 csv files. The worksheet has a fixed number of columns (with headers) and a varying number of rows. One column contains an "ID" field and the final three columns are blank. Each csv has a varying number of columns but all have an "ID" field which should match the "ID" field in the Excel sheet. While the Excel sheet may have multiple instances of the same ID, there are no duplicate IDs in the CSVs. For example:

Excel File

ID:     Name:   Color:  Location:   Age:    Siblings:
123     Bob     Red
234     Sally   Green
345     Donald  Orange
123     Bob     Black

CSV1

ID:     Name:   Place:  Animal: Location:   Car:
123     Bob     Here    Dog     Up          Ferarri
234     Sally   There   Cat     Down        Porsche
345     Donald  Nowhere Squid   Right       Yugo

CSV2

ID:     Name:   Place:  Age:    
123     Bob     Here    50      
234     Sally   There   45      
345     Donald  Nowhere 100

CSV3

ID:     Siblings:
123     Five
234     Three
345     Eight

The goal is to add the data from specific columns in the CSV files to the excel file based on matching IDs. Expected output would be the following Excel file:

ID:     Name:   Color:  Location:   Age:    Siblings:
123     Bob     Red     Up          50      Five
234     Sally   Green   Down        45      Three
345     Donald  Orange  Right       100     Eight
123     Bob     Black   Up          50      Five

I have spent quite a bit of time trying to figure the most efficient (fast) method for doing this and think I've hit a brick wall. What I have so far (in relevant part):

# Pull relevant data from csv files together #
$rtFile = $selectedDirectory + "\\" + "*RT*.csv"
$seFile = $selectedDirectory + "\\" + "*SE*.csv"
$lmFile = $selectedDirectory + "\\" + "*LM*.csv"

$rtCSV = Import-Csv $rtFile | select ID, LOCATION
$seCSV = Import-Csv $seFile | select ID, AGE
$lmCSV = Import-Csv $lmFile | select ID, SIBLINGS

$rtCSV | ForEach {$_ | Add-Member 'AGE' $null}
$rtCSV | ForEach {$_ | Add-Member 'SIBLINGS' $null}

foreach ($record in $rtCSV) {
    $record.'AGE' = $seCSV | Where {$_.ID -eq $record.ID} | Select -Expand 'AGE'
    $Record.'SIBLINGS' = $lmCSV | Where {$_.ID -eq $record.ID} | Select -Expand 'SIBLINGS'
}

# Add Data to Excel Sheet #
$WorkSheet.Activate()
$range = $WorkSheet.Range("C1").EntireColumn

foreach ($searchStr in $rtCSV.ID) {
    $search = $range.Find($searchStr)

    if ($search -ne $null) {
        $firstAdr = $search.Address(0, 0, 1, 0)

        do {
            $WorkSheet.Cells.Item($search.row,17).Value() = $rtCSV[$search.row].LOCATION
            $WorkSheet.Cells.Item($search.row,18).Value() = $rtCSV[$search.row].AGE
            $WorkSheet.Cells.Item($search.row,19).Value() = $rtCSV[$search.row].SIBLINGS

            $search = $range.FindNext($search)
        } while ($search -ne $null -and $search.Address(0, 0, 1, 0) -ne $firstAdr)
    }
}

It took me awhile, but I finally figured out why the above doesn't work. While $search.row does return the matching row in the Excel document (and is thus fine to use to determine in which cell to insert the data) it does not return the appropriate index(?) for the corresponding values in $rtCSV. So how do I ensure that I insert the correct values for LOCATION, AGE, and SIBLINGS each time ID's match?

If it isn't possible under the current construction, is there another (perhaps better and more efficient) way? Generally speaking, the Excel file should never have more than 1,000 rows.

Upvotes: 4

Views: 590

Answers (2)

Mike Twc
Mike Twc

Reputation: 2355

If your main concern is performance consider 2 things:

Convert "lookup tables" (csv1, csv2, csv3) to hashtables, so look-ups will be fast (it's somewhat similar to indexing in databases)

Keep all data in flat files (csv), to avoid dependencies and keep your script simple and efficient. You can then link your data to a spreadsheet as a view to apply formatting if you need (you can create linked table and views from the csv in MS Access and then feed it to a data table in Excel)

Below is an example (using some other pseudo files for simplicity). You can use worksheet object in a main loop too (just loop through row indexes and update cell objects).

# generate hashtables
$person = @{}
$location = @{}
Import-Csv location.csv | foreach {$location.Add($_.id, @{zip=$_.zip; city=$_.city})}
Import-Csv person.csv | foreach {$person.Add($_.id, @{name=$_.name; age=$_.age})}

# loop through the main file
Import-Csv main.csv | foreach {
    $id = $_.id
    $_.name = $person[$id].name
    $_.age = $person[$id].age
    $_.city = $location[$id].city
    $_.zip = $location[$id].zip
    Write-Output $_ } | Convertto-Csv | Out-File mainOut.csv

Upvotes: 0

Robert Dyjas
Robert Dyjas

Reputation: 5217

If you don't mind installing additional modules to simplify your work, I'd strongly recommend dfinke's Import Excel module. Once you install if, the code to achieve your goal would be simple as that:

# cd C:\SO\53529676
$rtCSV = Import-Csv .\csv1.csv | select ID, LOCATION
$seCSV = Import-Csv .\csv2.csv | select ID, AGE
$lmCSV = Import-Csv .\csv3.csv | select ID, SIBLINGS
$excel = Import-Excel .\Excel1.xlsx

foreach ($record in $excel) {
  $record.'LOCATION' = ($rtCSV | Where {$_.ID -eq $record.ID}).LOCATION
  $record.'AGE' = ($seCSV | Where {$_.ID -eq $record.ID}).AGE
  $Record.'SIBLINGS' = ($lmCSV | Where {$_.ID -eq $record.ID}).SIBLINGS
}

$excel | Export-Excel .\Excel2.xlsx

Upvotes: 1

Related Questions