Reputation: 137
I have 1 Excel workbook with a single sheet and 3 csv files. The worksheet has a fixed number of columns (with headers) and a varying number of rows. One column contains an "ID" field and the final three columns are blank. Each csv has a varying number of columns but all have an "ID" field which should match the "ID" field in the Excel sheet. While the Excel sheet may have multiple instances of the same ID, there are no duplicate IDs in the CSVs. For example:
Excel File
ID: Name: Color: Location: Age: Siblings: 123 Bob Red 234 Sally Green 345 Donald Orange 123 Bob Black
CSV1
ID: Name: Place: Animal: Location: Car: 123 Bob Here Dog Up Ferarri 234 Sally There Cat Down Porsche 345 Donald Nowhere Squid Right Yugo
CSV2
ID: Name: Place: Age: 123 Bob Here 50 234 Sally There 45 345 Donald Nowhere 100
CSV3
ID: Siblings: 123 Five 234 Three 345 Eight
The goal is to add the data from specific columns in the CSV files to the excel file based on matching IDs. Expected output would be the following Excel file:
ID: Name: Color: Location: Age: Siblings: 123 Bob Red Up 50 Five 234 Sally Green Down 45 Three 345 Donald Orange Right 100 Eight 123 Bob Black Up 50 Five
I have spent quite a bit of time trying to figure the most efficient (fast) method for doing this and think I've hit a brick wall. What I have so far (in relevant part):
# Pull relevant data from csv files together #
$rtFile = $selectedDirectory + "\\" + "*RT*.csv"
$seFile = $selectedDirectory + "\\" + "*SE*.csv"
$lmFile = $selectedDirectory + "\\" + "*LM*.csv"
$rtCSV = Import-Csv $rtFile | select ID, LOCATION
$seCSV = Import-Csv $seFile | select ID, AGE
$lmCSV = Import-Csv $lmFile | select ID, SIBLINGS
$rtCSV | ForEach {$_ | Add-Member 'AGE' $null}
$rtCSV | ForEach {$_ | Add-Member 'SIBLINGS' $null}
foreach ($record in $rtCSV) {
$record.'AGE' = $seCSV | Where {$_.ID -eq $record.ID} | Select -Expand 'AGE'
$Record.'SIBLINGS' = $lmCSV | Where {$_.ID -eq $record.ID} | Select -Expand 'SIBLINGS'
}
# Add Data to Excel Sheet #
$WorkSheet.Activate()
$range = $WorkSheet.Range("C1").EntireColumn
foreach ($searchStr in $rtCSV.ID) {
$search = $range.Find($searchStr)
if ($search -ne $null) {
$firstAdr = $search.Address(0, 0, 1, 0)
do {
$WorkSheet.Cells.Item($search.row,17).Value() = $rtCSV[$search.row].LOCATION
$WorkSheet.Cells.Item($search.row,18).Value() = $rtCSV[$search.row].AGE
$WorkSheet.Cells.Item($search.row,19).Value() = $rtCSV[$search.row].SIBLINGS
$search = $range.FindNext($search)
} while ($search -ne $null -and $search.Address(0, 0, 1, 0) -ne $firstAdr)
}
}
It took me awhile, but I finally figured out why the above doesn't work. While $search.row
does return the matching row in the Excel document (and is thus fine to use to determine in which cell to insert the data) it does not return the appropriate index(?) for the corresponding values in $rtCSV
. So how do I ensure that I insert the correct values for LOCATION, AGE, and SIBLINGS each time ID's match?
If it isn't possible under the current construction, is there another (perhaps better and more efficient) way? Generally speaking, the Excel file should never have more than 1,000 rows.
Upvotes: 4
Views: 590
Reputation: 2355
If your main concern is performance consider 2 things:
Convert "lookup tables" (csv1, csv2, csv3) to hashtables, so look-ups will be fast (it's somewhat similar to indexing in databases)
Keep all data in flat files (csv), to avoid dependencies and keep your script simple and efficient. You can then link your data to a spreadsheet as a view to apply formatting if you need (you can create linked table and views from the csv in MS Access and then feed it to a data table in Excel)
Below is an example (using some other pseudo files for simplicity). You can use worksheet object in a main loop too (just loop through row indexes and update cell objects).
# generate hashtables
$person = @{}
$location = @{}
Import-Csv location.csv | foreach {$location.Add($_.id, @{zip=$_.zip; city=$_.city})}
Import-Csv person.csv | foreach {$person.Add($_.id, @{name=$_.name; age=$_.age})}
# loop through the main file
Import-Csv main.csv | foreach {
$id = $_.id
$_.name = $person[$id].name
$_.age = $person[$id].age
$_.city = $location[$id].city
$_.zip = $location[$id].zip
Write-Output $_ } | Convertto-Csv | Out-File mainOut.csv
Upvotes: 0
Reputation: 5217
If you don't mind installing additional modules to simplify your work, I'd strongly recommend dfinke's Import Excel module. Once you install if, the code to achieve your goal would be simple as that:
# cd C:\SO\53529676
$rtCSV = Import-Csv .\csv1.csv | select ID, LOCATION
$seCSV = Import-Csv .\csv2.csv | select ID, AGE
$lmCSV = Import-Csv .\csv3.csv | select ID, SIBLINGS
$excel = Import-Excel .\Excel1.xlsx
foreach ($record in $excel) {
$record.'LOCATION' = ($rtCSV | Where {$_.ID -eq $record.ID}).LOCATION
$record.'AGE' = ($seCSV | Where {$_.ID -eq $record.ID}).AGE
$Record.'SIBLINGS' = ($lmCSV | Where {$_.ID -eq $record.ID}).SIBLINGS
}
$excel | Export-Excel .\Excel2.xlsx
Upvotes: 1