Reputation: 43
I am collecting performance counters from NetApp Performance Manager software (OPM). OPM saves 30 days worth of data in MySQL database. So i have to put two queries to retrieve the data:
Then merge the two csv files to get data if there are 31 days in a month.
Both files look like below:
"Time","objid","cpuBusy","cifsOps","avgLatency"
"2016:06:04 00:04","72","50.6196","2069.11","7622.1"
"2016:06:04 00:09","72","30.2233","2075.94","7633.27"
"2016:06:04 00:14","72","35.2559","1980.64","8352.17"
When i merge the two csv files with below code. I get duplicate rows with data from same data/time.
@(Import-Csv au2004npsa003-mm-business.csv) + @(Import-Csv au2004npsa003-nn-business.csv) | export-csv joined.csv -NoTypeInformation
How can i merge the two csv files without getting duplicate data?
I have tried select -unique
however, it gives just one row.
Upvotes: 4
Views: 16586
Reputation: 437111
As for why Select-Object -Unique
didn't work:
Select-Object -Unique
, when given instances of reference types (other than strings), compares their .ToString()
values in order to determine uniqueness.
[pscustomobject]
instances, such as the ones Import-Csv
creates, regrettably return the empty string from their .ToString()
method.
Thus, all input objects compare the same, and only the first input object is ever returned.
S9uare's helpful Select-Object -Property * -Unique
approach overcomes this problem by forcing all properties to be compared invidually, but comes with a performance caveat:
The input objects are effectively recreated, and comparing all property values is overkill in this case, because comparing Time
values would suffice; with large input files, processing can take a long time.
Since the data at hand comes from CSV files, the performance problem can be helped with string processing, using Get-Content
rather than Import-Csv
:
Get-Content au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
Select-Object -Unique |
Set-Content -Encoding ASCII joined.csv
Note that I'm using -Encoding ASCII
to mimic Export-Csv
's default behavior; change as needed.
With input objects that are strings, Select-Object -Unique
works as expected - and is faster.
Note, however, that with large input files that you may run out of memory, given that Select-Object
needs to build up an in-memory data structure containing all rows in order to determine uniqueness.
Upvotes: 7
Reputation: 1016
made it complete script just run it
#** ### Merged CSV to Unique
#csv1
# name,id, birth
#John,007,1999
#john,008,1999
#john,709,1999
#csv2
# name,id, birth
#John,006,1999
#john,005,1999
#john,709,1999
#After Merged
#John,007,1999
#john,008,1999
#**john,709,1999
#John,006,1999
#john,005,1999
# >>> Only one Unique Files exits john,709,1999
##### Just merge CSV example
#csv1
# name,id, birth
#John,007,1999
#john,008,1999
#john,709,1999
#csv2
# name,id, birth
#John,006,1999
#john,005,1999
#john,709,1999
## after Merged
#name,id, birth
#John,007,1999
#john,008,1999
#john,709,1999
#John,006,1999
#john,005,1999
#john,709,1999
## Usage
$csv1 = Import-Csv -Path "D:\CSV1.csv" #change the path to where csv one exists
$csv2 = Import-Csv -Path "D:\CSV2.csv" #change the path to where other exists
$UniQexpopath= 'D:\uniqueMerged.csv' #change the path to where you want to export unique csv
$expopath= 'D:\merged.csv' #change the path to where you want to export Merged csv
Write-Host "Would you like to Unique Merge the CSV?" -ForegroundColor Green
$Response = Read-Host "[Y] Yes, [N] No"
If($Response -eq "y")
{write-host "Merging Csv please wait" -foregroundcolor Red
$merged | Select -Property * -Unique | Export-csv -path $UniQexpopath -NoTypeInformation
write-host "Merging completed please check $UniQexpopath" -foregroundcolor Green}
else{ write-host "Merging Csv please wait" -foregroundcolor Red
$merged = $csv1 + $csv2
$merged | Export-csv -path $expopath -NoTypeInformation
write-host "Merging completed please check $expopath" -foregroundcolor Green}
Upvotes: 0
Reputation: 56
Select -Unique
is actually the correct way, you just need a little bit more. Which is -Property *
or simply *
As Select -Unique
does not automatically compares all the properties, you need to specify *
so that it compares the whole object.
$csv1 = Import-Csv -Path ".\csv1.csv"
$csv2 = Import-Csv -Path ".\csv2.csv"
$merged = $csv1 + $csv2
$merged | Select -Property * -Unique
Upvotes: 3