Nitish
Nitish

Reputation: 43

How to merge two csv files in powershell with same headers and discard duplicate rows

I am collecting performance counters from NetApp Performance Manager software (OPM). OPM saves 30 days worth of data in MySQL database. So i have to put two queries to retrieve the data:

  1. First Query on 30th of every month and save in csv file.
  2. Second Query on 1st of every month and save in csv file.

Then merge the two csv files to get data if there are 31 days in a month.

Both files look like below:

"Time","objid","cpuBusy","cifsOps","avgLatency"
"2016:06:04 00:04","72","50.6196","2069.11","7622.1"
"2016:06:04 00:09","72","30.2233","2075.94","7633.27"
"2016:06:04 00:14","72","35.2559","1980.64","8352.17"

When i merge the two csv files with below code. I get duplicate rows with data from same data/time.

@(Import-Csv au2004npsa003-mm-business.csv) + @(Import-Csv au2004npsa003-nn-business.csv) | export-csv joined.csv -NoTypeInformation

How can i merge the two csv files without getting duplicate data? I have tried select -unique however, it gives just one row.

Upvotes: 4

Views: 16586

Answers (3)

mklement0
mklement0

Reputation: 437111

As for why Select-Object -Unique didn't work:

  • Select-Object -Unique, when given instances of reference types (other than strings), compares their .ToString() values in order to determine uniqueness.

  • [pscustomobject] instances, such as the ones Import-Csv creates, regrettably return the empty string from their .ToString() method.

    • This long-standing bug, still present as of PowerShell (Core) 7.2, was originally reported in GitHub issue #6163.

Thus, all input objects compare the same, and only the first input object is ever returned.

S9uare's helpful Select-Object -Property * -Unique approach overcomes this problem by forcing all properties to be compared invidually, but comes with a performance caveat:
The input objects are effectively recreated, and comparing all property values is overkill in this case, because comparing Time values would suffice; with large input files, processing can take a long time.


Since the data at hand comes from CSV files, the performance problem can be helped with string processing, using Get-Content rather than Import-Csv:

Get-Content au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
  Select-Object -Unique |
    Set-Content -Encoding ASCII joined.csv

Note that I'm using -Encoding ASCII to mimic Export-Csv's default behavior; change as needed.

With input objects that are strings, Select-Object -Unique works as expected - and is faster.
Note, however, that with large input files that you may run out of memory, given that Select-Object needs to build up an in-memory data structure containing all rows in order to determine uniqueness.

Upvotes: 7

DisplayName
DisplayName

Reputation: 1016

made it complete script just run it

#** ### Merged CSV to Unique


#csv1
   # name,id, birth
  #John,007,1999
  #john,008,1999
  #john,709,1999


#csv2

  # name,id, birth
  #John,006,1999
  #john,005,1999
  #john,709,1999

#After Merged

  #John,007,1999
  #john,008,1999
  #**john,709,1999
  #John,006,1999
  #john,005,1999


  #  >>> Only one Unique Files exits john,709,1999


  ##### Just merge CSV example

#csv1
   # name,id, birth
  #John,007,1999
  #john,008,1999
  #john,709,1999

  #csv2
  # name,id, birth
   #John,006,1999
  #john,005,1999
  #john,709,1999


  ## after Merged

   #name,id, birth
  #John,007,1999
  #john,008,1999
  #john,709,1999
  #John,006,1999
  #john,005,1999
  #john,709,1999




 ## Usage

$csv1 = Import-Csv -Path "D:\CSV1.csv" #change the path to where csv one exists
$csv2 = Import-Csv -Path "D:\CSV2.csv" #change the path to where other exists
$UniQexpopath= 'D:\uniqueMerged.csv'  #change the path to where you want to export unique csv
$expopath= 'D:\merged.csv' #change the path to where you want to export Merged csv



  Write-Host "Would you like to Unique Merge the CSV?"        -ForegroundColor Green 
$Response = Read-Host "[Y] Yes, [N] No" 
    If($Response -eq "y") 
{write-host "Merging Csv please wait" -foregroundcolor Red

$merged | Select -Property * -Unique | Export-csv -path $UniQexpopath -NoTypeInformation 
write-host "Merging completed please check $UniQexpopath" -foregroundcolor Green}


  else{ write-host "Merging Csv please wait" -foregroundcolor Red
  $merged = $csv1 + $csv2

$merged | Export-csv -path $expopath -NoTypeInformation
write-host "Merging completed please check $expopath" -foregroundcolor Green}

Upvotes: 0

S9uare
S9uare

Reputation: 56

Select -Unique is actually the correct way, you just need a little bit more. Which is -Property * or simply *

As Select -Unique does not automatically compares all the properties, you need to specify * so that it compares the whole object.

$csv1 = Import-Csv -Path ".\csv1.csv"
$csv2 = Import-Csv -Path ".\csv2.csv"

$merged = $csv1 + $csv2

$merged | Select -Property * -Unique

Upvotes: 3

Related Questions