Reputation: 3597
I need to compare two csv files in windows7. How can I proceed to achieve this. I want to see the differences in both the files , like we can use tkdiff in Linux.
Upvotes: 23
Views: 148558
Reputation: 2273
Notepad++ will give you a quick and dirty solution with the Compare plugin:
How to do it:
Voila!
If you want to compare two Excel files you can export the worksheets as CSV and do the above.
I did this to compare two HUGE CSV files, both around 400,000 rows and it worked like a charm.
Upvotes: 1
Reputation: 629
I had this issue and for fun I tried writing some powershell code to generate a "per column" diff. It is embarrassingly slow, but it solved my problem nicer (very wide CSV file, so regular diff was really hard to interpret). Call Compare-Csv
with the files you need
$ErrorActionPreference = "Stop"
function Compare-Csv
(
[Parameter(Mandatory)] [string] $ReferenceFile,
[Parameter(Mandatory)] [string] $DifferenceFile,
[string[]] $ReferenceIdentifiers = $null,
[char] $Delimiter = ';'
)
{
$referenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $ReferenceFile
$differenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $DifferenceFile
$referenceDataHeaders = [string[]] ($referenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
$differenceDataHeaders = [string[]] ($differenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
$supersetHeaders = ($referenceDataHeaders + $differenceDataHeaders) | Select-Object -Unique
$empty = @()
$fileDifferences = @()
$maxLength = ($referenceData.Length, $differenceData.Length | Measure-Object -Maximum).Maximum
for($i = 0; $i -lt $maxLength; $i++)
{
$ref = $empty;
if($i -lt $referenceData.Length)
{
$ref = $referenceData[$i]
}
$diff = $empty;
if($i -lt $differenceData.Length)
{
$diff = $differenceData[$i]
}
$rowDifferences = $null
foreach($header in $supersetHeaders)
{
$compare = Compare-Object -ReferenceObject $ref -DifferenceObject $diff -Property $header
if($compare)
{
if(-not $rowDifferences)
{
$rowDifferences = @{}
if($ReferenceIdentifiers)
{
$identifer = ($ref | Select-Object -Property $ReferenceIdentifiers).PSObject.Properties.Value
$rowDifferences.Add('ReferenceIdentifiers', $identifer)
}
}
$rowDifferences.Add($header, $compare)
}
}
if($rowDifferences)
{
$fileDifferences + $rowDifferences
}
}
return $fileDifferences
}
$differences = Compare-Csv -ReferenceFile 'Ref.csv' -DifferenceFile 'Diff.csv' -ReferenceIdentifiers @('ARRAY OF HEADER NAMES USED TAKEN FROM REFERENCE FILE THAT CAN BE USED TO IDENTIFY THE ROW')
foreach($difference in $differences)
{
$out = $difference.ReferenceIdentifiers + ": " + ($difference | Select-Object -ExcludeProperty ReferenceIdentifiers | Format-List | Out-String -NoNewline)
Write-Host ""
Write-Host $out
}
Upvotes: 1
Reputation: 1
You can also do the same thing with Powershell, which can be useful if you're in a read-only folder.
compare-object -ReferenceObject (Get-Content filename1.CSV) -DifferenceObject (Get-content filename2.CSV)
You can launch Powershell directly in your folder without navigating via cd, by typing powershell in the folder's address bar.
Upvotes: 0
Reputation: 294
Suggestion:
cmd
and press Enter to open a cd C:\path\to\your\directory
to reach the location of the two CSV files Tip: To paste a copied path from clipboard into DOS terminal cmd window, you can either (1) right-click on terminal window, or (2) press Shift+Insert.
fc filename1.csv filename2.csv > outfile.txt
(fc
stands for "file compare").outfile.txt
located in the same folder. If outfile.txt
doesn't exist, it will be created automatically.Upvotes: 27
Reputation: 43
I did this today.
Lets say we have 2 csv files X and Y
X having columns a, b, c
Y having column a, b, c
The rows are not in same order and are disperesed througout the csv files.
I imported both of them in my excel sheet. I sorted them first by column c and then by column b and then by column a. You can go in any order you like.
Compare the sorted files through notepad++'s compare plugin/Beyond Compare.
Upvotes: 4
Reputation: 9701
Here is another option which I found very useful, as mentioned here:
findstr /v /g:"file1.csv" "file2.csv"
Where the /v
switch returns the differences and /g:
gets the search strings from file1.csv. You can use findstr /?
for more help.
You can also print the differences to a file using:
findstr /v /g:"file1.csv" "file2.csv > diffs.csv"
As an aside, I found findstr
far more accurate and the output more readable than fc
.
UPDATE
This works nicely with 'smaller' files. You might get an out of memory
error on larger files. In this case, I've had to turn to Python and dataframes. Just a friendly heads up ...
Upvotes: 9