XYZ_Linux
XYZ_Linux

Reputation: 3597

How to compare two csv files in windows

I need to compare two csv files in windows7. How can I proceed to achieve this. I want to see the differences in both the files , like we can use tkdiff in Linux.

Upvotes: 23

Views: 148558

Answers (6)

pfeds
pfeds

Reputation: 2273

Notepad++ will give you a quick and dirty solution with the Compare plugin:

Comparing two CSV files with Notepad++

How to do it:

  • Plugins -> Plugins Admin... -> Find "Compare" and click "Install".
  • Open both CSV files.
  • Select the first file, then Plugins -> Compare -> Set as First Compare.
  • Select the second file, then Plugins -> Compare -> Compare.

Voila!

If you want to compare two Excel files you can export the worksheets as CSV and do the above.

I did this to compare two HUGE CSV files, both around 400,000 rows and it worked like a charm.

Upvotes: 1

Mads Ravn
Mads Ravn

Reputation: 629

I had this issue and for fun I tried writing some powershell code to generate a "per column" diff. It is embarrassingly slow, but it solved my problem nicer (very wide CSV file, so regular diff was really hard to interpret). Call Compare-Csv with the files you need

$ErrorActionPreference = "Stop"

function Compare-Csv
(
    [Parameter(Mandatory)] [string] $ReferenceFile,
    [Parameter(Mandatory)] [string] $DifferenceFile,
    [string[]] $ReferenceIdentifiers = $null,
    [char] $Delimiter = ';'
)
{
    $referenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $ReferenceFile
    $differenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $DifferenceFile
    $referenceDataHeaders = [string[]] ($referenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
    $differenceDataHeaders = [string[]] ($differenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
    $supersetHeaders = ($referenceDataHeaders + $differenceDataHeaders) | Select-Object -Unique

    $empty = @()
    $fileDifferences = @()
    $maxLength = ($referenceData.Length, $differenceData.Length | Measure-Object -Maximum).Maximum
    for($i = 0; $i -lt $maxLength; $i++)
    {
        $ref = $empty;
        if($i -lt $referenceData.Length)
        {
            $ref = $referenceData[$i]
        }

        $diff = $empty;
        if($i -lt $differenceData.Length)
        {
            $diff = $differenceData[$i]
        }

        $rowDifferences = $null
        foreach($header in $supersetHeaders)
        {
            $compare = Compare-Object -ReferenceObject $ref -DifferenceObject $diff -Property $header
            if($compare)
            {
                if(-not $rowDifferences)
                {
                    $rowDifferences = @{}
                    if($ReferenceIdentifiers)
                    {
                        $identifer = ($ref | Select-Object -Property $ReferenceIdentifiers).PSObject.Properties.Value
                        $rowDifferences.Add('ReferenceIdentifiers', $identifer)
                    }
                }

                $rowDifferences.Add($header, $compare)
            }
        }

        if($rowDifferences)
        {
            $fileDifferences + $rowDifferences
        }
    }

    return $fileDifferences
}

$differences = Compare-Csv -ReferenceFile 'Ref.csv' -DifferenceFile 'Diff.csv' -ReferenceIdentifiers @('ARRAY OF HEADER NAMES USED TAKEN FROM REFERENCE FILE THAT CAN BE USED TO IDENTIFY THE ROW')
foreach($difference in $differences)
{
    $out = $difference.ReferenceIdentifiers + ": " + ($difference | Select-Object -ExcludeProperty ReferenceIdentifiers | Format-List | Out-String -NoNewline)
    Write-Host ""
    Write-Host $out
}

Upvotes: 1

ajm.im
ajm.im

Reputation: 1

You can also do the same thing with Powershell, which can be useful if you're in a read-only folder.

compare-object -ReferenceObject (Get-Content filename1.CSV) -DifferenceObject (Get-content filename2.CSV)

You can launch Powershell directly in your folder without navigating via cd, by typing powershell in the folder's address bar.

Upvotes: 0

user5524993
user5524993

Reputation: 294

Suggestion:

  • Press Windows+R shortcut to open windows' Run prompt
  • Type in cmd and press Enter to open a DOS terminal cmd window
  • Change the current path by running the command cd C:\path\to\your\directory to reach the location of the two CSV files

Tip: To paste a copied path from clipboard into DOS terminal cmd window, you can either (1) right-click on terminal window, or (2) press Shift+Insert.

  • Finally, to compare the two files, run fc filename1.csv filename2.csv > outfile.txt (fc stands for "file compare").
    The command will also log the result of comparison into a text file outfile.txt located in the same folder. If outfile.txt doesn't exist, it will be created automatically.

Upvotes: 27

rohan_shedge
rohan_shedge

Reputation: 43

I did this today.

Lets say we have 2 csv files X and Y

X having columns a, b, c
Y having column a, b, c

The rows are not in same order and are disperesed througout the csv files.

I imported both of them in my excel sheet. I sorted them first by column c and then by column b and then by column a. You can go in any order you like.

Compare the sorted files through notepad++'s compare plugin/Beyond Compare.

Upvotes: 4

s3dev
s3dev

Reputation: 9701

Here is another option which I found very useful, as mentioned here:

findstr /v /g:"file1.csv" "file2.csv"

Where the /v switch returns the differences and /g: gets the search strings from file1.csv. You can use findstr /? for more help.

You can also print the differences to a file using:

findstr /v /g:"file1.csv" "file2.csv > diffs.csv"

As an aside, I found findstr far more accurate and the output more readable than fc.


UPDATE
This works nicely with 'smaller' files. You might get an out of memory error on larger files. In this case, I've had to turn to Python and dataframes. Just a friendly heads up ...

Upvotes: 9

Related Questions