Maggie Mi
Maggie Mi

Reputation: 89

Excel---Counting duplicate data

APC NM000038
APC NM000038
APC NM000038
APC NM000038
APC NM001127510
APC NM001127510
APC NM001127510

I was trying to count duplicate data in excel. I cant find relevant answers, so I just ask here. And thank you for answering in advance.

I am trying to countif - with two columns of information. if the values in column A = certain criteria, i want to count the number of times a value appears in Column B (Do not count the same value from column B, only count the different values) - not all values in column A correspond to column b values. So if I want to see how many females received a "5", I need to see how many "female" in column A have a corresponding value of "5" in column B. (not all females have a "5", values can be 1 through 5)

In the above example, I expect

NM001127510  2
NM000038     2

Upvotes: 1

Views: 4597

Answers (1)

Siddharth Rout
Siddharth Rout

Reputation: 149335

Assuming that you data is in cell A1 to B1000, use this formula

=SUMPRODUCT((A1:A1000="APC")*(B1:B1000="NM000038"))

similarly

=SUMPRODUCT((A1:A1000="APC")*(B1:B1000="NM001127510"))

FOLLOWUP

I did that, but the data sets are huge, several hundred MB excel file. I m not sure whether i need to use vba to do that, cuz i dont know a lot about vba – Maggie Mi 7 hours ago

If you Excel Files are >= 100 mb then opening it and then running the VBA code or using formulas will take a lot of time. If you are just concerned with the results then try this :)

Open a new Excel File. Ensure your other file is closed. In a module paste this code. When you run this, the output will be pasted in Sheet1.

Option Explicit

Sub Sample()
    Dim ws As Worksheet
    Dim adodb As Object
    Dim result

    Set ws = Sheets("Sheet1")

    Set adodb = CreateObject("ADODB.Connection")

    adodb.CursorLocation = 3

    adodb.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & _
                  "C:\book1.xlsx" & ";Extended Properties=""Excel 12.0 Xml;IMEX=1" & _
                  "HDR=NO;" & """"

    Set result = adodb.Execute("Select F1, F2, Count(*) from [Sheet1$] Group by F1,F2")

    With ws
        .Range(.Cells(1, 1), .Cells(result.RecordCount, result.Fields.Count)) _
        = Application.Transpose(result.GetRows)
    End With

    '~~> Cleanup
    result.Close
    adodb.Close
    Set adodb = Nothing
    Set result = Nothing
End Sub

MY ASSUMPTIONS (Change the above code - My code is based on below assumptions)

1) You are working with Excel 2007/2010 files. If not, then you will have to change the connection string. Please see this link for an appropriate connection string.

http://connectionstrings.com/excel

2) The name of your Excel file which has the data is called Book1.xlsx and the data is in Sheet1. Also it resides in C:\

SNAPSHOT

enter image description here

Upvotes: 3

Related Questions