Alexei Petrenko

Reputation: 93

Check for duplicates in a parsed CSV file

I have a parsed CSV file in the following format:

const data = [
    ["ID", "Full name", "pHone", "Email", "Age", "Experience", "Yearly Income", "Has children", "License states", "Expiration date", "License number", "Duplicated With"],
    [1, "Alex Cho", "+18900991919", "[email protected]", "12", "21", "200", "FALSE", "AL | New York | District of Columbia | Montana", "12-12-2030", "1xr567", null],
    [2, "Alex Cho", "1900991919", "[email protected]", "0", "12", "true", "TRUE", "Alabama | American Samoa", "12/31/1998", "1xr567", null],
    [3, "Alex Cho", "8982394689", "[email protected]", "-1", "8", "1200.11", "FALSE", "Northern Mariana Islands", "date", "kas317", null],
    [4, "Alex Cho", "18900991919", "cho.cho", "-99", "100", "1200.100", "YES", "Palau", "02-11-2021", "1nasd567213", null],
    [5, "Alex Cho", "+18900991919", "[email protected]", "11", "11", "12..00.11", "NO", "Puerto Rico", "04-11-2021", "1xr567!(%^!@)", null],
    [6, "Alex Cho", "+18900991919", "@!%*!&@!@@gmail.com", "100", "10", "999999.11", " ", "West Virginia | North Carolina | North Dakota", "12/31/2022", "1xr*@#", null],
    [7, "Alex Cho", "+10950943225", "(*!&@^$%[email protected])", "44", "10", "12.00.11", "TRUE", "Virgin Islands", "  2022-12-03", "1xr___", null],
    [8, "Alex Cho", "+10950943225", "(*!&@^$%[email protected])", "44", "10", "12.00.11", "TRUE", "Virgin Islands", "  2022-12-03", "ABC123", null],
]

Now I need to check it for duplicate phone numbers and emails. If any rows share the same phone or the same email, I need to detect that and record it in the last column I created, "Duplicated With". As you can see in the picture, there is an additional column, and it must contain the IDs of the duplicate rows.

However, I don't know how to implement this.

Upvotes: 0

Views: 1112

Answers (2)

codemonkey

Reputation: 7915

I hope I am understanding your requirements correctly. It seems you need something like this:

const data = [
    ["ID", "Full name", "pHone", "Email", "Age", "Experience", "Yearly Income", "Has children", "License states", "Expiration date", "License number", "Duplicated With"],
    ["1", "Alex Cho", "+18900991919", "[email protected]", "12", "21", "200", "FALSE", "AL | New York | District of Columbia | Montana", "12-12-2030", "1xr567"],
    ["2", "Alex Cho", "1900991919", "[email protected]", "0", "12", "true", "TRUE", "Alabama | American Samoa", "12/31/1998", "1xr567"],
    ["3", "Alex Cho", "8982394689", "[email protected]", "-1", "8", "1200.11", "FALSE", "Northern Mariana Islands", "date", "kas317"],
    ["4", "Alex Cho", "18933991919", "cho.cho", "-99", "100", "1200.100", "YES", "Palau", "02-11-2021", "1nasd567213"],
    ["5", "Alex Cho", "+18900991222", "[email protected]", "11", "11", "12..00.11", "NO", "Puerto Rico", "04-11-2021", "1xr567!(%^!@)"],
    ["6", "Alex Cho", "+18933991919", "@!%*!&@!@@gmail.com", "100", "10", "999999.11", " ", "West Virginia | North Carolina | North Dakota", "12/31/2022", "1xr*@#"],
]

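const new_data = data.map((item, index) => {
    if (index === 0) return item; // Leave the header row unchanged
    // Clean up the phone number by stripping a leading "+1" or "1"
    const clean_phone = item[2].replace(/^\+?1/, "");
    const dup_ids = [];
    data.forEach((element, ind) => {
        // Skip the header row and the row currently being checked
        if (ind === 0 || ind === index) return;
        const other_phone = element[2].replace(/^\+?1/, "");
        // Duplicate if the cleaned phones match exactly OR the emails match
        if (other_phone === clean_phone || item[3] === element[3])
            dup_ids.push(element[0]);
    });

    item.push(dup_ids); // Note: push also mutates the row inside `data`
    return item;
});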
console.log(new_data)

This returns the same array, but with one extra element appended to each data row: an array of the IDs of every other row that shares its phone number (ignoring a leading +1 or 1) or its email.
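If the phone column can also contain spaces, dashes, or parentheses, stripping only a leading +1/1 will miss some matches. Below is a minimal sketch of a stricter normalizer, assuming North American numbers where an 11-digit value starting with 1 includes the country code; `normalizePhone` is a hypothetical helper, not part of the answer above:

```javascript
// Hypothetical helper: reduce a phone string to its digits and drop a leading
// country code "1" when the result has 11 digits (North American assumption).
function normalizePhone(raw) {
  const digits = String(raw).replace(/\D/g, ""); // keep digits only
  return digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
}

const phones = ["+18900991919", "18900991919", "1900991919", "+1 (890) 099-1919"];
console.log(phones.map(normalizePhone));
// → '8900991919', '8900991919', '1900991919', '8900991919'
```

Comparing the normalized strings with === (rather than a substring check) keeps 1900991919 distinct from +18900991919.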

Upvotes: 2

DrGo

Reputation: 521

You have at least two choices:

  1. Sort the array by phone and email, then loop over it, comparing each entry with the one before it. If they match, flag the current record as a duplicate of the previous one.

  2. Using a Map (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map), loop over the array (no need to sort it first) and check whether the phone + email entry already exists in the map. If it does, the current entry is a duplicate; if not, add it to the map with key = phone + email and the record number as the value.
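Option 2 could look roughly like the sketch below. It deviates from the answer in one deliberate way: since the question flags rows that share a phone OR an email, it keeps one Map per field instead of a single phone + email key (which would only catch rows where both match), and it records the link on both rows of each pair. The function name markDuplicates, the column indexes (0 = ID, 2 = phone, 3 = email, 11 = Duplicated With), and the case-insensitive email comparison are assumptions based on the question's data:

```javascript
// Hypothetical sketch: fill the "Duplicated With" column (index 11) in one pass,
// matching rows on phone OR email. Column positions follow the question's data.
function markDuplicates(rows) {
  const [header, ...records] = rows;
  const byPhone = new Map(); // cleaned phone -> IDs seen with that phone
  const byEmail = new Map(); // trimmed, lowercased email -> IDs seen with that email
  const dupsOf = new Map();  // ID -> Set of IDs that duplicate it

  // Store `id` under `key` and link it with every ID already stored there.
  const addToIndex = (index, key, id) => {
    if (!key) return; // skip empty cells
    const ids = index.get(key) ?? [];
    for (const other of ids) {
      dupsOf.get(id).add(other);
      dupsOf.get(other).add(id);
    }
    ids.push(id);
    index.set(key, ids);
  };

  for (const row of records) {
    const id = row[0];
    dupsOf.set(id, new Set());
    // Drop a leading "+1" or "1" country code, as in the first answer.
    addToIndex(byPhone, String(row[2] ?? "").replace(/^\+?1/, ""), id);
    addToIndex(byEmail, String(row[3] ?? "").trim().toLowerCase(), id);
  }

  // Build new rows instead of mutating the input; IDs are joined into one string.
  const marked = records.map((row) => {
    const copy = [...row];
    copy[11] = [...dupsOf.get(row[0])].join(", ") || null;
    return copy;
  });
  return [header, ...marked];
}

// Usage with rows 4, 6, 7 and 8 from the question.
const sample = [
  ["ID", "Full name", "pHone", "Email", "Age", "Experience", "Yearly Income", "Has children", "License states", "Expiration date", "License number", "Duplicated With"],
  [4, "Alex Cho", "18900991919", "cho.cho", "-99", "100", "1200.100", "YES", "Palau", "02-11-2021", "1nasd567213", null],
  [6, "Alex Cho", "+18900991919", "@!%*!&@!@@gmail.com", "100", "10", "999999.11", " ", "West Virginia | North Carolina | North Dakota", "12/31/2022", "1xr*@#", null],
  [7, "Alex Cho", "+10950943225", "(*!&@^$%[email protected])", "44", "10", "12.00.11", "TRUE", "Virgin Islands", "  2022-12-03", "1xr___", null],
  [8, "Alex Cho", "+10950943225", "(*!&@^$%[email protected])", "44", "10", "12.00.11", "TRUE", "Virgin Islands", "  2022-12-03", "ABC123", null],
];
console.log(markDuplicates(sample).map((row) => [row[0], row[11]]));
// → 4 gets "6", 6 gets "4" (same phone); 7 gets "8", 8 gets "7" (same phone and email)
```

Because each row is visited once and each lookup is a Map access, this avoids the nested loop over all rows; the Set prevents listing the same ID twice when both phone and email match.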

Upvotes: 0
