Firefly
Firefly

Reputation: 428

How to count duplicate entries in a .csv file?

I have a .csv file that is formated like this:

ID,date,itemName
456,1-4-2020,Lemon
345,1-3-2020,Bacon
345,1-4-2020,Sausage
123,1-1-2020,Apple
123,1-2-2020,Pineapple
234,1-2-2020,Beer
345,1-4-2020,Cheese

I have already implemented the algorithm to go through the file, scan for the first number and sort it in a descending order and make a new output:

123,1-1-2020,Apple
123,1-2-2020,Pineapple
234,1-2-2020,Beer
345,1-3-2020,Bacon
345,1-4-2020,Cheese
345,1-4-2020,Sausage
456,1-4-2020,Lemon

My question is, how do I implement my algorithm to make an output that counts the duplicate first number entries and reformat it to make it look like this...

123,1-1-2020,1,Apple
123,1-2-2020,1,Pineapple
234,1-2-2020,1,Beer
345,1-3-2020,1,Bacon
345,1-4-2020,2,Cheese,Sausage
456,1-4-2020,1,Lemon

...so that it counts the number of occurrence for each ID, denote it with the number of times, and if the date of that ID is also the same, combine the item names to the same line. Below is my source code (each line in the .csv is made into an object named 'receipt' that has ID, date, and name with their respective get() methods):

public class ReadFile {

    private static List<Receipt> readFile() {
        
        List<Receipt> receipts = new ArrayList<>();
        try {
            BufferedReader reader = new BufferedReader(new FileReader("dataset.csv"));

            // Move past the first title line
            reader.readLine();

            String line = reader.readLine();

            // Start reading from second line till EOF, split each string at ","
            while (line != null) {
                String[] attributes = line.split(",");
                Receipt attribute = getAttributes(attributes);
                receipts.add(attribute);
                line = reader.readLine();
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return receipts;
    }

    private static Receipt getAttributes(String[] attributes) {

        // Get ID located before the first ","
        long memberNumber = Long.parseLong(attributes[0]);

        // Get date located after the first ","
        String date = attributes[1];

        // Get name located after the second ","
        String name = attributes[2];

        return new Receipt(memberNumber, date, name);
    }

    // Parse the data into new file after sorting
    private static void parse(List<Receipt> receipts) {
        PrintWriter output = null;
        try {
            output = new PrintWriter("output.txt");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        // For each receipts, assert the text output stream is not null, print line.
        for (Receipt p : receipts) {
            assert output != null;
            output.println(p.getMemberNumber() + "," + p.getDate() + "," + p.getName());
        }
        assert output != null;
        output.close();
    }

    // Main method, accept input file, sort and parse
    public static void main(String[] args) {

        List<Receipt> receipts = readFile();
        QuickSort q = new QuickSort();
        q.quickSort(receipts);
        parse(receipts);
    }
}

Upvotes: 0

Views: 1175

Answers (1)

WJS
WJS

Reputation: 40034

The easiest way is to use a map.

Sample data from your file.

String[] lines = {
"123,1-1-2020,Apple",
"123,1-2-2020,Pineapple",
"234,1-2-2020,Beer",
"345,1-3-2020,Bacon",
"345,1-4-2020,Cheese",
"345,1-4-2020,Sausage",
"456,1-4-2020,Lemon"};
  • Create a map
  • as you read the lines, split them and add them to the map using the compute method. This will put the line in if the key (number and date) doesn't exist. Otherwise it simply appends the last item to the existing entry.
  • the file does not have to be sorted but the values will be added to the end as they are encountered.
Map<String, String> map = new LinkedHashMap<>(); 
for (String line : lines) {
    String[] vals = line.split(",");

    // if v is null, add the line
    // if v exists, take the existing line and append the last value
    map.compute(vals[0]+vals[1], (k,v)->v == null ? line : v +","+vals[2]);
}

for (String line : map.values()) {
    String[] fields = line.split(",",3);
    int count = fields[2].split(",").length;
    System.out.printf("%s,%s,%s,%s%n", fields[0],fields[1],count,fields[2]);
}

For this sample run prints

123,1-1-2020,1,Apple
123,1-2-2020,1,Pineapple
234,1-2-2020,1,Beer
345,1-3-2020,1,Bacon
345,1-4-2020,2,Cheese,Sausage
456,1-4-2020,1,Lemon

Upvotes: 1

Related Questions