Reputation: 851

Java sort a csv file based on column date

I need to sort a CSV file based on the date column. This is what the masterRecords array list looks like:

GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:30:00 AM MYT,+0,COMPL

I need to sort it by the time in the date column (07:15:00, 07:30:00, etc.). I wrote this code to do the sorting:

// Date is fixed on per 15min interval
ArrayList<String> sortDate = new ArrayList<String>();
    sortDate.add(":00:");
    sortDate.add(":15:");
    sortDate.add(":30:");
    sortDate.add(":45:");

    BufferedWriter bw = new BufferedWriter(new FileWriter(tempPath + filename));

    for (int k = 0; k < sortDate.size(); k++) {
        String date = sortDate.get(k);
        for (int j = 0; j < masterRecords.size(); j++) {
            String[] splitLine = masterRecords.get(j).split(",", -1);
            if (splitLine[10].contains(date)) {
                bw.write(masterRecords.get(j) + System.getProperty("line.separator").replaceAll(String.valueOf((char) 0x0D), ""));
                masterRecords.remove(j);
            }
        }
    }
bw.close();

As you can see above, the code loops through the first list (sortDate) and, for each entry, loops through the second list (masterRecords), writing each matching record to a new file. It seems to work, since the new file is sorted, but I noticed that masterRecords has 10000 records while the new file ends up with only 5000. I'm assuming the cause is how I remove the records from the master list. Does anyone know why?

Upvotes: 0

Answers (2)

Basil Bourque

Reputation: 339303

The accepted answer by Lautaro Cozzani is correct.

And Now for Something Completely Different

For fun here is an entirely different approach.

I used two libraries:

Apache Commons CSV
Joda-Time

Apache Commons CSV

The Commons CSV library handles the parsing of various flavors of CSV. It can return a List of the rows from the file, each row being represented by a CSVRecord object. You can ask that object for the first field, second field, and so on.

Joda-Time

Joda-Time does the work of parsing the date-time strings.

Avoid 3-letter Time Zone Codes

Beware: Joda-Time refuses to try to parse the three-letter time zone code MYT. For good reason: those 3 or 4 letter codes are mere conventions, neither standardized nor unique. My example code below assumes all your data is using MYT. My code treats MYT as literal text and assigns the proper time zone name, Asia/Kuala_Lumpur. I suggest you encourage whoever creates your input data to learn about proper time zone names and about ISO 8601 string formats.
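The same idea (match the zone abbreviation as literal text, then assign a proper zone name explicitly) can be sketched without any external library. This is a minimal illustration using java.time from Java 8 rather than Joda-Time, so it is not the code used in this answer. The class name MytTimestampParser is hypothetical, and it assumes every row is in Malaysia time.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of the original answer.
class MytTimestampParser {

    // 'MYT' is matched as literal text only; it is never interpreted as a zone.
    // Note the two spaces before the hyphen, as in the sample data.
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("MMM dd yyyy'  - 'hh:mm:ss a 'MYT'", Locale.ENGLISH);

    // Assumption: all input rows are in Malaysia time.
    static final ZoneId ZONE = ZoneId.of("Asia/Kuala_Lumpur");

    static ZonedDateTime parse(String input) {
        // Parse the wall-clock value first, then attach the zone explicitly.
        return LocalDateTime.parse(input, FORMATTER).atZone(ZONE);
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = parse("Dec 15 2014  - 07:15:00 AM MYT");
        System.out.println(zdt);
        System.out.println("minute-of-hour: " + zdt.getMinute());
    }
}
```

Passing Locale.ENGLISH keeps the month name and AM/PM marker from depending on the default locale of the machine running the code.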

Java 8

My example code requires Java 8, using the new Lambda syntax and "streams".

Example Code

This example does a double-layer sort. First the rows are sorted by the minute-of-hour (00, 15, 30, 45). Within each of those groups, the rows are sorted by the date-time value (ordered by year, month, day-of-month, and time-of-day).

First we open the .csv text file, and parse its contents into CSVRecord objects.

String filePathString = "/Users/brainydeveloper/input.csv";
try {
    Reader in = new FileReader( filePathString ); // Get the input file.
    List<CSVRecord> recs = CSVFormat.DEFAULT.parse( in ).getRecords(); // Parse the input file.

Next we wrap those CSVRecord objects each inside a smarter class that extracts the two values we care about: first the DateTime, secondly the minute-of-hour of that DateTime. See further down for the simple code of that class CsvRecWithDateTimeAndMinute.

    List<CsvRecWithDateTimeAndMinute> smartRecs = new ArrayList<>( recs.size() ); // Collect transformed data.
    for ( CSVRecord rec : recs ) { // For each CSV record…
        CsvRecWithDateTimeAndMinute smartRec = new CsvRecWithDateTimeAndMinute( rec ); // …transform CSV rec into one of our objects with DateTime and minute-of-hour.
        smartRecs.add( smartRec );
    }

Next we take that list of our smarter wrapped objects, and break that list into multiple lists. Each new list contains the CSV row data for a particular minute-of-hour (00, 15, 30, and 45). We store these in a map.

If our input data has only occurrences of those four values, the resulting map will have only four keys. Indeed, you can do a sanity-check by looking for more than four keys. Extra keys would mean either something went terribly wrong in parsing or there is some data with unexpected minute-of-hour values.

Each key (the Integer of those numbers) leads to a List of our smart wrapper objects. Here is some of that fancy new Lambda syntax.

    Map<Integer , List<CsvRecWithDateTimeAndMinute>> byMinuteOfHour = smartRecs.stream().collect( Collectors.groupingBy( CsvRecWithDateTimeAndMinute::getMinuteOfHour ) );

The map does not give us our sub-lists with our keys (minute-of-hour Integers) sorted. We might get back the 15 group before we get the 00 group. So extract the keys, and sort them.

    // Access the map by the minuteOfHour value in order. We want ":00:" first, then ":15:", then ":30:", and ":45:" last.
    List<Integer> minutes = new ArrayList<Integer>( byMinuteOfHour.keySet() ); // Fetch the keys of the map.
    Collections.sort( minutes ); // Sort that List of keys.
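As a possible alternative to extracting and sorting the keys by hand, the grouping step can ask for a TreeMap, which keeps its keys in ascending order. The sketch below is self-contained and uses a hypothetical Row class as a stand-in for CsvRecWithDateTimeAndMinute. It also includes the sanity check for unexpected minute-of-hour keys described earlier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

class GroupByMinuteSketch {

    // Hypothetical stand-in for the wrapper class: just a minute-of-hour and a label.
    static final class Row {
        final int minuteOfHour;
        final String label;

        Row(int minuteOfHour, String label) {
            this.minuteOfHour = minuteOfHour;
            this.label = label;
        }

        Integer getMinuteOfHour() {
            return minuteOfHour;
        }
    }

    // Group rows by minute-of-hour; the TreeMap iterates its keys in ascending order.
    static TreeMap<Integer, List<Row>> groupByMinute(List<Row> rows) {
        return rows.stream().collect(
                Collectors.groupingBy(Row::getMinuteOfHour, TreeMap::new, Collectors.toList()));
    }

    // Sanity check: report any key other than the four expected quarter-hour values.
    static Set<Integer> unexpectedMinutes(TreeMap<Integer, List<Row>> groups) {
        Set<Integer> extra = new TreeSet<>(groups.keySet());
        extra.removeAll(Arrays.asList(0, 15, 30, 45));
        return extra;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(15, "a"), new Row(0, "b"), new Row(30, "c"), new Row(15, "d"));
        TreeMap<Integer, List<Row>> groups = groupByMinute(rows);
        System.out.println("keys in order: " + groups.keySet());
        System.out.println("unexpected keys: " + unexpectedMinutes(groups));
    }
}
```

With a TreeMap the separate list of sorted keys is no longer needed, since iterating over the map already visits the groups in minute order.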

Following along that list of ordered keys, ask the map for each key's list. That list of data needs to be sorted to get our second-level sort (by date-time).

    List<CSVRecord> outputList = new ArrayList<>( recs.size() ); // Make an empty List in which to put our CSVRecords in double-sorted order.
    for ( Integer minute : minutes ) {
        List<CsvRecWithDateTimeAndMinute> list = byMinuteOfHour.get( minute );
        // Secondary sort. For each group of records with ":00:" (for example), sort them by their full date-time value.
        // Sort the List by defining an anonymous Comparator using new Lambda syntax in Java 8.
        Collections.sort( list , ( CsvRecWithDateTimeAndMinute r1 , CsvRecWithDateTimeAndMinute r2 ) -> {
            return r1.getDateTime().compareTo( r2.getDateTime() );
        } );
        for ( CsvRecWithDateTimeAndMinute smartRec : list ) {
            outputList.add( smartRec.getCSVRecord() );
        }
    }
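The group-then-sort steps above produce an ordering that could also be expressed as a single chained Comparator: minute-of-hour first, then the full date-time. The sketch below is a simplified alternative, not the code used in this answer. It uses a hypothetical Row class and java.time.LocalDateTime in place of the Joda-Time DateTime so that it runs without external libraries.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DoubleSortSketch {

    // Hypothetical stand-in for the wrapper class: an id plus a parsed date-time.
    static final class Row {
        final String id;
        final LocalDateTime dateTime;

        Row(String id, LocalDateTime dateTime) {
            this.id = id;
            this.dateTime = dateTime;
        }

        int getMinuteOfHour() {
            return dateTime.getMinute();
        }

        LocalDateTime getDateTime() {
            return dateTime;
        }
    }

    // Primary key: minute-of-hour. Secondary key: the full date-time value.
    static final Comparator<Row> BY_MINUTE_THEN_DATE_TIME =
            Comparator.comparingInt(Row::getMinuteOfHour).thenComparing(Row::getDateTime);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Row> sorted(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(BY_MINUTE_THEN_DATE_TIME);
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("dec15-0715", LocalDateTime.of(2014, 12, 15, 7, 15)),
                new Row("dec15-0700", LocalDateTime.of(2014, 12, 15, 7, 0)),
                new Row("dec14-0715", LocalDateTime.of(2014, 12, 14, 7, 15)),
                new Row("jan22-0700", LocalDateTime.of(2014, 1, 22, 7, 0)));
        for (Row row : sorted(rows)) {
            System.out.println(row.id);
        }
    }
}
```

The map-based version is still useful when the per-minute groups are needed for something else, such as the sanity check on the number of keys.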

We are done manipulating the data. Now it is time to export back out to a text file in CSV format.

    // Now we have complete List of CSVRecord objects in double-sorted order (first by minute-of-hour, then by date-time).
    // Now let's dump those back to a text file in CSV format.
    try ( PrintWriter out = new PrintWriter( new BufferedWriter( new FileWriter( "/Users/brainydeveloper/output.csv" ) ) ) ) {
        final CSVPrinter printer = CSVFormat.DEFAULT.print( out );
        printer.printRecords( outputList );
    }

} catch ( FileNotFoundException ex ) {
    System.out.println( "ERROR - Exception needs to be handled." );
} catch ( IOException ex ) {
    System.out.println( "ERROR - Exception needs to be handled." );
}

The code above loads the entire CSV data set into memory at once. If you wish to conserve memory, iterate over the CSVParser object returned by the parse method rather than calling the getRecords method. At least that is what the doc seems to be saying. I've not experimented with that, as my use-cases so far all fit easily into memory.

Here is that smart class to wrap each CSVRecord object:

package com.example.jodatimeexperiment;

import org.apache.commons.csv.CSVRecord;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

/**
 *
 * @author Basil Bourque
 */
public class CsvRecWithDateTimeAndMinute
{

    // Statics
    static public final DateTimeFormatter FORMATTER = DateTimeFormat.forPattern( "MMM dd yyyy'  - 'hh:mm:ss aa 'MYT'" ).withZone( DateTimeZone.forID( "Asia/Kuala_Lumpur" ) );

    // Member vars.
    private final CSVRecord rec;
    private final DateTime dateTime;
    private final Integer minuteOfHour;

    public CsvRecWithDateTimeAndMinute( CSVRecord recordArg )
    {
        this.rec = recordArg;
        // Parse record to extract DateTime.
        // Expect value such as: Dec 15 2014  - 07:15:00 AM MYT
        String input = this.rec.get( 7 - 1 );  // Index (zero-based counting). So field # 7 = index # 6.
        this.dateTime = CsvRecWithDateTimeAndMinute.FORMATTER.parseDateTime( input );
        // From DateTime extract minute of hour
        this.minuteOfHour = this.dateTime.getMinuteOfHour();
    }

    public DateTime getDateTime()
    {
        return this.dateTime;
    }

    public Integer getMinuteOfHour()
    {
        return this.minuteOfHour;
    }

    public CSVRecord getCSVRecord()
    {
        return this.rec;
    }

    @Override
    public String toString()
    {
        return "CsvRecWithDateTimeAndMinute{ " + " minuteOfHour=" + minuteOfHour + " | dateTime=" + dateTime + " | rec=" + rec + " }";
    }

}

With this input…

GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:30:00 AM MYT,+0,COMPL

…you will get this output…

GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:30:00 AM MYT,+0,COMPL

Upvotes: 1

Lautaro Cozzani

Reputation: 1808

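It is not safe to remove an item from a list inside an index-based loop. Each call to remove(j) shifts the remaining elements down by one, so the element that moves into position j is skipped when j is incremented. Iterate over the list with an Iterator instead, for example: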

List<String> names = ....
Iterator<String> i = names.iterator();
while (i.hasNext()) {
   String s = i.next(); // must be called before you can call i.remove()
   // Do something
   i.remove();
}

The ArrayList documentation says:

The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
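To make the effect concrete, here is a small self-contained sketch (the class and method names are hypothetical) that mirrors the loop shape from the question, where every visited element is removed. It compares an index-based loop with an Iterator, assuming, as a simplification, that every record matches and is removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveWhileLoopingSketch {

    // Same shape as the loop in the question: remove(j), then j++.
    static int visitedWithIndexLoop(List<String> source) {
        List<String> list = new ArrayList<>(source); // work on a copy
        int visited = 0;
        for (int j = 0; j < list.size(); j++) {
            visited++;
            // After this call the next element slides into slot j,
            // and the j++ of the loop steps right over it.
            list.remove(j);
        }
        return visited;
    }

    // Removing through the iterator keeps the traversal position consistent.
    static int visitedWithIterator(List<String> source) {
        List<String> list = new ArrayList<>(source); // work on a copy
        int visited = 0;
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            it.next(); // must be called before it.remove()
            visited++;
            it.remove();
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9");
        System.out.println("index loop visited: " + visitedWithIndexLoop(records));
        System.out.println("iterator visited:   " + visitedWithIterator(records));
    }
}
```

With ten records the index-based loop visits only five of them, which matches the halving described in the question, while the iterator version visits all ten.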

Upvotes: 3