Reputation: 851
I need to sort a CSV file by its date column. This is what the masterRecords ArrayList looks like:
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
I need to sort it by the time in the date column (07:15:00, 07:30:00, etc.). I wrote this code to do the sort:
// Date is fixed on per 15min interval
ArrayList<String> sortDate = new ArrayList<String>();
sortDate.add(":00:");
sortDate.add(":15:");
sortDate.add(":30:");
sortDate.add(":45:");
BufferedWriter bw = new BufferedWriter(new FileWriter(tempPath + filename));
for (int k = 0; k < sortDate.size(); k++) {
String date = sortDate.get(k);
for (int j = 0; j < masterRecords.size(); j++) {
String[] splitLine = masterRecords.get(j).split(",", -1);
if (splitLine[10].contains(date)) {
bw.write(masterRecords.get(j) + System.getProperty("line.separator").replaceAll(String.valueOf((char) 0x0D), ""));
masterRecords.remove(j);
}
}
}
bw.close();
As you can see above, the code loops through the first list (sortDate) and, for each entry, loops through the second list (masterRecords), writing matching records to a new file. It seems to work, since the new file is sorted, but I noticed that masterRecords has 10000 records while the new file ends up with only 5000. I'm assuming the cause is how I remove the records from the master list. Does anyone know why?
Upvotes: 0
Views: 2850
Reputation: 339303
The accepted answer by Lautaro Cozzani is correct.
For fun here is an entirely different approach.
I used two libraries:
The Commons CSV library handles the parsing of various flavors of CSV. It can return a List of the rows from the file, each row being represented by a CSVRecord object. You can ask that object for the first field, second field, and so on.
Joda-Time does the work of parsing the date-time strings.
Beware: Joda-Time refuses to try to parse the three-letter time zone code MYT. For good reason: Those 3 or 4 letter codes are mere conventions, neither standardized nor unique. My example code below assumes all your data is using MYT. My code assigns the proper time zone name Asia/Kuala_Lumpur. I suggest you encourage whoever creates your input data to learn about proper time zone names and about ISO 8601 string formats.
My example code requires Java 8, using the new Lambda syntax and "streams".
This example does a double-layer sort. First the rows are sorted by the minute-of-hour (00, 15, 30, 45). Within each of those groups, the rows are sorted by the date-time value (ordered by year, month, day-of-month, and time-of-day).
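As a rough illustration of that double-layer ordering on its own, here is a minimal sketch. It is not the code of this answer: it uses java.time from the Java 8 standard library instead of Joda-Time so that it runs without extra dependencies, it works on raw lines rather than CSVRecord objects, and the class name DoubleSortSketch and the helper dateTimeOf are made up for the example. It assumes the date-time is always field 7 and always ends in MYT.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class DoubleSortSketch {
    // Assumption: every value looks like "Dec 15 2014 - 07:15:00 AM MYT".
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("MMM dd yyyy' - 'hh:mm:ss a 'MYT'", Locale.ENGLISH);

    // Hypothetical helper: field 7 (index 6) of the line holds the date-time text.
    static LocalDateTime dateTimeOf(String line) {
        return LocalDateTime.parse(line.split(",", -1)[6], FORMATTER);
    }

    // Returns a sorted copy: first by minute-of-hour, then by the full date-time.
    static List<String> doubleSort(List<String> lines) {
        Comparator<String> byMinute = Comparator.comparingInt(line -> dateTimeOf(line).getMinute());
        Comparator<String> byDateTime = Comparator.comparing(line -> dateTimeOf(line));
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(byMinute.thenComparing(byDateTime));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL",
                "GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL",
                "GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:00:00 AM MYT,+0,COMPL");
        doubleSort(lines).forEach(System.out::println);
    }
}
```

Chaining the two comparators gives the same ordering as grouping by minute and then sorting each group, at the cost of parsing each line more than once during the sort.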
First we open the .csv text file, and parse its contents into CSVRecord objects.
String filePathString = "/Users/brainydeveloper/input.csv";
try {
Reader in = new FileReader( filePathString ); // Get the input file.
List<CSVRecord> recs = CSVFormat.DEFAULT.parse( in ).getRecords(); // Parse the input file.
Next we wrap those CSVRecord objects each inside a smarter class that extracts the two values we care about: first the DateTime, secondly the minute-of-hour of that DateTime. See further down for the simple code of that class CsvRecWithDateTimeAndMinute.
List<CsvRecWithDateTimeAndMinute> smartRecs = new ArrayList<>( recs.size() ); // Collect transformed data.
for ( CSVRecord rec : recs ) { // For each CSV record…
CsvRecWithDateTimeAndMinute smartRec = new CsvRecWithDateTimeAndMinute( rec ); // …transform CSV rec into one of our objects with DateTime and minute-of-hour.
smartRecs.add( smartRec );
}
Next we take that list of our smarter wrapped objects, and break that list into multiple lists. Each new list contains the CSV row data for a particular minute-of-hour (00, 15, 30, and 45). We store these in a map.
If our input data has only occurrences of those four values, the resulting map will have only four keys. Indeed, you can do a sanity-check by looking for more than four keys. Extra keys would mean either something went terribly wrong in parsing or there is some data with unexpected minute-of-hour values.
Each key (the Integer of those numbers) leads to a List of our smart wrapper objects. Here is some of that fancy new Lambda syntax.
Map<Integer , List<CsvRecWithDateTimeAndMinute>> byMinuteOfHour = smartRecs.stream().collect( Collectors.groupingBy( CsvRecWithDateTimeAndMinute::getMinuteOfHour ) );
The map does not give us our sub-lists with our keys (minute-of-hour Integers) sorted. We might get back the 15 group before we get the 00 group. So extract the keys, and sort them.
// Access the map by the minuteOfHour value in order. We want ":00:" first, then ":15:", then ":30:", and ":45:" last.
List<Integer> minutes = new ArrayList<Integer>( byMinuteOfHour.keySet() ); // Fetch the keys of the map.
Collections.sort( minutes ); // Sort that List of keys.
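// An aside, not part of the code in this answer: a possible alternative to extracting and sorting the keys is to have groupingBy build a TreeMap, which keeps its keys in ascending order. The sketch below is self-contained and uses plain "HH:mm:ss" strings instead of the wrapper objects; the class name GroupByMinuteSketch and the method groupByMinute are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByMinuteSketch {
    // Groups time strings such as "07:15:00" by their minute-of-hour.
    // Passing TreeMap::new as the map factory keeps the keys sorted (0, 15, 30, 45).
    static Map<Integer, List<String>> groupByMinute(List<String> times) {
        return times.stream().collect(Collectors.groupingBy(
                t -> Integer.parseInt(t.substring(3, 5)), // characters 3-4 are the minutes
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> times = Arrays.asList("07:15:00", "07:00:00", "07:30:00", "08:15:00");
        // Iterating the TreeMap visits the minute groups in ascending key order.
        groupByMinute(times).forEach((minute, group) -> System.out.println(minute + " -> " + group));
    }
}
```

// With a TreeMap the separate key list and Collections.sort call would no longer be needed, since iterating the map already visits the groups in key order.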
Following along that list of ordered keys, ask the map for each key's list. That list of data needs to be sorted to get our second-level sort (by date-time).
List<CSVRecord> outputList = new ArrayList<>( recs.size() ); // Make an empty List in which to put our CSVRecords in double-sorted order.
for ( Integer minute : minutes ) {
List<CsvRecWithDateTimeAndMinute> list = byMinuteOfHour.get( minute );
// Secondary sort. For each group of records with ":00:" (for example), sort them by their full date-time value.
// Sort the List by defining an anonymous Comparator using new Lambda syntax in Java 8.
Collections.sort( list , ( CsvRecWithDateTimeAndMinute r1 , CsvRecWithDateTimeAndMinute r2 ) -> {
return r1.getDateTime().compareTo( r2.getDateTime() );
} );
for ( CsvRecWithDateTimeAndMinute smartRec : list ) {
outputList.add( smartRec.getCSVRecord() );
}
}
We are done manipulating the data. Now it is time to export back out to a text file in CSV format.
// Now we have complete List of CSVRecord objects in double-sorted order (first by minute-of-hour, then by date-time).
// Now let's dump those back to a text file in CSV format.
try ( PrintWriter out = new PrintWriter( new BufferedWriter( new FileWriter( "/Users/brainydeveloper/output.csv" ) ) ) ) {
final CSVPrinter printer = CSVFormat.DEFAULT.print( out );
printer.printRecords( outputList );
}
} catch ( FileNotFoundException ex ) {
System.out.println( "ERROR - Exception needs to be handled." );
} catch ( IOException ex ) {
System.out.println( "ERROR - Exception needs to be handled." );
}
The code above loads the entire CSV data set into memory at once. If you wish to conserve memory, iterate over the CSVParser returned by the parse method rather than calling the getRecords method. At least that is what the doc seems to be saying. I've not experimented with that, as my use-cases so far all fit easily into memory.
Here is that smart class to wrap each CSVRecord object:
package com.example.jodatimeexperiment;
import org.apache.commons.csv.CSVRecord;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
/**
*
* @author Basil Bourque
*/
public class CsvRecWithDateTimeAndMinute
{
// Statics
static public final DateTimeFormatter FORMATTER = DateTimeFormat.forPattern( "MMM dd yyyy' - 'hh:mm:ss aa 'MYT'" ).withZone( DateTimeZone.forID( "Asia/Kuala_Lumpur" ) );
// Member vars.
private final CSVRecord rec;
private final DateTime dateTime;
private final Integer minuteOfHour;
public CsvRecWithDateTimeAndMinute( CSVRecord recordArg )
{
this.rec = recordArg;
// Parse record to extract DateTime.
// Expect value such as: Dec 15 2014 - 07:15:00 AM MYT
String input = this.rec.get( 7 - 1 ); // Index (zero-based counting). So field # 7 = index # 6.
this.dateTime = CsvRecWithDateTimeAndMinute.FORMATTER.parseDateTime( input );
// From DateTime extract minute of hour
this.minuteOfHour = this.dateTime.getMinuteOfHour();
}
public DateTime getDateTime()
{
return this.dateTime;
}
public Integer getMinuteOfHour()
{
return this.minuteOfHour;
}
public CSVRecord getCSVRecord()
{
return this.rec;
}
@Override
public String toString()
{
return "CsvRecWithDateTimeAndMinute{ " + " minuteOfHour=" + minuteOfHour + " | dateTime=" + dateTime + " | rec=" + rec + " }";
}
}
With this input…
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:30:00 AM MYT,+0,COMPL
…you will get this output…
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
Upvotes: 1
Reputation: 1808
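It is not safe to remove an item from a list inside an index-based loop: each removal shifts the following elements down by one, so the loop skips the element that moves into the freed slot. Iterate over the list with an Iterator instead, for example: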
List<String> names = ....
Iterator<String> i = names.iterator();
while (i.hasNext()) {
String s = i.next(); // must be called before you can call i.remove()
// Do something
i.remove();
}
The documentation says:
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
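To make the difference concrete, here is a small self-contained sketch. The class name RemoveWhileIterating and both method names are made up for the example, and the "write" step is replaced by collecting into a list. The first method mirrors the indexed loop from the question in the case where every record matches, and the second uses the iterator's own remove.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RemoveWhileIterating {
    // Mirrors the question's loop: after remove(j) the next element shifts into
    // slot j, and j++ then steps over it, so every second element is skipped.
    static List<String> drainByIndex(List<String> records) {
        List<String> written = new ArrayList<>();
        for (int j = 0; j < records.size(); j++) {
            written.add(records.get(j));
            records.remove(j);
        }
        return written;
    }

    // Removing through the iterator keeps the traversal position consistent.
    static List<String> drainByIterator(List<String> records) {
        List<String> written = new ArrayList<>();
        Iterator<String> it = records.iterator();
        while (it.hasNext()) {
            written.add(it.next()); // next() must be called before remove()
            it.remove();
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> b = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        System.out.println(drainByIndex(a));    // only half of the elements
        System.out.println(drainByIterator(b)); // all of the elements
    }
}
```

This matches the symptom in the question, where 10000 records shrink to 5000. If you prefer to keep the indexed loop, decrementing j after each removal or looping from the end of the list toward the start should also avoid the skipping.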
Upvotes: 3