Reputation: 41
I have a CSV file with a non standardized content, it goes something like this:
John, 001
01/01/2015, hamburger
02/01/2015, pizza
03/01/2015, ice cream
Mary, 002
01/01/2015, hamburger
02/01/2015, pizza
John, 003
04/01/2015, chocolate
Now, what I'm trying to do is to write a logic in java to separate them.I would like "John, 001" as the header and to put all the rows under John, before Mary to be John's.
Will this be possible? Or should I just do it manually?
Edit:
For the input, even though it is not standardized, a noticeable pattern is that the row that do not have names will always starts with a date.
My output goal would be a java object, where I can store it in the database eventually in the format below.
Name, hamburger, pizza, ice cream, chocolate
John, 01/01/2015, 02/01/2015, 03/01/2015, NA
Mary, 01/01/2015, 02/01/2015, NA, NA
John, NA, NA, NA, 04/01/2015
Upvotes: 0
Views: 169
Reputation: 6289
Use uniVocity-parsers to handle this for you. It comes with a master-detail row processor.
// 1st, Create a RowProcessor to process all "detail" elements (dates/ingredients)
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();
// 2nd, Create MasterDetailProcessor to identify whether or not a row is the master row (first value of the row is a name, second is an integer).
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
@Override
protected boolean isMasterRecord(String[] row, ParsingContext context) {
try{
//tries to convert the second value of the row to an Integer.
Integer.parseInt(String.valueOf(row[1]));
return true;
} catch(NumberFormatException ex){
return false;
}
}
};
CsvParserSettings parserSettings = new CsvParserSettings();
// Set the RowProcessor to the masterRowProcessor.
parserSettings.setRowProcessor(masterRowProcessor);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader(yourFile));
// Here we get the MasterDetailRecord elements.
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
// Each master record has one master row and multiple detail rows.
MasterDetailRecord masterRecord = rows.get(0);
Object[] masterRow = masterRecord.getMasterRow();
List<Object[]> detailRows = masterRecord.getDetailRows();
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
Upvotes: 0
Reputation: 148880
If I have correctly understood, the specs are :
The algorithm is in pseudo code :
Data structures :
one list of struct< string name, hash< int meal index, date> > for the names : base
one list of strings for the meals : meals
Code :
name = null
iname = -1
Loop per input lines {
if first field is date {
if name == null {
throw Exception("incorrect structure");
}
meal = second field
look for index of meal in meals
if not found {
index = len(meals);
add meal at end of list meals
}
base[iname].hash[index] = date
}
else {
name = first field
iname += 1
add a new struc { name, empty hash } at end of list base
}
}
close input file
open output file
// headers
print "names"
for meal in meals {
print ",", meal
}
print newline
for (i=0; i<=iname; i++) {
print base[i].name
for meal in meals {
look for meal in base[i].hash.keys
if found {
print ",", base[i].hash[meal]
}
else {
print ",NA"
}
}
print newline
}
close output file
Just code it in correct Java and come back here if you have any problem.
Upvotes: 2
Reputation: 224
You could just read the file into a list
List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
Afterwards iterate over the list and split them for wanted delimiters (",")
.
Now you could just use if-else or switch blocks to check for specific entries.
List<DataObject> objects = new ArrayList<>();
DataObject dataObject = null;
for(String s : lines) {
String [] splitLine = s.split(",");
if(splitLine[0].matches("(\d{2}\/){2}\d{4}")) {
// We found a data
if(dataObject != null && splitLine.length == 2) {
String date = splitLine[0];
String dish = splitLine[1];
dataObject.add(date, dish);
} else {
// Handle error
}
} else if(splitLine.length == 2) {
// We can create a new data object
if(dataObject != null) {
objects.add(dataObject);
}
String name = splitLine[0];
String id = splitLine[1];
dataObject = new DataObject(name, id);
} else {
// Handle error
}
}
Now you can sort them into your specific categories.
Edit: Changed the loop and added a regex (which may not be optimal) for matching the date strings and using them to decide whether to add them to the last data object.
The DataObject class can contain data structures holding the dates/dishes. When the CSV is parsed you can iterate over the objects List and do whatever you want. I hope this answer helps :)
Upvotes: 2