JavaLearner1

Reputation: 627

Splitting a String to match date format in a List in Java

I have a list of strings that I am going to write to a CSV file. Each list element is a String like this:

List<String> list1 = new ArrayList<String>();
list1.add("one, Aug 21, 2018 11:08:51 PDT, last");
list1.add("two, newlast, Aug 22, 2018 11:08:52 PDT");

The problem is that when I write to the CSV file, "Aug 21" and "2018 11:08:51" get separated into different columns.

I need it like "Aug 21, 2018 11:08:51 PDT".

Also, the index might change: there is no guarantee that "Aug 21" always comes at the same position in each element.

I tried the code below to fix this, and it works. But is there a better way to do it, without splitting into an array and iterating?

list1.forEach(s -> {
        String[] s1 = s.split(",");
        // stop one short of the end so that s1[i+1] is always a valid index
        for(int i=0; i<s1.length-1; i++) {
            // an empty s1[i+1] simply fails to parse, so no separate empty check is needed
            if(isValidMonthDate(s1[i]) && isValidYearTime(s1[i+1])) {
                s1[i] = s1[i].trim();
                System.out.println("\""+ s1[i] +","+s1[i+1]+"\""); //i will concatenate this string and write to csv
            }
        }
    });
}

public static boolean isValidMonthDate(String inDate) {
    SimpleDateFormat dateFormat = new SimpleDateFormat("MMM dd");
    dateFormat.setLenient(false);
    try {
        dateFormat.parse(inDate.trim());
    } catch (ParseException pe) {
        return false;
    }
    return true;
}

public static boolean isValidYearTime(String inDate) {
    SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy HH:mm:ss zzz");
    dateFormat.setLenient(false);
    try {
        dateFormat.parse(inDate.trim());
    } catch (ParseException pe) {
        return false;
    }
    return true;
}

With this I get the output:

"Aug 21, 2018 11:08:51 PDT"
"Aug 22, 2018 11:08:52 PDT"

Is there a better way to achieve this without splitting into an array and iterating over it?

Upvotes: 0

Views: 478

Answers (3)

mtj

Reputation: 3554

You could utilize the normal date parser to attempt parsing at each index using a parse position, and see where it succeeds.

Since I try to avoid the old date API nowadays, here is a simple demo with the new one (java.time):

public static void main(String[] args) {
    List<String> inputs = Arrays.asList(
        "Aug 21, 2018 11:08:51 PDT",
        "one, Aug 21, 2018 11:08:51 PDT, last",
        "two, newlast, Aug 22, 2018 11:08:52 PDT"
        );
    String formatPattern = "MMM dd, yyyy HH:mm:ss zzz";
    DateTimeFormatter pattern = DateTimeFormatter.ofPattern(formatPattern, Locale.US);

    for(String input : inputs) {
        System.out.println("Processing " + input);

        int[] matchStartEnd = null;
        TemporalAccessor temp = null;

        // check all possible offsets i in the input string
        // (the bound assumes a date is at least as long as the pattern text)
        for(int i = 0, n = input.length() - formatPattern.length(); i <= n; i++) {
            try {
                ParsePosition pt = new ParsePosition(i);
                temp = pattern.parse(input, pt); 
                matchStartEnd = new int[] { i, pt.getIndex() };
                break;
            }
            catch(DateTimeParseException e) {
                // no date starts at offset i; try the next offset
            }
        }
        if(matchStartEnd != null) {
            System.out.println("  Found match at indexes " + matchStartEnd[0] + " to " + matchStartEnd[1]);
            System.out.println("  temporal accessor is " + temp);
        }
        else {
            System.out.println("  No match");
        }
    }
}

Upvotes: 1

Steven Spungin

Reputation: 29159

When writing the output, put the date in double quotes. That is how CSV escapes a field that contains commas.
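As a minimal sketch of that quoting step in Java: the class and method names here (`CsvQuoting`, `quoteField`, `toCsvLine`) are made up for illustration, and doubling embedded quotes follows the usual RFC 4180 convention rather than anything stated in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
class CsvQuoting {

    // Wraps one field in double quotes and doubles any quote inside it,
    // so commas in the field no longer act as column separators.
    static String quoteField(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins fields that are already separated into one CSV line.
    static String toCsvLine(List<String> fields) {
        return fields.stream()
                     .map(CsvQuoting::quoteField)
                     .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("one", "Aug 21, 2018 11:08:51 PDT", "last");
        // prints "one","Aug 21, 2018 11:08:51 PDT","last"
        System.out.println(toCsvLine(fields));
    }
}
```

Quoting every field is the simplest choice; quoting only the fields that contain a comma or a quote would also be valid CSV.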

To parse your input, use a regex. This one reads each date or word and consumes the comma separator:

(\w{3} \d{1,2}, \d{4})|(\w+),?

You can elaborate with more parentheses to pre-parse your date. If the first group matches, it's the date. I will leave it to the OP to order the final CSV.

Here is the regex in JavaScript as a proof of concept. I know the question is about Java, but the regex is the same.

// read word or date followed by comma
const rx = /(\w{3} \d{1,2}, \d{4})|(\w+),?/g

const input = ['one, Aug 2, 1999, two', 'three, four, Aug 3, 2000', 'Aug 3, 2010, five, six']

let csv2 = ''

input.forEach(it => {
  let parts = []
  let m2 = rx.exec(it)
  while (m2) {
    parts.push(m2[1] || m2[2])
    m2 = rx.exec(it)
  }
  csv2 += parts.map(it => '"' + it + '"').join(',') + '\n'
})

console.log(csv2)
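For reference, a rough Java port of the same loop might look like the sketch below. The class and method names are hypothetical; the expression is the one from the JavaScript demo, with backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical name; a sketch of the JavaScript demo above, not a tested library.
class DateAwareSplitter {

    // Group 1 is a date such as "Aug 2, 1999", group 2 is a plain word.
    private static final Pattern TOKEN =
            Pattern.compile("(\\w{3} \\d{1,2}, \\d{4})|(\\w+),?");

    // Reads the line token by token, keeping a date together as one part.
    static List<String> split(String line) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            parts.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return parts;
    }

    // Quotes every part and joins the parts with commas.
    static String toCsvLine(String line) {
        return split(line).stream()
                          .map(p -> "\"" + p + "\"")
                          .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints "one","Aug 2, 1999","two"
        System.out.println(toCsvLine("one, Aug 2, 1999, two"));
    }
}
```

As written, the first group only covers the `Aug 2, 1999` form. For the timestamps in the question, it would need to be extended to include the time and zone, otherwise `11:08:51 PDT` is split into separate words.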

Upvotes: 0

Nikolas

Reputation: 44476

I suggest using a regex to extract the date:

^(.*?)(\w{3} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2} PDT)(.*?)$

Then use Stream::map to extract the date and try to parse it. Don't forget to filter out the null values, which mark the strings that failed to parse.

SimpleDateFormat sdf = new SimpleDateFormat("MMM dd, yyyy HH:mm:ss Z", Locale.ENGLISH);
list1.stream()
     .map(s -> {
         try {
             return sdf.parse(s.replaceAll("^(.*?)(\\w{3} \\d{1,2}, \\d{4} \\d{2}:\\d{2}:\\d{2} PDT)(.*?)$", "$2"));
         } catch (ParseException e) {
             return null; // no parsable date in this string
         }
     })
     .filter(Objects::nonNull)
     .forEach(System.out::println);

I suggest you wrap the try-catch and the Regex extracting into a separate method.

static SimpleDateFormat sdf = new SimpleDateFormat("MMM dd, yyyy HH:mm:ss Z", Locale.ENGLISH);

static Date validate(String date) {
    String s = date.replaceAll("^(.*?)(\\w{3} \\d{1,2}, \\d{4} \\d{2}:\\d{2}:\\d{2} PDT)(.*?)$", "$2");
    try {
        return sdf.parse(s);
    } catch (ParseException e) { }
    return null;
}

... which significantly simplifies the Stream:

list1.stream()
     .map(Main::validate)
     .filter(Objects::nonNull)
     .forEach(System.out::println);
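If the goal is the CSV line itself rather than a Date object, the same timestamp expression could be used with Matcher.find() to keep the date as one field. The sketch below is only one possible way to do that: the names are hypothetical, the zone is still hard-coded to PDT as in the regex above, and dropping empty fields is an assumption that may not suit data with intentionally blank columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical name; a sketch under the assumptions stated above.
class DateFieldCsv {

    // The timestamp part of the regex above, without the surrounding groups.
    private static final Pattern DATE =
            Pattern.compile("\\w{3} \\d{1,2}, \\d{4} \\d{2}:\\d{2}:\\d{2} PDT");

    // Splits the line on commas, except inside the first timestamp found.
    static List<String> toFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = DATE.matcher(line);
        if (m.find()) {
            addPlainFields(line.substring(0, m.start()), fields);
            fields.add(m.group());
            addPlainFields(line.substring(m.end()), fields);
        } else {
            addPlainFields(line, fields);
        }
        return fields;
    }

    // Adds the trimmed, non-empty comma-separated pieces of text (assumption: blanks are dropped).
    private static void addPlainFields(String text, List<String> fields) {
        for (String part : text.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
    }

    // Quotes only the fields that contain a comma or a double quote.
    static String toCsvLine(String line) {
        return toFields(line).stream()
                .map(f -> f.contains(",") || f.contains("\"")
                        ? "\"" + f.replace("\"", "\"\"") + "\""
                        : f)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints one,"Aug 21, 2018 11:08:51 PDT",last
        System.out.println(toCsvLine("one, Aug 21, 2018 11:08:51 PDT, last"));
        // prints two,newlast,"Aug 22, 2018 11:08:52 PDT"
        System.out.println(toCsvLine("two, newlast, Aug 22, 2018 11:08:52 PDT"));
    }
}
```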

Upvotes: 0
