Shiraaz.M

Reputation: 3191

Parsing multi line records using java 8 streams

I'm trying to parse a file that contains information in the following format:

TABLE_NAME

VARIABLE_LIST_OF_COLUMNS

VARIABLE_NUMBER_OF_ROWS (separated by a tab separator)

An example (using ',' as the separator for the question; the actual separator is a tab):

STUDENTS

ID

NAME

1,Mike

2,Kimberly

The idea is to build a list of insert sql statements (context for the code snippet).

What I want to know is whether this kind of multiline parsing is possible at all using the Java 8 Streams API. This is what I have at the moment:

public final class StatementGeneratorMain {

    public static void main(final String[] args) throws Exception{
        List<String> fileNames = Arrays
            .asList("STUDENTS.txt");
        fileNames.stream()
            .forEach(fileName -> {
                String tableName;
                List<String> columnNames;
                List<String[]>  dataRows;
                try (BufferedReader br = getBufferedReader(fileName)) {
                    tableName = br.lines().findFirst().get();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }

                try (BufferedReader br = getBufferedReader(fileName)) {
                    // skip the first line because it has already been processed
                    columnNames = br.lines().skip(1).filter(v -> v.split("\t").length == 1).collect(toList());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }

                try (BufferedReader br = getBufferedReader(fileName)) {
                    //skip the first line and the columns length to get the data
                    //columns are identified as being splittable on the delimiter
                    dataRows = br.lines().skip(1 + columnNames.size()).map(s -> s.split("\t"))
                        .collect(toList());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }

                String columns = columnNames.stream().collect(joining(",","(",")"));

                List<String> dataRow = dataRows.stream()
                    .map(arr -> Arrays.stream(arr).map(x -> "'" + x + "'").collect(joining(",", "(", ")")))
                    .map(row -> String.format("INSERT INTO %s %s VALUES %s;", tableName, columns, row))
                    .collect(toList());

                dataRow.forEach(l -> System.out.println(l));
            });
    }

    private static BufferedReader getBufferedReader(String fileName) {
        return new BufferedReader(new InputStreamReader(StatementGeneratorMain.class.getClassLoader().getResourceAsStream(
            fileName)));
    }
}

This piece of code does the job for me, but I don't really like it because I read the same file thrice (once for table name, again to deduce the columns, again to get the rows). I also don't think that it is proper functional style.

What I am looking for is a more elegant way to do this kind of multiline/multirecord parsing using the streams API.

For completeness, the output is:

INSERT INTO STUDENTS (ID,NAME) VALUES ('1','Mike');

INSERT INTO STUDENTS (ID,NAME) VALUES ('2','Kimberly');

I'm not too particular about stuff like numeric column and null values at this point.

Upvotes: 1

Views: 2012

Answers (2)

Pshemo

Reputation: 124275

I am not sure streams are the right approach here, since they are meant to iterate over data once or, to be more precise, to handle all of the data in one way. If you need to handle separate chunks of data differently, you should probably use good old loops or iterators. One of the simplest solutions that comes to mind is a Scanner, so your code could look like this:

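import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringJoiner;
import java.util.regex.Pattern;
import static java.util.stream.Collectors.joining;

Pattern oneWordLine = Pattern.compile("^\\w+$", Pattern.MULTILINE);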

List<String> files = Arrays.asList("input.txt");
for (String file : files) {

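    try (Scanner sc = new Scanner(new File(file))) {
        // use line breaks as the delimiter so hasNext(oneWordLine) tests a whole line;
        // with the default whitespace delimiter the "1" in "1<TAB>Mike" would match too
        sc.useDelimiter("\\R");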

        String tableName = sc.nextLine();

        StringJoiner columnNamesJoiner = new StringJoiner(", ", "(", ")");
        // iterate over lines with single words
        while (sc.hasNext(oneWordLine)) {
            columnNamesJoiner.add(sc.nextLine());
        }
        String columns = columnNamesJoiner.toString();


        List<String> dataRow = new ArrayList<>();
        // iterate over rest of lines
        while (sc.hasNextLine()) {
            String values = Arrays.stream(sc.nextLine().split("\t")) 
                    .collect(joining("', '", "('", "')"));
            dataRow.add(String.format("INSERT INTO %s %s VALUES %s;", 
                    tableName,columns, values));
        }

        dataRow.forEach(System.out::println);

    } catch (Exception e) {
        e.printStackTrace(); // no need to rethrow as a RuntimeException
    }
}

Upvotes: 2

Ramzy

Reputation: 7138

You can move the "BufferedReader br = getBufferedReader(fileName)" piece up so that the reader is opened only once, and then read from it as required. I don't think you need to read the file three times.
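A minimal sketch of that idea, assuming column-name lines never contain a tab and every data row does. The class name SinglePassStatementGenerator and the helper toInsertStatements are made up for illustration, and the sample input is fed from a StringReader instead of the classpath resource used in the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SinglePassStatementGenerator {

    // Hypothetical helper: reads table name, column names and data rows
    // from one reader in a single pass.
    static List<String> toInsertStatements(BufferedReader br) throws IOException {
        List<String> statements = new ArrayList<>();

        // The first line is the table name.
        String tableName = br.readLine();
        if (tableName == null) {
            return statements; // empty input, nothing to generate
        }

        // Assumption: column names are the following lines without a tab.
        List<String> columnNames = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null && !line.contains("\t")) {
            columnNames.add(line);
        }
        String columns = columnNames.stream().collect(Collectors.joining(",", "(", ")"));

        // 'line' now holds the first data row, or null if there are no rows.
        while (line != null) {
            String values = Arrays.stream(line.split("\t"))
                    .map(x -> "'" + x + "'")
                    .collect(Collectors.joining(",", "(", ")"));
            statements.add(String.format("INSERT INTO %s %s VALUES %s;", tableName, columns, values));
            line = br.readLine();
        }
        return statements;
    }

    public static void main(String[] args) throws IOException {
        // Sample input from the question, with real tabs between the values.
        String sample = "STUDENTS\nID\nNAME\n1\tMike\n2\tKimberly\n";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            toInsertStatements(br).forEach(System.out::println);
        }
    }
}
```

The reader is consumed strictly front to back, so the file is opened and read once. The row formatting still uses a small stream per line, which is about as far as streams fit here, since the three sections of the file each need different handling.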

Upvotes: 0
