Mallu Golageri

Reputation: 59

akka stream alpakka csv: Stream fails when a row with the wrong number of columns is read from the CSV file

I am reading a CSV file from a remote location (FTP), and some rows in the file have an invalid number of columns.

The stream does not progress when such rows are encountered. I need to skip them with an error message and proceed.

Here is what I have tried; the supervision strategy is not working:

    source.via(CsvParsing.lineScanner()
        .withAttributes(ActorAttributes.supervisionStrategy(
            throwable -> Supervision.resume())))


Sample data: my CSV has 5 fields in each row.

1281,Export - Product Search Tags,0,Id,20
1282,Export - Product Search Tags,1,Id,10
1283,Export - Product Search Tags,2,Value,100

If I remove the last field in the second row (i.e. 10), the stream fails and does not read the next line.
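As a workaround (this is my own sketch, not from the answer below), the column count can be validated outside the parser, skipping bad rows with an error message. `RowFilter`, `validateRows`, and the naive comma `split` are my names and assumptions; real data with quoted commas would need a proper parser's output instead:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowFilter {
    static final int EXPECTED_COLUMNS = 5;

    // Keep only rows with the expected number of fields; report the rest.
    static List<String[]> validateRows(List<String> lines) {
        List<String[]> valid = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            if (fields.length == EXPECTED_COLUMNS) {
                valid.add(fields);
            } else {
                System.err.println("Skipping malformed row (" + fields.length
                        + " columns, expected " + EXPECTED_COLUMNS + "): " + line);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1281,Export - Product Search Tags,0,Id,20",
                "1282,Export - Product Search Tags,1,Id", // last field removed
                "1283,Export - Product Search Tags,2,Value,100");
        System.out.println(validateRows(lines).size()); // 2 valid rows
    }
}
```

In an Akka stream, the same check could sit in a map/filter stage after `CsvParsing.lineScanner()`, comparing each parsed row's size against the expected column count before further processing.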

Upvotes: 0

Views: 620

Answers (1)

Diana

Reputation: 718

CsvParsing doesn't seem to have support for Actor supervision.

This is how the data is structured, see docs:

    Field Delimiter - separates the columns from each other (e.g. , or ;)
    Quote - marks columns that may contain other structuring characters (such as Field Delimiters or line break) (e.g. ")
    Escape Character - used to escape Field Delimiters in columns (e.g. \)
    Lines are separated by either Line Feed (\n = ASCII 10) or Carriage Return and Line Feed (\r = ASCII 13 + \n = ASCII 10).
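To illustrate the Quote rule above, here is a toy quote-aware splitter. `QuoteDemo` and `splitCsv` are my own helpers, not Alpakka's parser; the sketch handles quoting but deliberately ignores escape characters:

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteDemo {
    // Minimal quote-aware field splitter: a comma inside double quotes
    // does not end the field. (Illustration only; no escape handling.)
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;        // toggle quoted mode
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // 3 fields: [1, Export, with comma, 0]
        System.out.println(splitCsv("1,\"Export, with comma\",0"));
    }
}
```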

In these cases the stream is expected to fail with a MalformedCsvException (see the internal CsvParser.scala class):

1. Wrong escaping
2. No line end after the delimiter, or the maximum line length is reached
3. No delimiter or end of line when quote end is reached
4. Unclosed quote
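For example, the unclosed-quote case (4) can be pre-screened with a simple parity check, so that offending lines can be dropped with an error message before the parser fails. `QuoteCheck` and `hasUnclosedQuote` are hypothetical helpers of mine, and the check ignores escaped quotes:

```java
public class QuoteCheck {
    // Rough pre-check for case 4 (unclosed quote): an odd number of
    // double quotes means a quote was opened but never closed.
    // (Ignores escaped quotes -- illustration only.)
    static boolean hasUnclosedQuote(String line) {
        int quotes = 0;
        for (char c : line.toCharArray()) {
            if (c == '"') quotes++;
        }
        return quotes % 2 != 0;
    }

    public static void main(String[] args) {
        System.out.println(hasUnclosedQuote("1,\"Value,2"));   // true
        System.out.println(hasUnclosedQuote("1,\"Value\",2")); // false
    }
}
```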

Please check that removing the column does not violate any of these conditions, and add the error the stream fails with to the question to make it more descriptive.

Upvotes: 2
