I am currently implementing an Akka Streams Scala application which reads in a zipped file containing tweets formatted as JSON, as below:
{"created_at": "Mon Nov 04 14:37:29 +0000 2019", ... }
{"created_at": "Mon Nov 04 14:37:29 +0000 2019", ... }
I have already succeeded in reading in and uncompressing the file, but now I'm trying to split the stream into chunks in such a way that each chunk contains exactly one tweet, which corresponds to one line in the snippet above.
I have tried using the following flow to achieve this:
Framing.delimiter(ByteString("\n"), 50000)
The problem, however, is that the JSON contains an attribute "full_text" holding the content of the tweet. This text can contain \n characters, so the flow above does not work: it also splits at the newlines inside the text, as in this example:
{"created_at": "Mon Nov 04 14:37:29 +0000 2019", "full_text": "I love to eat \n CHEESE!!", ... }
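To make the failure concrete, here is a minimal sketch, assuming Akka 2.6+ (where an implicit `ActorSystem` is enough to provide the materializer) and assuming the `full_text` value really contains a raw, unescaped newline byte as described above. The object name `DelimiterFramingDemo` and the sample data are hypothetical. Because `Framing.delimiter` only looks for the delimiter bytes and knows nothing about JSON strings, the first tweet comes out as two broken frames.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Framing, Sink, Source}
import akka.util.ByteString

import scala.concurrent.Await
import scala.concurrent.duration._

object DelimiterFramingDemo {
  // Hypothetical sample data: two tweets, one per line. The first "full_text"
  // contains a raw newline (the Scala escape \n inside the string literal).
  val input: String =
    "{\"created_at\": \"Mon Nov 04 14:37:29 +0000 2019\", \"full_text\": \"I love to eat \n CHEESE!!\"}\n" +
      "{\"created_at\": \"Mon Nov 04 14:37:29 +0000 2019\", \"full_text\": \"hello\"}\n"

  // Splits the bytes on every '\n', exactly like the flow in the question.
  def frames(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("delimiter-demo")
    try {
      val done = Source
        .single(ByteString(input))
        .via(Framing.delimiter(ByteString("\n"), 50000))
        .map(_.utf8String)
        .runWith(Sink.seq)
      Await.result(done, 5.seconds)
    } finally {
      system.terminate()
    }
  }
}
```

Two tweets go in, but three frames come out; the second frame is just ` CHEESE!!"}`, which is not valid JSON on its own.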
Does anyone know a good solution to this issue?
It seems that Akka's JSON framing is made for this purpose:
https://doc.akka.io/docs/alpakka/current/data-transformations/json.html
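The linked page covers Alpakka's JSON module; core Akka Streams also ships `akka.stream.scaladsl.JsonFraming.objectScanner`, which is what this minimal sketch uses (assuming Akka 2.6+). It frames by counting braces while skipping over string literals, so a newline inside `full_text` does not end a frame, and whitespace between top-level objects (such as the line breaks separating tweets) is skipped. The object name `JsonFramingDemo`, the sample data, and the 16-character chunking are hypothetical, chosen only to show that objects are reassembled across arbitrary chunk boundaries.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{JsonFraming, Sink, Source}
import akka.util.ByteString

import scala.concurrent.Await
import scala.concurrent.duration._

object JsonFramingDemo {
  // Hypothetical sample data: one tweet per line, and the first "full_text"
  // contains a raw newline.
  val input: String =
    "{\"created_at\": \"Mon Nov 04 14:37:29 +0000 2019\", \"full_text\": \"I love to eat \n CHEESE!!\"}\n" +
      "{\"created_at\": \"Mon Nov 04 14:37:29 +0000 2019\", \"full_text\": \"hello\"}\n"

  def frames(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("json-framing-demo")
    try {
      // Cut the bytes into 16-character pieces to mimic the arbitrary chunk
      // boundaries produced by file reading and decompression.
      val chunks = input.grouped(16).map(s => ByteString(s)).toList
      val done = Source(chunks)
        // Brace-counting framing: braces and newlines inside string literals
        // are ignored; 50000 is the maximum object length in bytes.
        .via(JsonFraming.objectScanner(50000))
        .map(_.utf8String)
        .runWith(Sink.seq)
      Await.result(done, 5.seconds)
    } finally {
      system.terminate()
    }
  }
}
```

In the application from the question, the source would be the already-decompressed file stream, and `JsonFraming.objectScanner(50000)` would take the place of `Framing.delimiter(ByteString("\n"), 50000)`; each emitted `ByteString` then holds exactly one tweet object.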