Reputation: 99
I'm having issues parsing a CSV file exported from a Vimeo review page in Adobe's ExtendScript. The problem is that ExtendScript is based on ES3, and most solutions I've found don't work because they rely on modern JavaScript.
The CSV also has a header row, an empty row at the end, double quotes around some but not all fields (which I'd like to remove), and possible line breaks and special characters (including commas) inside fields. Is there any way to get a 'clean' 2D array out of it?
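Once some parser has turned the text into a 2D array of strings, the header row and the empty trailing row still have to be dropped. A minimal sketch of that clean-up step, assuming the input is already a 2D array; the name cleanRows and its hasHeader flag are hypothetical. It sticks to ES3 syntax (no String.prototype.trim, no forEach) so it can sit in the main .jsx file.

```javascript
// Hypothetical clean-up step for an already parsed CSV (2D array of strings):
// optionally skip the first (header) row and drop rows whose cells are all
// empty or whitespace, e.g. the trailing ",,,,,,," line of the export.
// ES3-only syntax so it can run in ExtendScript.
function cleanRows(rows, hasHeader) {
    var out = [];
    var start = hasHeader ? 1 : 0;
    var i, j, empty;
    for (i = start; i < rows.length; i++) {
        empty = true;
        for (j = 0; j < rows[i].length; j++) {
            // ES3 has no String.prototype.trim, so strip whitespace with a regex
            if (String(rows[i][j]).replace(/^\s+|\s+$/g, "") !== "") {
                empty = false;
                break;
            }
        }
        if (!empty) {
            out.push(rows[i]);
        }
    }
    return out;
}
```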
I've tried the solutions here: Javascript code to parse CSV data, and here: How can I parse a CSV string with Javascript, which contains comma in data?
But I couldn't get them to work; I assume the issues are related to ExtendScript being old.
CSV File
"Test Video-01.mp4",1,00:00:00,AVT,"test comment 1",--,"Tuesday, July 9, 2019 At 8:49 AM",No
"Test Video-01.mp4",2,00:00:00,AVT,"another at same timecode",--,"Tuesday, July 9, 2019 At 8:50 AM",Yes
,3,00:00:00,--,"another at same timecode","reply here from anon","Tuesday, July 9, 2019 At 8:54 AM",Yes
"Test Video-01.mp4",3,00:00:11,AVT,"really long comment Lorem ipsum dolor sit amet, Purus sit amet volutpat consequat mauris nunc congue nisi. Semper viverra nam libero justo laoreet sit amet cursus. Id interdum velit laoreet id. Bibendum est ultricies integer quis auctor elit sed vulputate. And some special chars to boot: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Eros donec ac odio tempor orci dapibus. Nam libero justo laoreet sit amet. Pellentesque pulvinar pellentesque habitant morbi. Pellentesque eu tincidunt tortor aliquam nulla facilisi cras fermentum.",--,"Tuesday, July 9, 2019 At 8:50 AM",No
"Test Video-01.mp4",4,00:00:19,AVT,"another one different timecode",--,"Tuesday, July 9, 2019 At 8:50 AM",Yes
"Test Video-01.mp4",5,00:00:43,AVT,"comment here tooo",--,"Tuesday, July 9, 2019 At 8:51 AM",No
,6,00:00:43,AVT,"comment here tooo","reply to a comment","Tuesday, July 9, 2019 At 8:51 AM",No
,7,00:00:43,AVT,"comment here tooo","reply again","Tuesday, July 9, 2019 At 8:51 AM",No
,8,00:00:43,"PJ Palomaki","comment here tooo","Different person reply","Tuesday, July 9, 2019 At 8:52 AM",No
,9,00:00:43,--,"comment here tooo","Anon reply reply","Tuesday, July 9, 2019 At 8:53 AM",No
"Test Video-01.mp4",6,00:01:29,--,"Anon comment",--,"Tuesday, July 9, 2019 At 8:53 AM",No
,7,00:01:29,--,"Anon comment","Anon reply","Tuesday, July 9, 2019 At 8:53 AM",No
,,,,,,,
If I parse with split("\n"), fields with line breaks get split. If I use split(","), any fields with commas get split.
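Both problems go away if the text is scanned one character at a time while tracking whether the scanner is inside a quoted field: commas and line breaks only separate fields and rows when they occur outside quotes. Below is a minimal sketch of that state machine, kept to ES3 syntax so it should paste directly into the main script. The name parseCSV is hypothetical, and it assumes valid CSV, where a double quote inside a quoted field is written doubled (""). The surrounding quotes are dropped from the returned values.

```javascript
// Minimal quote-aware CSV parser (a sketch, assuming valid RFC 4180-style input).
// Uses only ES3 features (charAt, push, ===) so it should also run in ExtendScript.
// Returns an array of rows, each row an array of field strings, without the
// surrounding double quotes; "" inside a quoted field becomes a single ".
function parseCSV(text) {
    var rows = [];
    var row = [];
    var field = "";
    var inQuotes = false;
    var i = 0;
    var c;

    while (i < text.length) {
        c = text.charAt(i);
        if (inQuotes) {
            if (c === '"') {
                if (text.charAt(i + 1) === '"') {
                    field += '"';      // escaped quote ("") inside a quoted field
                    i++;               // skip the second quote
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                field += c;            // commas and line breaks are literal here
            }
        } else if (c === '"') {
            inQuotes = true;           // a quote outside quotes enters quoted mode
        } else if (c === ",") {
            row.push(field);           // field separator
            field = "";
        } else if (c === "\r" || c === "\n") {
            if (c === "\r" && text.charAt(i + 1) === "\n") {
                i++;                   // treat \r\n as a single line break
            }
            row.push(field);           // row separator
            rows.push(row);
            row = [];
            field = "";
        } else {
            field += c;
        }
        i++;
    }
    // Flush the last row when the text does not end with a line break
    if (field !== "" || row.length > 0) {
        row.push(field);
        rows.push(row);
    }
    return rows;
}
```

With this input the trailing ,,,,,,, line comes back as a row of empty strings, which still has to be filtered out afterwards.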
Also, I'd like to include the parsing function in-line (in with the main script, rather than loading an external script) as I'd prefer to use a single file when deploying.
Thanks, PJ
Upvotes: 0
Views: 675
Reputation: 4131
For an ExtendScript project, I used the BabyParse library. I had to edit it a bit to use it in ExtendScript. Here is the gist. It will give you a JSON-like object which you can transform into your 2D array.
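If the parser is run with its header option and returns one object per row keyed by column name, the 2D array can be rebuilt with a small helper. This is a minimal sketch; objectsTo2D is a hypothetical name, and it assumes you also have the header names in order. Papa Parse, which BabyParse is based on, reports them in results.meta.fields when header is enabled, so a call might look like objectsTo2D(results.data, results.meta.fields); that is worth confirming against the edited gist. Without the header option, Papa Parse already returns each row as an array.

```javascript
// Hypothetical helper: turn an array of row objects keyed by column name
// into a 2D array with the header row first. Column order comes from the
// `fields` array rather than from for-in, whose order ES3 does not guarantee.
function objectsTo2D(records, fields) {
    var table = [fields.slice(0)];   // copy of the header row
    var i, j, row, value;
    for (i = 0; i < records.length; i++) {
        row = [];
        for (j = 0; j < fields.length; j++) {
            value = records[i][fields[j]];
            // Missing or null values become empty strings; others are stringified
            row.push(value === undefined || value === null ? "" : String(value));
        }
        table.push(row);
    }
    return table;
}
```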
Also, I'd like to include the parsing function in-line (in with the main script, rather than loading an external script) as I'd prefer to use a single file when deploying.
Use a build tool like gulp for it, or use ExtendScript's // @include "path/to/file.jsx" or #include "path/to/file.jsx" include syntax.
Then you can combine them using github.com/fabianmoronzirfas/extendscript-bundlr.
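To illustrate what such a bundling step does (this is not the extendscript-bundlr implementation), here is a hypothetical inlineIncludes function that replaces each include line with the text of the referenced file. It assumes the file contents are already available in a lookup object; a real build step would read them from disk, resolve relative paths and handle nested includes.

```javascript
// Hypothetical sketch of an include bundler: every line of the form
//   #include "file.jsx"   or   //@include "file.jsx"   (optionally "// @include")
// is replaced by files["file.jsx"]. Unknown includes are left untouched.
function inlineIncludes(source, files) {
    var lines = source.split("\n");
    var out = [];
    var pattern = /^\s*(?:#include|\/\/\s*@include)\s+"([^"]+)"\s*;?\s*$/;
    var i, m;
    for (i = 0; i < lines.length; i++) {
        m = pattern.exec(lines[i]);
        if (m && Object.prototype.hasOwnProperty.call(files, m[1])) {
            out.push(files[m[1]]);   // inline the referenced file's text
        } else {
            out.push(lines[i]);      // keep ordinary lines as they are
        }
    }
    return out.join("\n");
}
```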
(All links are shameless self-promotion ;-))
Upvotes: 1
Reputation: 14537
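As far as I can see, the problem is that this is not a valid CSV file at all. Specifically, the "really long comment..." field contains line breaks and an unescaped double quote. Line breaks are allowed inside a quoted field, but a double quote inside one must be escaped by doubling it (""), so that quote has to be escaped first. Once that is done, parsing becomes a trivial task.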
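For reference, a standard CSV writer avoids this ambiguity by wrapping any field that contains a comma, a double quote or a line break in double quotes and doubling the quotes inside it (RFC 4180). A minimal sketch of that rule; quoteField is a hypothetical name.

```javascript
// RFC 4180-style quoting for a single field: quote the value only when it
// contains a comma, double quote, CR or LF, and double any embedded quotes.
function quoteField(value) {
    var s = String(value);
    if (/[",\r\n]/.test(s)) {
        return '"' + s.replace(/"/g, '""') + '"';
    }
    return s;
}
```

Written this way, the !"#$% run in the long comment would appear in the file as !""#$%, which any standard parser reads back unambiguously.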
So the real question is: what is the best way to find and handle double quotes and line breaks within such text, turn it into valid CSV data, and then turn that into a 2D array?
I'm not sure the task can be accomplished for arbitrary text. It's quite possible that the combination of unescaped double quotes AND line breaks AND commas inside a field is an insurmountable obstacle.
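For data shaped like this sample, one possible workaround (a heuristic, not part of any CSV standard) is to treat a double quote inside a quoted field as closing only when it is followed by a comma, a line break or the end of the text, and to keep any other quote as a literal character. A minimal ES3-style sketch; parseLooseCSV is a hypothetical name. It recovers the stray quote in !"#$%... from the sample, but it would still cut a comment short if the comment itself contained a quote followed by a comma, so the doubt above still holds for arbitrary text.

```javascript
// Heuristic, lenient CSV parser (an assumption, not a standard): inside a
// quoted field, a quote closes the field only when followed by a comma,
// CR, LF or end of text; any other quote is kept as a literal character.
// Still breaks if a comment itself contains the sequence ",
function parseLooseCSV(text) {
    var rows = [], row = [], field = "", inQuotes = false;
    var i, c, next;
    for (i = 0; i < text.length; i++) {
        c = text.charAt(i);
        next = text.charAt(i + 1); // "" past the end of the string
        if (inQuotes) {
            if (c === '"' && next === '"') {
                field += '"';      // standard escaped quote
                i++;
            } else if (c === '"' && (next === "," || next === "\r" || next === "\n" || next === "")) {
                inQuotes = false;  // plausible closing quote
            } else {
                field += c;        // literal character, including stray quotes
            }
        } else if (c === '"' && field === "") {
            inQuotes = true;       // only a quote at the start of a field opens it
        } else if (c === ",") {
            row.push(field);
            field = "";
        } else if (c === "\n" || c === "\r") {
            if (c === "\r" && next === "\n") { i++; }
            row.push(field);
            rows.push(row);
            row = [];
            field = "";
        } else {
            field += c;
        }
    }
    if (field !== "" || row.length > 0) {
        row.push(field);
        rows.push(row);
    }
    return rows;
}
```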
Upvotes: 0