SnehalP

Reputation: 69

Extract JSON as String from jsp

I am parsing the web page view-source:https://massive.ucsd.edu/ProteoSAFe/datasets.jsp and want to extract the JSON object embedded in it.

I am using Jsoup to fetch the page:

Document doc = Jsoup.connect("https://massive.ucsd.edu/ProteoSAFe/datasets.jsp").maxBodySize(0).get();

Then I use a Java Pattern to extract the JSON as a string:

Pattern p = Pattern.compile(String.format("\"%s\":\\s*(.*),", "dataset","\"%s\":\\s*(.*),", "datasetNum","\"%s\":\\s*(.*),", "title","\"%s\":\\s*(.*),", "user","\"%s\":\\s*(.*),", "site","\"%s\":\\s*(.*),", "flowname","\"%s\":\\s*(.*),", "createdMillis","\"%s\":\\s*(.*),", "created","\"%s\":\\s*(.*),", "fileCount","\"%s\":\\s*(.*),", "fileSizeKB","\"%s\":\\s*(.*),", "psms","\"%s\":\\s*(.*),", "peptides","\"%s\":\\s*(.*),", "variants","\"%s\":\\s*(.*),", "proteins","\"%s\":\\s*(.*),", "species","\"%s\":\\s*(.*),", "instrument","\"%s\":\\s*(.*),", "modification","\"%s\":\\s*(.*),", "pi","\"%s\":\\s*(.*),", "complete","\"%s\":\\s*(.*),", "status","\"%s\":\\s*(.*),", "private","\"%s\":\\s*(.*),", "hash","\"%s\":\\s*(.*),", "px","\"%s\":\\s*(.*),", "task","\"%s\":\\s*(.*),", "id"));

Matcher m = p.matcher(script.html());

When I do this I get an error. The last line is not parsed correctly: the match is cut off at the end, so I get

'A JSONObject text must end with '}' at character 577' error.

Can anyone suggest a better way to parse this page and get the data?

Upvotes: 0

Views: 528

Answers (1)

OneCricketeer

Reputation: 191738

It seems like a bad idea to parse any HTML with regex, but this pattern works for me:

Pattern.compile("(?s)var datasets = (\\[.*?\\]);")

(Tested via Python, since that's all I have available).

Note that the captured text is a JSON array, so parse it as a JSONArray, not a JSONObject.
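As a rough Java sketch of the same idea, the following uses only java.util.regex. The class name DatasetExtractor, the method extractDatasets, and the sample HTML string are hypothetical stand-ins for illustration. In practice the input would be the page source fetched with Jsoup. The sketch assumes the page contains a script statement of the form var datasets = [ ... ];

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatasetExtractor {
    // (?s) lets '.' match line breaks; the reluctant .*? stops at the first "];"
    private static final Pattern DATASETS =
            Pattern.compile("(?s)var datasets = (\\[.*?\\]);");

    /** Returns the raw JSON array text assigned to `datasets`, if it is present. */
    static Optional<String> extractDatasets(String html) {
        Matcher m = DATASETS.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the real page content.
        String html = "<script>\nvar datasets = [\n"
                + "{\"dataset\":\"demo\",\"title\":\"example\"}\n];\n</script>";
        System.out.println(extractDatasets(html).orElse("datasets not found"));
    }
}
```

The returned string can then be handed to a JSON library, for example new JSONArray(text) if org.json is what produced the error above. One caveat: because the match ends at the first "];", a string value inside the data that itself contains "];" would truncate the capture.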

Upvotes: 1
