Reputation: 27
I am trying to import the Stack Exchange data dump, which is in XML, into SAS. The files are in this format:
<?xml version="1.0" encoding="UTF-8"?>
<votes>
<row CreationDate="2013-10-22T00:00:00.000" VoteTypeId="2" PostId="4" Id="1"/>
<row CreationDate="2013-10-22T00:00:00.000" VoteTypeId="16" PostId="1" Id="2"/>
<row CreationDate="2013-10-22T00:00:00.000" VoteTypeId="2" PostId="1" Id="3"/>
</votes>
I've tried the default XML engines in SAS, such as xml and xml92, but the import has been unsuccessful.
libname Stackof xml 'C:\Users\abc\Documents\My SAS Files\Stackof\Votes.xml';
libname Stack 'C:\Users\abc\Documents\My SAS Files\Stack';
data stack.votes;
set stackof.votes;
run;
I was able to open the smaller files in Excel, convert them to CSV, and then import them. For the large files (around 29 GB for the posts and votes data from Stack Overflow), what is the best way to go about it?
Upvotes: 2
Views: 1293
Reputation: 63424
To import an XML file like this, you should first create an XML Map. See the SAS documentation on that subject. You can create a map by hand (I've done it multiple times before), or you can use the XML Mapper utility that is bundled with SAS or available for download separately. Make sure you create the right map version for your SAS version, as later versions of SAS support more complex maps.
The map basically tells SAS what defines a dataset, what is a row, what is a column, and what datatype each column has. Without it, SAS doesn't know where to put things.
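As a rough illustration of what such a map could look like for the Votes.xml sample in the question, here is a minimal hand-written sketch. It assumes XMLMap version 1.2 syntax. The map name and the choice to read CreationDate as a 23-character string (rather than as a SAS datetime) are my own simplifying assumptions, so check it against the documentation for your SAS release before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of an XMLMap for the votes file; the map name "Votes" is an assumption. -->
<SXLEMAP version="1.2" name="Votes">
  <!-- One output dataset named votes; each /votes/row element becomes one observation. -->
  <TABLE name="votes">
    <TABLE-PATH syntax="XPath">/votes/row</TABLE-PATH>

    <!-- Each attribute of <row> is mapped to one column. -->
    <COLUMN name="Id">
      <PATH syntax="XPath">/votes/row/@Id</PATH>
      <TYPE>numeric</TYPE>
      <DATATYPE>integer</DATATYPE>
    </COLUMN>

    <COLUMN name="PostId">
      <PATH syntax="XPath">/votes/row/@PostId</PATH>
      <TYPE>numeric</TYPE>
      <DATATYPE>integer</DATATYPE>
    </COLUMN>

    <COLUMN name="VoteTypeId">
      <PATH syntax="XPath">/votes/row/@VoteTypeId</PATH>
      <TYPE>numeric</TYPE>
      <DATATYPE>integer</DATATYPE>
    </COLUMN>

    <!-- Assumption: kept as text (e.g. 2013-10-22T00:00:00.000) and converted later in a data step. -->
    <COLUMN name="CreationDate">
      <PATH syntax="XPath">/votes/row/@CreationDate</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>23</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>
```

Saved as, say, Votes.map (a hypothetical file name), the map is then referenced from the LIBNAME statement through the XMLMAP= option, along the lines of `libname Stackof xmlv2 'C:\Users\abc\Documents\My SAS Files\Stackof\Votes.xml' xmlmap='C:\Users\abc\Documents\My SAS Files\Stackof\Votes.map';`. Which engine nickname to use (xml, xml92, or xmlv2) depends on your SAS version.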
Upvotes: 1