Reputation: 1479
I'm using Scales XML and Scala.
I'm trying to read the MediaWiki XML export format, which starts like this:
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
Then, under the <mediawiki> tag, there are a bunch of <page> tags, some of which have a <redirect> tag, such as:
<page>
  <title>Albigensian</title>
  <redirect title="Catharism" />
  <revision>
    ...
  </revision>
</page>
I'm using ScalesXML to do the parsing:
object WikiMediaImport extends App with Logging {
  val xml = pullXml(new FileReader(args(0)))
  val ns = Namespace("http://www.mediawiki.org/xml/export-0.8/")
  val p = ns // .prefixed("mediawiki") <-- that doesn't help either
  val mediawikiTag = p("mediawiki")
  val pageTag = p("page")
  val titleTag = p("title")
  val revisionTag = p("revision")
  val textTag = p("text")
  val timestampTag = p("timestamp")
  val redirectTag = p("redirect")
  //val redirectWhereAttr: Attribute = Attribute(redirectTag, "title")
  val pagePath = List(mediawikiTag, pageTag)
  val iterator = iterate(pagePath, xml)
  for {
    page <- iterator
  } {
    val title = text(page \* titleTag)
    val timestamp = text(page \* revisionTag \* timestampTag)
    val content = text(page \* revisionTag \* textTag)
    println(s"$title $timestamp ${content.length}")
  }
}
However, I also want to get the value of the mediawiki -> page -> redirect[title] attribute, and I'm not quite sure how to do this despite reading the help page.
If I use a prefixed Namespace, nothing is found, because the namespace isn't actually prefixed in the file. If I use a NoNamespaceQName, nothing is found either (presumably because the XML file does have a namespace specified).
And if I use a default Namespace, Scales doesn't allow me to define an attribute, because attributes can only be used with prefixed namespaces.
At least, that's how I understand it.
Upvotes: 1
Views: 341
Reputation: 1479
Found the answer: I needed to use the *@ operator with the attribute's local name (a plain String converted with .l) to access the attribute:
val redirectWhereAttr = "title".l
...
val redirectWhere = text(page \* redirectTag *@ redirectWhereAttr)
Upvotes: 0
Reputation: 2811
Accessing an attribute requires an AttributeQName (a NoNamespaceQName or a PrefixedQName). The simple string version of *@ only compares on localName, and it's a mistake that I'll deprecate, as is the overloaded UnprefixedQName version. They make queries simpler to build, but they don't match the types in the spec and will likely bite someone (including me) later on.
\@name is a NoNamespaceQName (as is the redirect title attribute above). \@p:name is a PrefixedQName and the only other way to specify is via predicates, which scales should also do.
Edit to add the actual "not to be deprecated in the future" answer, sorry :) - As this attribute is not namespaced, you should use the "l" function on the string (see the function docs) or create a NoNamespaceQName directly via NoNamespaceQName(localString) and use that. See the Scales documentation on the implicits and on QNames in general for more.
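For anyone wondering why the attribute has no namespace even though the document declares a default one: in XML Namespaces, a default xmlns applies to elements only, never to unprefixed attributes. The sketch below is not Scales code; it uses the JDK's namespace-aware DOM parser just to show that rule in isolation. The sample string is a trimmed, hypothetical copy of the XML in the question, and the object and value names are made up for illustration.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object AttributeNamespaceDemo {
  val mediawikiNs = "http://www.mediawiki.org/xml/export-0.8/"

  // Hypothetical sample, trimmed from the XML shown in the question.
  val sample: String =
    """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xml:lang="en">
      |  <page>
      |    <title>Albigensian</title>
      |    <redirect title="Catharism" />
      |  </page>
      |</mediawiki>""".stripMargin

  private val factory = DocumentBuilderFactory.newInstance()
  factory.setNamespaceAware(true) // off by default in the JDK parser

  private val doc =
    factory.newDocumentBuilder().parse(new ByteArrayInputStream(sample.getBytes("UTF-8")))

  // The element is found under the default namespace...
  val redirect: Element =
    doc.getElementsByTagNameNS(mediawikiNs, "redirect").item(0).asInstanceOf[Element]

  // ...but its unprefixed attribute lives in no namespace at all.
  val titleInNoNamespace: String = redirect.getAttributeNS(null, "title")

  // Looking the attribute up in the element's namespace finds nothing (empty string).
  val titleInElementNamespace: String = redirect.getAttributeNS(mediawikiNs, "title")

  // A prefixed attribute, by contrast, does carry a namespace.
  val langNamespace: String =
    doc.getDocumentElement.getAttributeNode("xml:lang").getNamespaceURI

  def main(args: Array[String]): Unit = {
    println(s"element namespace:           ${redirect.getNamespaceURI}")
    println(s"title (no namespace):        $titleInNoNamespace")
    println(s"title (element's namespace): '$titleInElementNamespace'")
    println(s"xml:lang namespace:          $langNamespace")
  }
}
```

This matches what the question ran into: the elements need the namespaced QNames, while the title attribute needs a no-namespace name, which is what "title".l or NoNamespaceQName("title") gives you in Scales.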
Upvotes: 2