codelidoo

Reputation: 219

Error using xpath expressions with eXist-db

I'm using XPath to query XML files containing Shakespeare plays (I'm studying XPath). I want to know how many times Juliet responds to Romeo, that is, speaks right after him. I was using this XPath expression:

1: count(doc('r_and_j.xml')//SPEAKER[. = "JULIET" and ../preceding-sibling::SPEECH[1]/SPEAKER = "ROMEO"])

Yet this returns 4, which obviously can't be correct. This, however, does work:

2: count(doc('r_and_j.xml')//SPEECH[SPEAKER = "JULIET" and (preceding-sibling::SPEECH[1]/SPEAKER = "ROMEO")])

Another query where things go wrong is the following: I want to know the titles of the acts in Romeo and Juliet that have no speakers in common with the next act.

3: doc('r_and_j.xml')//ACT[not(.//SPEAKER = ./following-sibling::ACT[1]//SPEAKER)]/TITLE

fails to deliver the correct result, while this one does:

4: doc('r_and_j.xml')//ACT[not(distinct-values(.//SPEAKER) = distinct-values(./following-sibling::ACT[1]//SPEAKER))]/TITLE

I don't see why XPath expressions 1 and 3 fail to deliver the answer, while 2 and 4 do. Could this have something to do with eXist? I was given 3 as a solution, yet it does not seem to work.

Since this is hard to answer (at least for 1 and 2) without knowing the XML I'm working on, I will post the DTD here:

<!-- DTD for Shakespeare    J. Bosak    1994.03.01, 1997.01.02 -->
<!-- Revised for case sensitivity 1997.09.10 -->
<!-- Revised for XML 1.0 conformity 1998.01.27 (thanks to Eve Maler) -->

<!ENTITY amp "&#38;#38;">
<!ELEMENT PLAY     (TITLE, FM, PERSONAE, SCNDESCR, PLAYSUBT, INDUCT?,
                             PROLOGUE?, ACT+, EPILOGUE?)>
<!ELEMENT TITLE    (#PCDATA)>
<!ELEMENT FM       (P+)>
<!ELEMENT P        (#PCDATA)>
<!ELEMENT PERSONAE (TITLE, (PERSONA | PGROUP)+)>
<!ELEMENT PGROUP   (PERSONA+, GRPDESCR)>
<!ELEMENT PERSONA  (#PCDATA)>
<!ELEMENT GRPDESCR (#PCDATA)>
<!ELEMENT SCNDESCR (#PCDATA)>
<!ELEMENT PLAYSUBT (#PCDATA)>
<!ELEMENT INDUCT   (TITLE, SUBTITLE*, (SCENE+|(SPEECH|STAGEDIR|SUBHEAD)+))>
<!ELEMENT ACT      (TITLE, SUBTITLE*, PROLOGUE?, SCENE+, EPILOGUE?)>
<!ELEMENT SCENE    (TITLE, SUBTITLE*, (SPEECH | STAGEDIR | SUBHEAD)+)>
<!ELEMENT PROLOGUE (TITLE, SUBTITLE*, (STAGEDIR | SPEECH)+)>
<!ELEMENT EPILOGUE (TITLE, SUBTITLE*, (STAGEDIR | SPEECH)+)>
<!ELEMENT SPEECH   (SPEAKER+, (LINE | STAGEDIR | SUBHEAD)+)>
<!ELEMENT SPEAKER  (#PCDATA)>
<!ELEMENT LINE     (#PCDATA | STAGEDIR)*>
<!ELEMENT STAGEDIR (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD  (#PCDATA)>

The XML (and other plays besides Romeo and Juliet) is available here: http://metalab.unc.edu/bosak/xml/eg/shaks200.zip

Upvotes: 0

Views: 220

Answers (1)

alexbrn

Reputation: 2155

I don't know how you get 4 from the first query, since you are asking (in part) to find SPEAKER elements inside SPEAKER elements, and the DTD does not permit this.

I am using the XML play text available at http://www.ibiblio.org/xml/examples/shakespeare/

If you want to find all the speeches of Juliet that are immediately preceded by a speech of Romeo, then (let's build this up)

all the speeches:

//SPEECH (returns 841 elements)

all the speeches by Juliet:

//SPEECH[SPEAKER='JULIET'] (returns 118 elements)

and finally:

//SPEECH[SPEAKER='JULIET' and preceding-sibling::SPEECH[1][SPEAKER='ROMEO']] (returns 37 elements)
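To cross-check a count like this outside the XML database, the same logic can be sketched in plain Python. This is only an illustration under assumptions: the `SAMPLE` scene is made up (it is not the play text), and `count_replies` is a hypothetical helper that mimics `preceding-sibling::SPEECH[1]` by remembering the nearest earlier SPEECH among the same parent's children, skipping STAGEDIR elements in between.

```python
import xml.etree.ElementTree as ET

# Made-up scene shaped like the Bosak DTD; NOT the real play text.
SAMPLE = """
<SCENE>
  <TITLE>Sample scene</TITLE>
  <SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>a</LINE></SPEECH>
  <SPEECH><SPEAKER>JULIET</SPEAKER><LINE>b</LINE></SPEECH>
  <STAGEDIR>Enter Nurse</STAGEDIR>
  <SPEECH><SPEAKER>Nurse</SPEAKER><LINE>c</LINE></SPEECH>
  <SPEECH><SPEAKER>JULIET</SPEAKER><LINE>d</LINE></SPEECH>
  <SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>e</LINE></SPEECH>
  <STAGEDIR>Aside</STAGEDIR>
  <SPEECH><SPEAKER>JULIET</SPEAKER><LINE>f</LINE></SPEECH>
</SCENE>
"""


def speakers(speech):
    """Return the set of SPEAKER names in one SPEECH (the DTD allows several)."""
    return {s.text for s in speech.findall("SPEAKER")}


def count_replies(root, replier, to):
    """Count SPEECH elements by `replier` whose nearest preceding SPEECH
    sibling is by `to` (hypothetical helper, mirrors the XPath predicate)."""
    count = 0
    for parent in root.iter():
        previous = None  # nearest earlier SPEECH under this parent
        for child in parent:
            if child.tag != "SPEECH":
                continue  # non-SPEECH siblings are skipped, as in SPEECH[1]
            if (previous is not None
                    and replier in speakers(child)
                    and to in speakers(previous)):
                count += 1
            previous = child
    return count


root = ET.fromstring(SAMPLE)
print(count_replies(root, "JULIET", "ROMEO"))
```

On this sample the second and the last speech qualify (the stage direction before the last one does not break the pairing), so the count is 2. Running the same function on the parsed play file should give a number to compare against what eXist reports.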

Your second task is quite challenging, but can be done using the = operator which, when comparing node sets, returns true if any value in the sets is shared, so:

//ACT[ following-sibling::ACT and not(.//SPEAKER = following-sibling::ACT[1]//SPEAKER)]/TITLE

Unsurprisingly, all adjacent Acts in the play have some speakers in common, so nothing is returned.
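The "true if any value is shared" behaviour of = amounts to asking whether two sets intersect, which is also why wrapping both sides in distinct-values should not change the outcome. A rough Python sketch of the same test follows; the three-act `SAMPLE` document and the helper names are invented for illustration and are not taken from the play.

```python
import xml.etree.ElementTree as ET

# Invented three-act document shaped like the Bosak DTD; NOT the real play.
SAMPLE = """
<PLAY>
  <ACT><TITLE>ACT I</TITLE>
    <SCENE><TITLE>s1</TITLE>
      <SPEECH><SPEAKER>A</SPEAKER><LINE>x</LINE></SPEECH>
    </SCENE>
  </ACT>
  <ACT><TITLE>ACT II</TITLE>
    <SCENE><TITLE>s1</TITLE>
      <SPEECH><SPEAKER>B</SPEAKER><LINE>x</LINE></SPEECH>
      <SPEECH><SPEAKER>C</SPEAKER><LINE>x</LINE></SPEECH>
    </SCENE>
  </ACT>
  <ACT><TITLE>ACT III</TITLE>
    <SCENE><TITLE>s1</TITLE>
      <SPEECH><SPEAKER>C</SPEAKER><LINE>x</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>
"""


def speaker_set(act):
    """All SPEAKER values anywhere below an ACT (like .//SPEAKER)."""
    return {s.text for s in act.iter("SPEAKER")}


def acts_sharing_no_speaker_with_next(play):
    """Titles of acts whose speakers are disjoint from the next act's.
    The last act has no following ACT and is left out, matching the
    `following-sibling::ACT` guard in the XPath expression."""
    acts = play.findall("ACT")
    titles = []
    for act, nxt in zip(acts, acts[1:]):
        # not(a = b) on node sets: no pair of values is equal
        if speaker_set(act).isdisjoint(speaker_set(nxt)):
            titles.append(act.findtext("TITLE"))
    return titles


play = ET.fromstring(SAMPLE)
print(acts_sharing_no_speaker_with_next(play))
```

Here only ACT I qualifies: ACT II and ACT III share speaker C, and ACT III has no next act to compare with.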

Upvotes: 1
