Judah Flynn
Judah Flynn

Reputation: 544

xquery group by not elimilate duplicated items

I have an XML document,

<resultsets>
    <row>
        <first_name>Georgi</first_name>
        <last_name>Facello</last_name>
    </row>
    <row>
        <first_name>Bezalel</first_name>
        <last_name>Simmel</last_name>
    </row>
    <row>
        <first_name>Bezalel</first_name>
        <last_name>Hass</last_name>
    </row>
</resultsets>

I want to sort first names and remove duplicated first names to produce this:

<resultsets>
    <row>
        <first_name>Bezalel</first_name>
        <last_name>Simmel</last_name>
    </row>
    <row>
        <first_name>Georgi</first_name>
        <last_name>Facello</last_name>
    </row>
</resultsets>

Following are the code I wrote:

for $last_name at $count1 in doc("employees.xml")//last_name,
$first_name at $count2 in doc("employees.xml")//first_name
let $f := $first_name
where ( $count1=$count2 )
group by $f
order by $f
return 
<row> 
     {$f}
     {$last_name}
</row>

However, this code sort the XML document by first names, but failed to remove the duplicated first name ('Bezalel'), it returns:

   <resultsets>
        <row>
            <first_name>Bezalel</first_name>
            <last_name>Simmel</last_name>
        </row>
        <row>
            <first_name>Bezalel</first_name>
            <last_name>Hass</last_name>
        </row>
        <row>
            <first_name>Georgi</first_name>
            <last_name>Facello</last_name>
        </row>
    </resultsets>

I know how to solve this using two FLOWR statements. group by behavior is weird, could you please explain why it does not remove the duplicates? Is there any way we can solve this problem using ONE FLOWR loop and ONLY use $first_name and $last_name two variables? Thanks,

Upvotes: 1

Views: 57

Answers (1)

Martin Honnen
Martin Honnen

Reputation: 167571

I would simply group the row elements by the first_name child and then output the first item in each group to ensure you don't get duplicates:

<resultssets>
{
    for $row in resultsets/row
    group by $fname := $row/first_name
    order by $fname
    return
        $row[1]    
}
</resultssets>

http://xqueryfiddle.liberty-development.net/jyyiVhf

As to how the group by clause works, see https://www.w3.org/TR/xquery-31/#id-group-by which says:

The group by clause assigns each pre-grouping tuple to a group, and generates one post-grouping tuple for each group. In the post-grouping tuple for a group, each grouping key is represented by a variable that was specified in a GroupingSpec, and every variable that appears in the pre-grouping tuples that were assigned to that group is represented by a variable of the same name, bound to a sequence of all values bound to the variable in any of these pre-grouping tuples.

Upvotes: 2

Related Questions