Reputation: 15058
When dealing with a one-to-many or many-to-many SQL relationship in Golang, what is the best (efficient, recommended, "Go-like") way of mapping the rows to a struct?
Taking the example setup below, I have tried to detail some approaches with the pros and cons of each, but I was wondering what the community recommends.
I am using database/sql with jmoiron/sqlx. For the sake of clarity, I have removed error handling.
Models
type Tag struct {
    ID   int
    Name string
}

type Item struct {
    ID   int
    Tags []Tag
}
Database
CREATE TABLE item (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
);

CREATE TABLE tag (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name VARCHAR(160),
    item_id INT REFERENCES item(id)
);
Approach 1 - Select all Items, then select tags per item
var items []Item
sqlxdb.Select(&items, "SELECT * FROM item")

for i, item := range items {
    var tags []Tag
    sqlxdb.Select(&tags, "SELECT * FROM tag WHERE item_id = $1", item.ID)
    items[i].Tags = tags
}
Pros
- Simple and easy to understand
Cons
- The number of database queries grows with the number of items (the classic N+1 query problem)
Approach 2 - Construct SQL join and loop through rows manually
var itemTags = make(map[int][]Tag)
var items = []Item{}

rows, _ := sqlxdb.Queryx("SELECT i.id, t.id, t.name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
for rows.Next() {
    var (
        itemID  int
        tagID   int
        tagName string
    )
    rows.Scan(&itemID, &tagID, &tagName)
    // Appending to the nil slice of a missing map key works, so no
    // existence check is needed:
    itemTags[itemID] = append(itemTags[itemID], Tag{ID: tagID, Name: tagName})
}

for itemID, tags := range itemTags {
    items = append(items, Item{
        ID:   itemID,
        Tags: tags,
    })
}
Pros
- A single database query, no matter how many items there are
Cons
- Manual scanning is verbose and gets complicated as more joins and struct fields are added
- Iterating over the map loses the row ordering, and the inner join silently drops items that have no tags
Failed approach 3 - sqlx struct scanning
Despite failing, I want to include this approach as I find it to be my current aim: efficiency paired with development simplicity. My hope was that by explicitly setting the db tag on each struct field, sqlx could do some advanced struct scanning:
var items []Item
sqlxdb.Select(&items, "SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
Unfortunately this errors out with missing destination name tag_id in *[]Item, leading me to believe StructScan is not advanced enough to recursively loop through rows (no criticism - it is a complicated scenario).
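A workaround that keeps the struct scanning is to select into a flat intermediate struct and group the rows manually (a sketch; the itemTagRow type exists only for this illustration):

type itemTagRow struct {
    ItemID  int    `db:"item_id"`
    TagID   int    `db:"tag_id"`
    TagName string `db:"tag_name"`
}

var flat []itemTagRow
sqlxdb.Select(&flat, "SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name FROM item AS i JOIN tag AS t ON t.item_id = i.id")

// Group the flat rows into items, preserving first-seen order:
byID := make(map[int]*Item)
var items []*Item
for _, r := range flat {
    it, ok := byID[r.ItemID]
    if !ok {
        it = &Item{ID: r.ItemID}
        byID[r.ItemID] = it
        items = append(items, it)
    }
    it.Tags = append(it.Tags, Tag{ID: r.TagID, Name: r.TagName})
}

That said, this is essentially approach 2 with more convenient scanning.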
Possible approach 4 - PostgreSQL array aggregators and GROUP BY
While I am fairly sure this will not work as written, I have included this untested option to see if it can be improved into something that does work.
var items = []Item{}
sqlxdb.Select(&items, "SELECT i.id as item_id, array_agg(t.*) as tags FROM item AS i JOIN tag AS t ON t.item_id = i.id GROUP BY i.id")
When I have some time I will try and run some experiments here.
Upvotes: 47
Views: 21362
Reputation: 1541
Check the code below; a custom Scan implementation combined with array_to_json(array_agg(tags)) works fine.
Models
type Tag struct {
    ID     int    `json:"id"`
    ItemID int    `json:"item_id"`
    Name   string `json:"name"`
}

type Item struct {
    ID   int
    Tags TagList
}

type TagList []Tag

// Scan implements sql.Scanner so the JSON produced by
// array_to_json(array_agg(t)) can be decoded straight into the slice.
// The json tags on Tag are needed because the generated keys
// (e.g. "item_id") do not match the Go field names.
func (t *TagList) Scan(src any) error {
    b, ok := src.([]byte)
    if !ok {
        return fmt.Errorf("TagList.Scan: expected []byte, got %T", src)
    }
    return json.Unmarshal(b, t)
}
Approach
rows, _ := db.Query(`
    SELECT i.id, array_to_json(array_agg(t)) FROM items i
    LEFT JOIN tags t ON t.item_id = i.id
    GROUP BY i.id
`)

var items = []Item{}
for rows.Next() {
    var item = Item{
        Tags: []Tag{},
    }
    rows.Scan(&item.ID, &item.Tags)
    items = append(items, item)
}
It might also be more efficient to use jsonb_agg() rather than array_to_json(array_agg(tags)).
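One caveat with the LEFT JOIN above: an item with no tags comes back as a JSON array containing a single null, which unmarshals into one zero-value Tag. A common Postgres (9.4+) idiom to avoid that, sketched here against the same schema, is to filter the aggregate and fall back to an empty array:

rows, _ := db.Query(`
    SELECT i.id,
           COALESCE(jsonb_agg(t) FILTER (WHERE t.id IS NOT NULL), '[]'::jsonb) AS tags
    FROM items i
    LEFT JOIN tags t ON t.item_id = i.id
    GROUP BY i.id
`)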
Upvotes: 3
Reputation: 1205
I wrote a library that tries to improve on some issues I find in the sqlx library, and one of those improvements is exactly your use case:
Library: github.com/vingarcia/ksql
Usage (I am also omitting error handling for the sake of brevity):
type Tag struct {
    ID   int    `ksql:"id"`
    Name string `ksql:"name"`
}

type Item struct {
    ID   int `ksql:"id"`
    Tags []Tag
}

// This is the target variable where we'll load the DB results:
var rows []struct {
    Item Item `tablename:"i"` // i is the alias for item on the query
    Tag  Tag  `tablename:"t"` // t is the alias for tag on the query
}

// When using the `tablename` tag above you need to start your query at `FROM`,
// so KSQL can build the SELECT part from the struct tags for you:
_ = ksqldb.Query(ctx, &rows, "FROM item AS i JOIN tag AS t ON t.item_id = i.id")
This would still not populate the Item.Tags attribute, so you would have to do that yourself. It could be a little fiddly, not because of KSQL, but because the join returns multiple rows with the same item ID, which forces you to deduplicate them with a map (see the sketch below).
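For illustration, the deduplication could look roughly like this (my own sketch, not a KSQL feature; it reuses the rows variable loaded above):

itemsByID := make(map[int]*Item)
var items []*Item
for _, r := range rows {
    it, ok := itemsByID[r.Item.ID]
    if !ok {
        it = &Item{ID: r.Item.ID}
        itemsByID[r.Item.ID] = it
        items = append(items, it)
    }
    it.Tags = append(it.Tags, r.Tag)
}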
In terms of performance there is another issue with this solution: you are loading the entire result set, potentially the whole database, into memory at once, which could well cause your application to fail with an out-of-memory error.
So a better alternative actually depends a lot on your precise use-case.
I will propose two solutions:
If the number of items isn't actually the whole database, and having the smallest possible memory footprint is not a requirement, just accept a larger number of queries.
Shorter queries that don't return the same IDs several times are also easier on your database. Since the database is a shared resource, and possibly a single point of failure or bottleneck, moving load from the DB to your microservices is often a good move:
// Defining a smaller struct so we don't use more memory than necessary:
var items []struct {
    ID int `ksql:"id"`
}
_ = ksqldb.Query(ctx, &items, "SELECT id FROM item WHERE some_criteria = $1", someCriteria)

var completedItems []Item
for _, item := range items {
    var tags []Tag
    _ = ksqldb.Query(ctx, &tags, "FROM tag WHERE tag.item_id = $1", item.ID)

    // Do something with it as soon as possible,
    // so you don't have to keep it in memory:
    DoSomethingWithItem(Item{
        ID:   item.ID,
        Tags: tags,
    })

    // Alternatively you can add it to a slice of items:
    completedItems = append(completedItems, Item{ID: item.ID, Tags: tags})
}
This solution is slower in total time, mostly because of the number of round-trips between your microservice and the database. In terms of overall load on the network and on your database it is pretty much just as efficient, meaning it scales well. That said, if you are loading a small number of items, the extra total time should not be significant anyway.
If you need to be as fast as possible on the microservice side, and/or you need a very small memory footprint (i.e. never load many items into memory at a single time), then you should process the data in chunks, and KSQL also supports that:
type row struct {
    Item Item `tablename:"i"`
    Tag  Tag  `tablename:"t"`
}

// Here we are building each item one at a time, with all its tags, and
// then doing something with it as soon as we get to the next item.
//
// Note that for this to work we added an ORDER BY clause to the query.
var currentItem Item
_ = ksqldb.QueryChunks(ctx, ksql.ChunkParser{
    Query:     "FROM item AS i JOIN tag AS t ON t.item_id = i.id ORDER BY i.id",
    ChunkSize: 100, // Load 100 rows at a time
    ForEachChunk: func(rows []row) error {
        for _, row := range rows {
            if currentItem.ID == 0 {
                currentItem = row.Item
            } else if row.Item.ID != currentItem.ID {
                // We finished receiving one item:
                DoSomethingWithCurrentItem(currentItem)

                // Reset the current item variable to the new Item:
                currentItem = row.Item
            }

            // Collect the tags of that item, one by one:
            currentItem.Tags = append(currentItem.Tags, row.Tag)
        }
        return nil
    },
})

// Do something with the last item you were parsing:
DoSomethingWithCurrentItem(currentItem)
I would not usually go with this approach, since having to process this much data at once is very rarely a requirement, and this code is much more complex than just doing multiple queries. But if it is a requirement, this is how I would do it. You can also do something similar with the rows.Next() loop of the database/sql or sqlx libraries if you are not using KSQL, as sketched below.
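For reference, a rough equivalent of the chunked pattern in plain database/sql would look like this (a sketch against the question's schema; DoSomethingWithCurrentItem is the same placeholder as above):

rows, _ := db.Query(`
    SELECT i.id, t.id, t.name
    FROM item AS i
    JOIN tag AS t ON t.item_id = i.id
    ORDER BY i.id`)
defer rows.Close()

var current Item
for rows.Next() {
    var itemID int
    var tag Tag
    rows.Scan(&itemID, &tag.ID, &tag.Name)

    // A change in the ordered item ID marks the end of the previous item:
    if current.ID != 0 && itemID != current.ID {
        DoSomethingWithCurrentItem(current)
        current = Item{}
    }
    current.ID = itemID
    current.Tags = append(current.Tags, tag)
}

// Flush the last item:
if current.ID != 0 {
    DoSomethingWithCurrentItem(current)
}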
Upvotes: -1
Reputation: 31
You can use carta.Map() from https://github.com/jackskj/carta. It tracks has-many relationships automatically.
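A minimal sketch of what that might look like, assuming carta's README conventions (the column aliases and db tags here are illustrative, not prescribed by the question):

type Tag struct {
    ID   int    `db:"tag_id"`
    Name string `db:"tag_name"`
}

type Item struct {
    ID   int `db:"item_id"`
    Tags []Tag
}

rows, _ := db.Query(`
    SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name
    FROM item AS i JOIN tag AS t ON t.item_id = i.id`)

var items []Item
_ = carta.Map(rows, &items) // carta groups the tag rows under their parent items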
Upvotes: 3
Reputation: 304
The SQL in Postgres:
create schema temp;
set search_path = temp;

create table item
(
    id INT generated by default as identity primary key
);

create table tag
(
    id INT generated by default as identity primary key,
    name VARCHAR(160),
    item_id INT references item (id)
);

create view item_tags as
select id,
       (
           select array_to_json(array_agg(row_to_json(taglist.*)))
           from (
               select tag.name, tag.id
               from tag
               where item_id = item.id
           ) taglist
       ) as tags
from item;
Then Go queries this SQL:
select row_to_json(row)
from (
    select * from item_tags
) row;
and unmarshals the result into a Go struct.
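The scanning side could look like this (a sketch; the json field names follow the item_tags view above):

type Tag struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

type Item struct {
    ID   int   `json:"id"`
    Tags []Tag `json:"tags"`
}

rows, _ := db.Query(`select row_to_json(row) from (select * from item_tags) row`)
var items []Item
for rows.Next() {
    var buf []byte
    rows.Scan(&buf)

    var item Item
    json.Unmarshal(buf, &item)
    items = append(items, item)
}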
Pros:
- Postgres manages the relations in the data; data is added and updated with SQL functions.
- Go manages the business model and logic.
It's an easy way.
Upvotes: 18
Reputation: 185
I can suggest another approach which I have used before.
You build a JSON array of the tags inside the query itself and return it with each item.
Pros: it is a single call to the DB, which aggregates the data, and all you have to do is parse the JSON into an array.
Cons: it's a bit ugly. Feel free to bash me for it.
type jointItem struct {
    Item
    ParsedTags string
    Tags       []Tag `gorm:"-"`
}

var jointItems []*jointItem
// Note: the subquery selects from tags and correlates on item_id,
// so the aggregated JSON only contains that item's tags (MySQL syntax):
db.Raw(`SELECT
    items.*,
    (SELECT CONCAT(
        '[',
        GROUP_CONCAT(
            JSON_OBJECT('id', tags.id,
                        'name', tags.name)
        ),
        ']'
    ) FROM tags WHERE tags.item_id = items.id) as parsed_tags
FROM items`).Scan(&jointItems)
for _, o := range jointItems {
    var tempTags []Tag
    if err := json.Unmarshal([]byte(o.ParsedTags), &tempTags); err != nil {
        // do something
    }
    o.Tags = tempTags
}
Edit: the code might behave weirdly when unmarshaling into the same struct, so I find it better to unmarshal into a temporary tags slice and assign it afterwards.
Upvotes: 7