Reputation: 15058
When dealing with a one-to-many or many-to-many SQL relationship in Golang, what is the best (efficient, recommended, "Go-like") way of mapping the rows to a struct?
Taking the example setup below, I have tried to detail some approaches with the pros and cons of each, but I was wondering what the community recommends.
I am using database/sql with jmoiron/sqlx. For the sake of clarity, I have removed error handling.
Models
type Tag struct {
    ID   int
    Name string
}

type Item struct {
    ID   int
    Tags []Tag
}
Database
CREATE TABLE item (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
);

CREATE TABLE tag (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name VARCHAR(160),
    item_id INT REFERENCES item(id)
);
Approach 1 - Select all Items, then select tags per item
var items []Item
sqlxdb.Select(&items, "SELECT * FROM item")

for i, item := range items {
    var tags []Tag
    sqlxdb.Select(&tags, "SELECT * FROM tag WHERE item_id = $1", item.ID)
    items[i].Tags = tags
}
Pros
- Simple and easy to understand
Cons
- The number of database queries grows with the number of items (the classic N+1 query problem)
Approach 2 - Construct SQL join and loop through rows manually
var itemTags = make(map[int][]Tag)
var items = []Item{}

rows, _ := sqlxdb.Queryx("SELECT i.id, t.id, t.name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
for rows.Next() {
    var (
        itemID  int
        tagID   int
        tagName string
    )
    rows.Scan(&itemID, &tagID, &tagName)
    // Appending to the nil slice of a missing map key works, so no
    // existence check is needed:
    itemTags[itemID] = append(itemTags[itemID], Tag{ID: tagID, Name: tagName})
}

for itemID, tags := range itemTags {
    items = append(items, Item{
        ID:   itemID,
        Tags: tags,
    })
}
Pros
- A single database query, no matter how many items there are
Cons
- Manual scanning is verbose and gets complicated as more joins and struct fields are added
- Iterating over the map loses the row ordering, and the inner join silently drops items that have no tags
Failed approach 3 - sqlx struct scanning
Despite failing, I want to include this approach as I find it to be my current aim: efficiency paired with development simplicity. My hope was that by explicitly setting the db tag on each struct field, sqlx could do some advanced struct scanning:
var items []Item
sqlxdb.Select(&items, "SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
Unfortunately this errors out with missing destination name tag_id in *[]Item, leading me to believe StructScan is not advanced enough to recursively loop through rows (no criticism - it is a complicated scenario).
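A workaround that keeps the struct scanning is to select into a flat intermediate struct and group the rows manually (a sketch; the itemTagRow type exists only for this illustration):

type itemTagRow struct {
    ItemID  int    `db:"item_id"`
    TagID   int    `db:"tag_id"`
    TagName string `db:"tag_name"`
}

var flat []itemTagRow
sqlxdb.Select(&flat, "SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name FROM item AS i JOIN tag AS t ON t.item_id = i.id")

// Group the flat rows into items, preserving first-seen order:
byID := make(map[int]*Item)
var items []*Item
for _, r := range flat {
    it, ok := byID[r.ItemID]
    if !ok {
        it = &Item{ID: r.ItemID}
        byID[r.ItemID] = it
        items = append(items, it)
    }
    it.Tags = append(it.Tags, Tag{ID: r.TagID, Name: r.TagName})
}

That said, this is essentially approach 2 with more convenient scanning.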
Possible approach 4 - PostgreSQL array aggregators and GROUP BY
While I am fairly sure this will not work as written, I have included this untested option to see if it can be improved into something that does work.
var items = []Item{}
sqlxdb.Select(&items, "SELECT i.id as item_id, array_agg(t.*) as tags FROM item AS i JOIN tag AS t ON t.item_id = i.id GROUP BY i.id")
When I have some time I will try and run some experiments here.
Upvotes: 47
Views: 21362
Reputation: 1541
Check the code below; a custom Scan implementation combined with array_to_json(array_agg(tags)) works fine.
Models
type Tag struct {
    ID     int    `json:"id"`
    ItemID int    `json:"item_id"`
    Name   string `json:"name"`
}

type Item struct {
    ID   int
    Tags TagList
}

type TagList []Tag

// Scan implements sql.Scanner so the JSON produced by
// array_to_json(array_agg(t)) can be decoded straight into the slice.
// The json tags on Tag are needed because the generated keys
// (e.g. "item_id") do not match the Go field names.
func (t *TagList) Scan(src any) error {
    b, ok := src.([]byte)
    if !ok {
        return fmt.Errorf("TagList.Scan: expected []byte, got %T", src)
    }
    return json.Unmarshal(b, t)
}
Approach
rows, _ := db.Query(`
    SELECT i.id, array_to_json(array_agg(t)) FROM items i
    LEFT JOIN tags t ON t.item_id = i.id
    GROUP BY i.id
`)

var items = []Item{}
for rows.Next() {
    var item = Item{
        Tags: []Tag{},
    }
    rows.Scan(&item.ID, &item.Tags)
    items = append(items, item)
}
It might also be more efficient to use jsonb_agg() rather than array_to_json(array_agg(tags)).
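One caveat with the LEFT JOIN above: an item with no tags comes back as a JSON array containing a single null, which unmarshals into one zero-value Tag. A common Postgres (9.4+) idiom to avoid that, sketched here against the same schema, is to filter the aggregate and fall back to an empty array:

rows, _ := db.Query(`
    SELECT i.id,
           COALESCE(jsonb_agg(t) FILTER (WHERE t.id IS NOT NULL), '[]'::jsonb) AS tags
    FROM items i
    LEFT JOIN tags t ON t.item_id = i.id
    GROUP BY i.id
`)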
Upvotes: 3
Reputation: 1205
I wrote a library that tries to improve on some issues I find in the sqlx library, and one of those improvements is exactly your use case:
Library: github.com/vingarcia/ksql
Usage (I am also omitting error handling for the sake of brevity):
type Tag struct {
    ID   int    `ksql:"id"`
    Name string `ksql:"name"`
}

type Item struct {
    ID   int `ksql:"id"`
    Tags []Tag
}

// This is the target variable where we'll load the DB results:
var rows []struct {
    Item Item `tablename:"i"` // i is the alias for item on the query
    Tag  Tag  `tablename:"t"` // t is the alias for tag on the query
}

// When using the `tablename` tag above you need to start your query at `FROM`,
// so KSQL can build the SELECT part from the struct tags for you:
_ = ksqldb.Query(ctx, &rows, "FROM item AS i JOIN tag AS t ON t.item_id = i.id")
This would still not populate the Item.Tags attribute, so you would have to do that yourself. It could be a little fiddly, not because of KSQL, but because the join returns multiple rows with the same item ID, which forces you to deduplicate them with a map (see the sketch below).
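For illustration, the deduplication could look roughly like this (my own sketch, not a KSQL feature; it reuses the rows variable loaded above):

itemsByID := make(map[int]*Item)
var items []*Item
for _, r := range rows {
    it, ok := itemsByID[r.Item.ID]
    if !ok {
        it = &Item{ID: r.Item.ID}
        itemsByID[r.Item.ID] = it
        items = append(items, it)
    }
    it.Tags = append(it.Tags, r.Tag)
}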
In terms of performance there is another issue with this solution: you are loading the entire result set, potentially the whole database, into memory at once, which could well cause your application to fail with an out-of-memory error.
So a better alternative actually depends a lot on your precise use-case.
I will propose two solutions:
If the number of items isn't actually the whole database, and having the smallest possible memory footprint is not a requirement, just accept a larger number of queries.
Shorter queries that don't return the same IDs several times are also easier on your database. Since the database is a shared resource, and possibly a single point of failure or bottleneck, moving load from the DB to your microservices is often a good move:
// Defining a smaller struct so we don't use more memory than necessary:
var items []struct {
    ID int `ksql:"id"`
}
_ = ksqldb.Query(ctx, &items, "SELECT id FROM item WHERE some_criteria = $1", someCriteria)

var completedItems []Item
for _, item := range items {
    var tags []Tag
    _ = ksqldb.Query(ctx, &tags, "FROM tag WHERE tag.item_id = $1", item.ID)

    // Do something with it as soon as possible,
    // so you don't have to keep it in memory:
    DoSomethingWithItem(Item{
        ID:   item.ID,
        Tags: tags,
    })

    // Alternatively you can add it to a slice of items:
    completedItems = append(completedItems, Item{ID: item.ID, Tags: tags})
}
This solution is slower in total time, mostly because of the number of round-trips between your microservice and the database. In terms of overall load on the network and on your database it is pretty much just as efficient, meaning it scales well. That said, if you are loading a small number of items, the extra total time should not be significant anyway.
If you need to be as fast as possible on the microservice side, and/or you need a very small memory footprint (i.e. never load many items into memory at a single time), then you should process the data in chunks, and KSQL also supports that:
type row struct {
    Item Item `tablename:"i"`
    Tag  Tag  `tablename:"t"`
}

// Here we are building each item one at a time, with all its tags, and
// then doing something with it as soon as we get to the next item.
//
// Note that for this to work we added an ORDER BY clause to the query.
var currentItem Item
_ = ksqldb.QueryChunks(ctx, ksql.ChunkParser{
    Query:     "FROM item AS i JOIN tag AS t ON t.item_id = i.id ORDER BY i.id",
    ChunkSize: 100, // Load 100 rows at a time
    ForEachChunk: func(rows []row) error {
        for _, row := range rows {
            if currentItem.ID == 0 {
                currentItem = row.Item
            } else if row.Item.ID != currentItem.ID {
                // We finished receiving one item:
                DoSomethingWithCurrentItem(currentItem)

                // Reset the current item variable to the new Item:
                currentItem = row.Item
            }

            // Collect the tags of that item, one by one:
            currentItem.Tags = append(currentItem.Tags, row.Tag)
        }
        return nil
    },
})

// Do something with the last item you were parsing:
DoSomethingWithCurrentItem(currentItem)
I would not usually go with this approach, since having to process this much data at once is very rarely a requirement, and this code is much more complex than just doing multiple queries. But if it is a requirement, this is how I would do it. You can also do something similar with the rows.Next() loop of the database/sql or sqlx libraries if you are not using KSQL, as sketched below.
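For reference, a rough equivalent of the chunked pattern in plain database/sql would look like this (a sketch against the question's schema; DoSomethingWithCurrentItem is the same placeholder as above):

rows, _ := db.Query(`
    SELECT i.id, t.id, t.name
    FROM item AS i
    JOIN tag AS t ON t.item_id = i.id
    ORDER BY i.id`)
defer rows.Close()

var current Item
for rows.Next() {
    var itemID int
    var tag Tag
    rows.Scan(&itemID, &tag.ID, &tag.Name)

    // A change in the ordered item ID marks the end of the previous item:
    if current.ID != 0 && itemID != current.ID {
        DoSomethingWithCurrentItem(current)
        current = Item{}
    }
    current.ID = itemID
    current.Tags = append(current.Tags, tag)
}

// Flush the last item:
if current.ID != 0 {
    DoSomethingWithCurrentItem(current)
}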
Upvotes: -1
Reputation: 31
You can use carta.Map() from https://github.com/jackskj/carta. It tracks has-many relationships automatically.
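A minimal sketch of what that might look like, assuming carta's README conventions (the column aliases and db tags here are illustrative, not prescribed by the question):

type Tag struct {
    ID   int    `db:"tag_id"`
    Name string `db:"tag_name"`
}

type Item struct {
    ID   int `db:"item_id"`
    Tags []Tag
}

rows, _ := db.Query(`
    SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name
    FROM item AS i JOIN tag AS t ON t.item_id = i.id`)

var items []Item
_ = carta.Map(rows, &items) // carta groups the tag rows under their parent items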
Upvotes: 3
Reputation: 304
The SQL in Postgres:
create schema temp;
set search_path = temp;

create table item
(
    id INT generated by default as identity primary key
);

create table tag
(
    id INT generated by default as identity primary key,
    name VARCHAR(160),
    item_id INT references item (id)
);

create view item_tags as
select id,
       (
           select array_to_json(array_agg(row_to_json(taglist.*)))
           from (
               select tag.name, tag.id
               from tag
               where item_id = item.id
           ) taglist
       ) as tags
from item;
Then Go queries this SQL:
select row_to_json(row)
from (
    select * from item_tags
) row;
and unmarshals the result into a Go struct.
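The scanning side could look like this (a sketch; the json field names follow the item_tags view above):

type Tag struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

type Item struct {
    ID   int   `json:"id"`
    Tags []Tag `json:"tags"`
}

rows, _ := db.Query(`select row_to_json(row) from (select * from item_tags) row`)
var items []Item
for rows.Next() {
    var buf []byte
    rows.Scan(&buf)

    var item Item
    json.Unmarshal(buf, &item)
    items = append(items, item)
}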
Pros:
- Postgres manages the relations in the data; data is added and updated with SQL functions.
- Go manages the business model and logic.
It's an easy way.
Upvotes: 18
Reputation: 185
I can suggest another approach which I have used before.
You build a JSON array of the tags inside the query itself and return it with each item.
Pros: it is a single call to the DB, which aggregates the data, and all you have to do is parse the JSON into an array.
Cons: it's a bit ugly. Feel free to bash me for it.
type jointItem struct {
    Item
    ParsedTags string
    Tags       []Tag `gorm:"-"`
}

var jointItems []*jointItem
// Note: the subquery selects from tags and correlates on item_id,
// so the aggregated JSON only contains that item's tags (MySQL syntax):
db.Raw(`SELECT
    items.*,
    (SELECT CONCAT(
        '[',
        GROUP_CONCAT(
            JSON_OBJECT('id', tags.id,
                        'name', tags.name)
        ),
        ']'
    ) FROM tags WHERE tags.item_id = items.id) as parsed_tags
FROM items`).Scan(&jointItems)
for _, o := range jointItems {
    var tempTags []Tag
    if err := json.Unmarshal([]byte(o.ParsedTags), &tempTags); err != nil {
        // do something
    }
    o.Tags = tempTags
}
Edit: the code might behave weirdly when unmarshaling into the same struct, so I find it better to unmarshal into a temporary tags slice and assign it afterwards.
Upvotes: 7