Reputation: 3398
I'm trying to read a collection dump generated by mongodump. The file is a few gigabytes so I want to read it incrementally.
I can read the first object with something like this:
buf := make([]byte, 100000)
f, _ := os.Open(path)
f.Read(buf)
var m bson.M
bson.Unmarshal(buf, &m)
However I don't know how much of the buf was consumed, so I don't know how to read the next one.
Is this possible with mgo?
Upvotes: 6
Views: 1778
Reputation: 2524
Niks Keets' answer did not work for me: len(r.Data) was always the length of the whole buffer. So I came up with this code instead:
for len(buff) > 0 {
	messageSize := binary.LittleEndian.Uint32(buff)
	err = bson.Unmarshal(buff, &myObject)
	if err != nil {
		panic(err)
	}
	// Do your stuff
	buff = buff[messageSize:]
}
Of course, you have to handle a truncated document at the end of the buffer. In my case I could load the whole file into memory, so this was not an issue.
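To illustrate the truncation handling mentioned above, here is a minimal sketch that uses only the standard library. The helper name nextDocument is hypothetical (it is not part of mgo). It assumes each document starts with a 4-byte little-endian length that counts the whole document, and it leaves decoding to bson.Unmarshal.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nextDocument (hypothetical helper) splits the first BSON document off
// the front of buf. ok is false when buf does not hold a complete
// document, either because it is truncated or because the length prefix
// is smaller than the 5 bytes of an empty document.
func nextDocument(buf []byte) (doc, rest []byte, ok bool) {
	if len(buf) < 4 {
		return nil, buf, false // not even a full length prefix
	}
	size := binary.LittleEndian.Uint32(buf)
	if size < 5 || uint64(size) > uint64(len(buf)) {
		return nil, buf, false // invalid size, or document cut off
	}
	return buf[:size], buf[size:], true
}

func main() {
	// Two empty documents followed by the start of a truncated third one.
	buf := []byte{5, 0, 0, 0, 0, 5, 0, 0, 0, 0, 12, 0, 0}
	for {
		doc, rest, ok := nextDocument(buf)
		if !ok {
			break
		}
		// doc could be passed to bson.Unmarshal here.
		fmt.Printf("document of %d bytes\n", len(doc))
		buf = rest
	}
	fmt.Printf("%d leftover bytes\n", len(buf))
}
```

When reading the file in chunks rather than all at once, the leftover bytes would be kept and prepended to the next chunk before trying again.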
Upvotes: 0
Reputation: 3398
I managed to solve it with the following code:
for len(buf) > 0 {
	var r bson.Raw
	var m userObject
	bson.Unmarshal(buf, &r)
	r.Unmarshal(&m)
	fmt.Println(m)
	buf = buf[len(r.Data):]
}
Upvotes: 2
Reputation: 11154
Using mgo's bson.Unmarshal() alone is not enough: that function is designed to take a []byte representing a single document and unmarshal it into a value.
You will need a function that can read the next whole document from the dump file; then you can pass the result to bson.Unmarshal().
Comparing this to encoding/json or encoding/gob, it would be convenient if mgo's bson package had a Reader type that consumed documents from an io.Reader.
Anyway, from the source for mongodump, it looks like the dump file is just a series of BSON documents, with no file header/footer or explicit record separators.
BSONTool::processFile shows how mongorestore reads the dump file. Its code reads 4 bytes to determine the length of the document, then uses that size to read the rest of the document. I confirmed that the size prefix is part of the BSON spec.
Here is a playground example that shows how this could be done in Go: read the length field, read the rest of the document, unmarshal, repeat.
Upvotes: 5
Reputation: 9509
The method File.Read returns the number of bytes read.
Read reads up to len(b) bytes from the File. It returns the number of bytes read and an error, if any. EOF is signaled by a zero count with err set to io.EOF.
So you can get the number of bytes read by simply storing the return values of your read:
n, err := f.Read(buf)
Upvotes: 3