Nick Keets

Reputation: 3398

Read mongodump output with go and mgo

I'm trying to read a collection dump generated by mongodump. The file is a few gigabytes so I want to read it incrementally.

I can read the first object with something like this:

buf := make([]byte, 100000)
f, _ := os.Open(path)
f.Read(buf)

var m bson.M
bson.Unmarshal(buf, &m)

However I don't know how much of the buf was consumed, so I don't know how to read the next one.

Is this possible with mgo?

Upvotes: 6

Views: 1778

Answers (4)

javier-sanz

Reputation: 2524

Nick Keets' answer did not work for me. Somehow len(r.Data) was always the whole buffer length. So I came up with this other code:

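for len(buff) > 0 {
    // Every BSON document starts with its total size as a little-endian int32.
    messageSize := binary.LittleEndian.Uint32(buff)
    // Unmarshal only this document's bytes, not the rest of the buffer.
    err = bson.Unmarshal(buff[:messageSize], &myObject)
    if err != nil {
        panic(err)
    }

    // Do your stuff

    // Advance to the start of the next document.
    buff = buff[messageSize:]
}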

Of course you have to handle a truncated document at the end of the buffer. In my case I could load the whole file into memory.
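One way to handle that truncated tail is to check each length prefix against the bytes that remain before slicing. This is only a sketch using the standard library; splitDocuments is a hypothetical helper name, not part of mgo, and each returned slice would still need to be passed to bson.Unmarshal:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitDocuments (hypothetical helper) slices buf into complete BSON
// documents using each document's 4-byte little-endian length prefix.
// Bytes that do not form a complete document are returned as rest.
func splitDocuments(buf []byte) (docs [][]byte, rest []byte) {
	for len(buf) >= 4 {
		size := int(binary.LittleEndian.Uint32(buf))
		// The smallest valid document is 5 bytes: the length plus a 0x00 terminator.
		if size < 5 || size > len(buf) {
			break // corrupt length or truncated document
		}
		docs = append(docs, buf[:size])
		buf = buf[size:]
	}
	return docs, buf
}

func main() {
	// One complete empty document followed by three stray bytes.
	buf := []byte{5, 0, 0, 0, 0, 5, 0, 0}
	docs, rest := splitDocuments(buf)
	fmt.Println(len(docs), len(rest)) // prints "1 3"
}
```

If the buffer is filled in chunks rather than all at once, the leftover bytes can be kept and prepended to the next chunk.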

Upvotes: 0

Nick Keets

Reputation: 3398

I managed to solve it with the following code:

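for len(buf) > 0 {
    var r bson.Raw
    var m userObject

    // Capture the next document as raw bytes, then decode it.
    if err := bson.Unmarshal(buf, &r); err != nil {
        panic(err) // a failed decode could leave buf unchanged and loop forever
    }
    if err := r.Unmarshal(&m); err != nil {
        panic(err)
    }

    fmt.Println(m)

    // Advance past the bytes of the document just decoded.
    buf = buf[len(r.Data):]
}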

Upvotes: 2

lnmx

Reputation: 11154

Using mgo's bson.Unmarshal() alone is not enough -- that function is designed to take a []byte representing a single document, and unmarshal it into a value.

You will need a function that can read the next whole document from the dump file, then you can pass the result to bson.Unmarshal().

Comparing this to encoding/json or encoding/gob, it would be convenient if mgo.bson had a Reader type that consumed documents from an io.Reader.

Anyway, from the source for mongodump, it looks like the dump file is just a series of bson documents, with no file header/footer or explicit record separators.

BSONTool::processFile shows how mongorestore reads the dump file. Their code reads 4 bytes to determine the length of the document, then uses that size to read the rest of the document. I confirmed that the size prefix is part of the BSON spec.

Here is a playground example that shows how this could be done in Go: read the length field, read the rest of the document, unmarshal, repeat.

Upvotes: 5

Elwinar

Reputation: 9509

The method File.Read returns the number of bytes read.

File.Read

Read reads up to len(b) bytes from the File. It returns the number of bytes read and an error, if any. EOF is signaled by a zero count with err set to io.EOF.

So you can get the number of bytes read by simply storing the return values of your read:

n, err := f.Read(buf)

Upvotes: 3
