Reputation: 3910
I'm trying to use caffe for audio recognition, but I can't find any documentation of its input format.
I want to use leveldb, so I must create a key and a value for each record, which is a pair of a label string and a data byte array.
No documentation seems to describe this. I found that the value is written by Datum.SerializeToString(), but I couldn't find where Datum is defined and got lost.
Does anyone know how to convert non-image records into leveldb records for caffe? Thanks!
Upvotes: 2
Views: 3283
Reputation: 1459
leveldb, lmdb and HDF5 are currently the main formats for feeding data into Caffe. The MemoryData layer enables in-memory input as well, so it's possible to use whatever input format you like and use Caffe's Python or C++ interfaces to populate the data blobs.
If you're already set on leveldb, this discussion on caffe issues could be useful.
Below is an example for populating a leveldb with Python. It requires pycaffe and plyvel. It's adapted from a post by Zackory on Caffe's GitHub issues. It's not specific to images, as long as you represent each example as a CxHxW array, where any or all of the dimensions can be equal to 1:
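import caffe
import numpy as np
import plyvel

# dataset: an iterable of file paths, one per example (you define this)
db = plyvel.DB('train_leveldb/', create_if_missing=True, error_if_exists=True, write_buffer_size=268435456)
wb = db.write_batch()
count = 0
for file in dataset:
    # Load a numpy array of shape (C, H, W) from file;
    # np.load is only a placeholder, substitute your own loader
    mat = np.load(file)
    # Load matrix into datum object
    datum = caffe.io.array_to_datum(mat)
    # plyvel expects bytes keys, so encode the string key
    wb.put(('%08d_%s' % (count, file)).encode(), datum.SerializeToString())
    count += 1
    # Write to db in regular intervals
    if count % 1000 == 0:
        # Write batch of records to database
        wb.write()
        del wb
        wb = db.write_batch()
# Write last batch of records
if count % 1000 != 0:
    wb.write()
db.close()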
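For non-image data such as audio, the main step before calling array_to_datum is getting each example into a 3-D CxHxW array. A minimal sketch with numpy, assuming each example is either a 1-D waveform or a 2-D feature matrix such as a spectrogram. The helper name to_chw and the sample shapes are illustrative assumptions, not part of Caffe:

```python
import numpy as np

def to_chw(example):
    """Return a 3-D (C, H, W) array from 1-D, 2-D or 3-D input."""
    arr = np.asarray(example)
    if arr.ndim == 1:
        # 1-D signal: one channel, one row, W samples
        return arr.reshape(1, 1, -1)
    if arr.ndim == 2:
        # 2-D features (e.g. frequency bins x frames): add a single channel
        return arr[np.newaxis, :, :]
    if arr.ndim == 3:
        # Already (C, H, W)
        return arr
    raise ValueError("expected 1-D, 2-D or 3-D data, got %d-D" % arr.ndim)

# Hypothetical examples: 1 s of audio at 16 kHz, and 40 bins x 100 frames
waveform = np.zeros(16000, dtype=np.float32)
spectrogram = np.zeros((40, 100), dtype=np.float32)
print(to_chw(waveform).shape)     # (1, 1, 16000)
print(to_chw(spectrogram).shape)  # (1, 40, 100)
```

The result of to_chw could then be passed to caffe.io.array_to_datum in the loop above.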
I find constructing lmdb a lot simpler. lmdb example here.
Upvotes: 3
Reputation: 2277
The Datum object is defined with protobuf. See here: https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L30-L41
Building Caffe generates a file caffe.pb.h in .build_release/src/caffe/proto from this definition, containing the class Datum. You can have a look there to understand how this object works.
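For orientation, the Datum message in caffe.proto has the fields channels, height, width, data (bytes), label (an int32), float_data (repeated float) and encoded. The sketch below mirrors a subset of them with a plain Python dataclass so that it runs without Caffe; DatumSketch, fill_datum and the classes map are hypothetical stand-ins, not Caffe API. It follows what caffe.io.array_to_datum appears to do: uint8 arrays are stored in data as raw bytes, other dtypes go into float_data. Since label is an integer, string labels would need to be mapped to class indices first.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np

@dataclass
class DatumSketch:
    """Hypothetical stand-in mirroring some fields of Caffe's Datum."""
    channels: int = 0
    height: int = 0
    width: int = 0
    data: bytes = b""
    label: int = 0
    float_data: List[float] = field(default_factory=list)

def fill_datum(arr, label):
    """Fill a DatumSketch from a (C, H, W) array and an integer label."""
    if arr.ndim != 3:
        raise ValueError("expected a (C, H, W) array")
    d = DatumSketch(label=int(label))
    d.channels, d.height, d.width = arr.shape
    if arr.dtype == np.uint8:
        # Byte-valued data is stored as raw bytes
        d.data = arr.tobytes()
    else:
        # Everything else is stored as a flat list of floats
        d.float_data = [float(x) for x in arr.flat]
    return d

classes = {"speech": 0, "music": 1}  # hypothetical label-to-index map
features = np.ones((1, 40, 100), dtype=np.float32)
d = fill_datum(features, classes["music"])
print(d.channels, d.height, d.width, d.label, len(d.float_data))
# 1 40 100 1 4000
```

With pycaffe installed, the real message is available from Python as caffe.proto.caffe_pb2.Datum.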
Upvotes: 1