Reputation: 881
I am trying to read data from a JSON file so that I can use it from Lua. The data are sound files that have been preprocessed in Python and stored as JSON for easier access.
The file is roughly 800 MB. When I try to read the entire file with `file:read("*all")`, I get back a "not enough memory" error. The libraries I have looked at are lua-json, lua-cjson and luajson. The first two don't provide a method to read files directly; the third one does, but it is just a wrapper that calls `f:read()`.
My ultimate goal is to use Torch to train some models on audio data, but I want to keep the processing of the raw signals in Python. I chose JSON over other formats for convenience, so if you think there is a format that would work better, I am open to ideas.
Upvotes: 2
Views: 1936
Reputation: 164
Instead of JSON, you could try npy4th and save the data as an .npz file.
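A minimal sketch of the Python side of this suggestion, assuming each preprocessed clip is a 1-D NumPy array: `np.savez` writes all clips into a single binary .npz archive, one array per key. The helper names `save_clips`/`load_clip` and the `clip_<i>` key scheme are hypothetical, and loading the archive from Torch is left to npy4th (check its README for the exact loader call).

```python
import os
import tempfile

import numpy as np


def save_clips(path, clips):
    """Write a list of 1-D arrays into one .npz archive (hypothetical keys clip_0, clip_1, ...)."""
    # Cast to float32 so each sample takes 4 bytes on disk.
    np.savez(path, **{f"clip_{i}": np.asarray(c, dtype=np.float32) for i, c in enumerate(clips)})


def load_clip(path, i):
    """Read back a single clip; an .npz archive loads each member only when it is accessed."""
    with np.load(path) as archive:
        return archive[f"clip_{i}"]


# Example: two fake "clips" of different lengths.
rng = np.random.default_rng(0)
clips = [rng.standard_normal(16000), rng.standard_normal(8000)]
path = os.path.join(tempfile.mkdtemp(), "clips.npz")
save_clips(path, clips)
second = load_clip(path, 1)
print(second.shape, second.dtype)  # (8000,) float32
```

Binary float32 storage is also far more compact than writing each sample as decimal text in JSON.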
Another option is lutorpy, a library that lets you run Lua/Torch from Python and provides convenient utilities for converting between NumPy arrays and Torch tensors. The advantage is that no memory or disk copy is needed: the array and the tensor share the same underlying memory, so the conversion is very fast. Check the website for more information.
A basic example:
```python
import lutorpy as lua
import numpy as np

## use require("MODULE") to import Lua modules
require("nn")

## run Lua code in Python with minimal modification: replace ":" with "._"
t = torch.DoubleTensor(10, 3)
print(t._size())  # the corresponding Lua version is t:size()

## or, you can use a NumPy array
xn = np.random.randn(100)
## convert the NumPy array into a Torch tensor
xt = torch.fromNumpyArray(xn)
## convert the Torch tensor back to a NumPy array
### Note: both objects share the same memory, so the conversion is instant
arr = xt.asNumpyArray()
print(arr.shape)
```
Upvotes: 0
Reputation: 2266
You have two options:
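Option 1: Install Torch with Lua 5.2 instead of LuaJIT. LuaJIT, which Torch uses by default, can only manage a limited amount of garbage-collected memory (on the order of 1-2 GB on 64-bit systems), and that is the limit an 800 MB file runs into; Lua 5.2 does not have this restriction. Nothing else changes, everything works as expected, and you can then load your JSON file and decode it without memory issues. To do this:
```shell
cd ~/torch
./clean.sh
TORCH_LUA_VERSION=LUA52 ./install.sh
```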
Option 2: Use HDF5 to save your Python-preprocessed data, and use torch-hdf5 to load it. HDF5 is much better suited to your data than JSON anyway.
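A minimal sketch of Option 2 on the Python side, assuming the h5py package is installed and the clips have been cut to equal length so they fit in one 2-D dataset. The dataset names `signals`/`labels` and both helper functions are hypothetical. Because HDF5 datasets can be read by slice, only the requested rows are pulled into memory; on the Torch side, torch-hdf5 opens the same file and reads a dataset by its path (e.g. `/signals`).

```python
import os
import tempfile

import h5py
import numpy as np


def write_signals(path, signals, labels):
    """Store equal-length clips as one (n_clips, n_samples) float32 dataset plus a label vector."""
    with h5py.File(path, "w") as f:
        # chunks=True lets h5py choose a chunk shape; gzip compression is optional.
        f.create_dataset("signals", data=np.asarray(signals, dtype=np.float32),
                         chunks=True, compression="gzip")
        f.create_dataset("labels", data=np.asarray(labels, dtype=np.int64))


def read_batch(path, start, stop):
    """Read rows [start, stop) only; the rest of the file stays on disk."""
    with h5py.File(path, "r") as f:
        return f["signals"][start:stop], f["labels"][start:stop]


# Example: four fake one-second clips at 16 kHz.
rng = np.random.default_rng(0)
signals = rng.standard_normal((4, 16000)).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "audio.h5")
write_signals(path, signals, [0, 1, 0, 1])
x, y = read_batch(path, 1, 3)
print(x.shape, y.tolist())  # (2, 16000) [1, 0]
```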
Upvotes: 0
Reputation: 26794
I'm not sure JSON is the best format for storing audio data, but in this situation it seems you'll need to write your own JSON parser that reads the file, parses the data, and passes it to your training process without storing the entire data set in memory.
Since the JSON format is fairly simple and you can limit the processing to just your own format, it should be relatively straightforward to write a SAX-like parser that generates the events you need. This SO answer may be a good starting point (or at least give you ideas on what keywords to search for).
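The answer proposes writing this parser in Lua; the sketch below shows the same streaming idea in Python so it can be tested in isolation, and it would need to be ported to Lua to help on the Torch side. It assumes (hypothetically) that the file's top level is a JSON array of per-clip records. Rather than a full SAX event stream, it is a simpler element-at-a-time reader: it reads fixed-size chunks and uses the standard library's `json.JSONDecoder.raw_decode` to decode one array element at a time, so memory stays bounded by one chunk plus the largest single record instead of the whole file. The function name `iter_json_array` is hypothetical.

```python
import io
import json
from typing import IO, Any, Iterator

_WS = " \t\r\n"


def iter_json_array(f: IO[str], chunk_size: int = 1 << 16) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time.

    Memory use is bounded by one chunk plus the largest single element,
    not by the size of the whole file.
    """
    decoder = json.JSONDecoder()
    buf, pos, eof, started = "", 0, False, False
    while True:
        # Skip whitespace, and also the commas between elements once inside '['.
        skip = _WS + "," if started else _WS
        while pos < len(buf) and buf[pos] in skip:
            pos += 1
        if pos < len(buf):
            if not started:
                if buf[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                started, pos = True, pos + 1
                continue
            if buf[pos] == "]":
                return  # end of the array
            try:
                obj, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # genuinely malformed, not just cut off at a chunk boundary
            else:
                # Only trust a value followed by a delimiter: "1." or "12" at the end
                # of the buffer may really be the start of "1.5" or "123".
                if eof or (end < len(buf) and buf[end] in _WS + ",]"):
                    yield obj
                    pos = end
                    continue
        if eof:
            raise ValueError("unexpected end of input inside the array")
        chunk = f.read(chunk_size)
        buf, pos = buf[pos:] + chunk, 0  # drop the text that was already parsed
        eof = not chunk


# Example: a small array read back through a deliberately tiny chunk size.
records = [{"id": i, "signal": [0.5 * i, -1.25, 0.03]} for i in range(5)]
text = json.dumps(records)
streamed = list(iter_json_array(io.StringIO(text), chunk_size=7))
print(streamed == records)  # True
```

With a real file this would be driven as `for record in iter_json_array(open(path))`, handing each record to the training step before the next one is read.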
Upvotes: 0