Reputation: 93
Hello, I am new to PyTorch and I want to build a simple speech recognition model, but I don't want to use the built-in PyTorch datasets. I have some voice recordings of my own to use as a dataset, but I can't find anything that explains how to load them.
I want to use .wav files. I saw a tutorial, but it used a built-in PyTorch dataset:
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchaudio

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

from torchaudio.datasets import SPEECHCOMMANDS
import os


class SpeechSubset(SPEECHCOMMANDS):
    def __init__(self, subset: str = None):
        super().__init__("./", download=True)

        def load_list(filename):
            # Read one split file and return the clip paths it lists.
            filepath = os.path.join(self._path, filename)
            with open(filepath) as fileobj:
                return [
                    os.path.normpath(os.path.join(self._path, line.strip()))
                    for line in fileobj
                ]

        if subset == "validation":
            self._walker = load_list("validation_list.txt")
        elif subset == "testing":
            self._walker = load_list("testing_list.txt")
        elif subset == "training":
            # Training set = every clip not listed for validation or testing.
            excludes = load_list("validation_list.txt") + load_list("testing_list.txt")
            excludes = set(excludes)
            self._walker = [w for w in self._walker if w not in excludes]


train_set = SpeechSubset("training")
test_set = SpeechSubset("testing")

waveform, sample_rate, label, speaker_id, utterance_number = train_set[0]
Sorry, my English isn't very good.
EDIT
I'm using the SPEECHCOMMANDS dataset, but I want to use my own.
Thank you for reading.
Upvotes: 2
Views: 875
Reputation: 376
Since you are asking about speech recognition with PyTorch, I would recommend using a well-developed toolkit instead of building the speech training pipeline from scratch.
A good repo on GitHub is ESPnet. It contains fairly recent work on text-to-speech and speech-to-text models, as well as ready-to-use scripts for training on popular open-source datasets in different languages. It also includes trained models that you can use directly.
Back to your question: if you want to use PyTorch to train your own speech recognition model on your own dataset, I would recommend looking at the ESPnet LibriSpeech ASR recipe. Although it uses .flac files, a few small modifications to the data preparation script and some parameter changes in the main entry script, asr.sh, may be enough to meet your needs.
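To give an idea of what "modifying the data preparation script" can look like: as far as I know, ESPnet recipes read a Kaldi-style data directory containing wav.scp, text, utt2spk and spk2utt. Below is a minimal sketch that writes those files for your own recordings. The folder layout (one sub-folder per speaker), the function name write_kaldi_data_dir and the transcripts dictionary are all my own assumptions for illustration, not part of ESPnet.

```python
from collections import defaultdict
from pathlib import Path


def write_kaldi_data_dir(wav_root, out_dir, transcripts):
    """Write wav.scp, text, utt2spk and spk2utt for wav_root/<speaker>/<clip>.wav.

    `transcripts` (hypothetical input) maps "<speaker>-<clip>" to its transcript.
    """
    wav_root, out_dir = Path(wav_root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    for wav_path in wav_root.glob("*/*.wav"):
        speaker = wav_path.parent.name
        # Prefix the utterance id with the speaker id so both sort consistently.
        utt_id = f"{speaker}-{wav_path.stem}"
        rows.append((utt_id, speaker, wav_path.resolve()))
    rows.sort()  # every file is kept sorted by utterance id

    spk2utt = defaultdict(list)
    with open(out_dir / "wav.scp", "w") as scp, \
            open(out_dir / "text", "w") as text, \
            open(out_dir / "utt2spk", "w") as utt2spk:
        for utt_id, speaker, path in rows:
            scp.write(f"{utt_id} {path}\n")
            # Raises KeyError if a clip has no transcript.
            text.write(f"{utt_id} {transcripts[utt_id]}\n")
            utt2spk.write(f"{utt_id} {speaker}\n")
            spk2utt[speaker].append(utt_id)

    with open(out_dir / "spk2utt", "w") as out:
        for speaker in sorted(spk2utt):
            out.write(f"{speaker} {' '.join(spk2utt[speaker])}\n")

    return [utt_id for utt_id, _, _ in rows]
```

You would call it once per split, for example write_kaldi_data_dir("my_voices/train", "data/train", transcripts), and then check the result against what the recipe's own data preparation stage produces.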
Note that, in addition to knowledge of Python and PyTorch, ESPnet requires you to be familiar with shell scripts as well. Their asr.sh script is quite long, so this may not be an easy task for people who are more comfortable with minimal PyTorch code for one specific model. ESPnet is designed to accommodate many models and many datasets. It contains many preprocessing stages, e.g. speech feature extraction, length filtering, token preparation, language model training and so on, which are necessary for good speech recognition models.
If you would rather stick with the code that you found, you need to write a custom Dataset class and wrap it in a DataLoader. You can refer to the PyTorch data loading tutorial, but it uses images as its example. If you want an audio example, you could look at GitHub repos such as the deepspeech pytorch dataloader.
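For a rough idea of what such a custom Dataset can look like for .wav files, here is a minimal sketch. It assumes a folder layout of root/<label>/<clip>.wav and 16-bit PCM files on a little-endian machine; the names load_wav, WavFolderDataset and pad_collate are made up for this example. It reads audio with Python's built-in wave module so that it has no extra dependencies; torchaudio.load(path), which also returns (waveform, sample_rate), could be passed as the loader instead.

```python
import wave
from pathlib import Path

import torch
from torch.utils.data import Dataset


def load_wav(path):
    """Read a 16-bit PCM .wav file into a float tensor of shape (channels, frames)."""
    with wave.open(str(path), "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError(f"{path}: this sketch only handles 16-bit PCM")
        sample_rate = wav_file.getframerate()
        num_channels = wav_file.getnchannels()
        raw = bytearray(wav_file.readframes(wav_file.getnframes()))
    # Scale int16 samples to floats in [-1, 1).
    samples = torch.frombuffer(raw, dtype=torch.int16).to(torch.float32) / 32768.0
    # Samples are interleaved per frame: (frames, channels) -> (channels, frames).
    waveform = samples.reshape(-1, num_channels).t().contiguous()
    return waveform, sample_rate


class WavFolderDataset(Dataset):
    """Map-style dataset over root/<label>/<clip>.wav (assumed layout)."""

    def __init__(self, root, loader=load_wav):
        self.root = Path(root)
        self.loader = loader
        # One class per sub-folder; sorted so label indices are stable.
        self.classes = sorted(p.name for p in self.root.iterdir() if p.is_dir())
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        self.items = [
            (path, self.class_to_idx[name])
            for name in self.classes
            for path in sorted((self.root / name).glob("*.wav"))
        ]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        path, label = self.items[index]
        waveform, sample_rate = self.loader(path)
        return waveform, sample_rate, label


def pad_collate(batch):
    """Zero-pad the clips in a batch to the length of the longest one."""
    waveforms = [waveform.t() for waveform, _, _ in batch]  # (frames, channels)
    padded = torch.nn.utils.rnn.pad_sequence(waveforms, batch_first=True)
    labels = torch.tensor([label for _, _, label in batch])
    # (batch, frames, channels) -> (batch, channels, frames)
    return padded.permute(0, 2, 1), labels
```

You can then pass it to the standard loader, e.g. DataLoader(WavFolderDataset("my_voices"), batch_size=8, shuffle=True, collate_fn=pad_collate). The collate function is needed because recordings usually have different lengths, and the default batching cannot stack tensors of unequal size.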
Upvotes: 1