Bernardo Olisan

Reputation: 675

same shape for audio dataset .wav files

I'm building an artificial neural network (ANN) that classifies whether the person talking is me or not. The problem is that I can't train it, because my data arrays have different shapes.

X data is

(262144,)

y data is

(261768,)

How can I make my .wav audio files data the same shape?

Here is my full code

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    import tensorflow as tf
    import numpy as np
    from scipy.io import wavfile
    from pathlib import Path
    import os

    ### DATASET 
    pathlist = Path(os.path.abspath('Voiceclassification/Data/me/')).rglob('*.wav')

    # My voice data
    for path in pathlist:
        filename = str(path)

        # convert audio to numpy array and then 2D to 1D np Array
        samplerate, data = wavfile.read(filename)
        #print(f"sample rate: {samplerate}")
        data = data.flatten()
        #print(f"data: {data}")

    pathlist2 = Path(os.path.abspath('Voiceclassification/Data/other/')).rglob('*.wav')

    # other voice data
    for path2 in pathlist2:
        filename2 = str(path2)

        samplerate2, data2 = wavfile.read(filename2)
        data2 = data2.flatten()
        #print(data2)


    ### ADAPTING THE DATA FOR THE MODEL
    X = data # My voice
    y = data2 # Other data
    #print(X.shape)
    #print(y.shape)

    ### Training the model
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

    # Performing feature scaling
    sc = StandardScaler()

    x_train = sc.fit_transform(x_train)
    x_test = sc.transform(x_test)

    ### Creating the ANN
    ann = tf.keras.models.Sequential()

    # First hidden layer of the ann
    ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
    # Second one
    ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
    # Output layer
    ann.add(tf.keras.layers.Dense(units=6, activation="sigmoid"))

    # Compile our neural network
    ann.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=['accuracy'])

    # Fit ANN
    ann.fit(x_train, y_train, batch_size=32, epochs=100)
    ann.save('train_model.model')

Any ideas? In total I have 18 .wav files for each of X and y.

Upvotes: 0

Views: 585

Answers (1)

Bernardo Olisan

Reputation: 675

You can use scipy.io to read each wav file and rewrite it trimmed to a fixed length, for example 5 seconds. I wrote this small function that should help:

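    from scipy.io import wavfile

    def trim_wav(originalWavPath, newWavPath, start, end):
        # Read the sample rate (samples per second) and the sample array
        sampleRate, waveData = wavfile.read(originalWavPath)
        # Convert the start and end times (in seconds) to sample indices
        startSample = int(start * sampleRate)
        endSample = int(end * sampleRate)
        # Write only the selected slice to the output file
        wavfile.write(newWavPath, sampleRate, waveData[startSample:endSample])

    wp = "path of the wav file"
    # Keep the first 5 seconds; passing wp twice overwrites the original file
    trim_wav(wp, wp, 0, 5)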

This crops each audio file to the same duration and gets rid of the extra milliseconds that were making the shapes of your data differ.
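One caveat: slicing only shortens a clip, so a file that is shorter than the chosen duration would still come out with a different shape. A minimal sketch of an in-memory alternative, assuming zero-padding short clips is acceptable for your use case, could look like this. The helper name `fix_length`, the synthetic clips, and the 44.1 kHz sample rate are illustrative assumptions, not something taken from the question.

```python
import numpy as np


def fix_length(samples, target_len):
    """Return a 1-D array with exactly `target_len` entries (hypothetical helper)."""
    samples = np.asarray(samples).flatten()
    if len(samples) >= target_len:
        # Too long: keep only the first target_len samples.
        return samples[:target_len]
    # Too short: append zeros (silence) at the end.
    return np.pad(samples, (0, target_len - len(samples)))


# Synthetic stand-ins for clips of different lengths (real data would come
# from wavfile.read); the first two lengths match the shapes in the question.
clip_a = np.ones(262144, dtype=np.int16)
clip_b = np.ones(261768, dtype=np.int16)
clip_c = np.ones(1000, dtype=np.int16)

target_len = 5 * 44100  # assumed: 5 seconds at a 44.1 kHz sample rate
batch = np.stack([fix_length(c, target_len) for c in (clip_a, clip_b, clip_c)])
print(batch.shape)
```

Once every clip has the same length, `np.stack` can combine them into a single 2-D array with one row per file.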

Upvotes: 0
