Nugget

Reputation: 60

How to label a multi-column, multi-category dataset, save it to CSV or Parquet, and train an SVM on it

I am working on audio classification using the openSMILE library. After preprocessing the audio, I get data of shape 800x25 for just one file (each file is around 15 seconds long).

To train on this dataset, and for portability, I want to convert it to a CSV file. I have 5 folders in total (data/category_0 ... data/category_4), and each category has around 50 .wav files inside it. When I process one of the files, e.g. data/category_0/audio1.wav, with openSMILE, I get a pandas DataFrame of shape (800x25). How should I train a multiclass classifier when the data looks like this?

In short: 5 folders, 20 files per folder, and each file's output has shape (800x25).
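To make the labeling part concrete, here is a minimal sketch of how the labels could be derived from the folder names. It assumes the layout described above (data/category_0 ... data/category_4 with .wav files inside); the helper name `list_labeled_files` is made up for illustration and is not part of any library.

```python
from pathlib import Path


def list_labeled_files(root):
    """Return sorted (wav_path, label) pairs.

    The label is the integer suffix of the folder name,
    e.g. "category_3" -> 3. Non-.wav files are ignored.
    """
    pairs = []
    for folder in sorted(Path(root).glob("category_*")):
        if not folder.is_dir():
            continue
        label = int(folder.name.split("_")[-1])
        for wav in sorted(folder.glob("*.wav")):
            pairs.append((str(wav), label))
    return pairs
```

Calling `list_labeled_files("data")` would then give one (path, label) pair per audio file, which can be looped over to extract features file by file.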

import os
import time

import numpy as np
import pandas as pd

import audiofile
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
    multiprocessing=True,
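    verbose=True,
)

# Read one file; audiofile.read returns the samples and the sampling rate.
signal, sampling_rate = audiofile.read("data/category_0/audio1.wav")

# k is a DataFrame with one row per frame, indexed by (start, end).
k = smile.process_signal(
    signal,
    sampling_rate
)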

This is the output from processing just one file.

We can see that the index has multiple sub-columns, start and end. How do I store this dataset properly in a CSV file? If that is not possible, how do I at least train on it?
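For the storage part, one possible approach (a sketch, not tested against real openSMILE output) is to turn the (start, end) index levels into ordinary columns with `reset_index()` and add the file name and label as extra columns, so every frame becomes one flat CSV row. The frame below is a synthetic stand-in with made-up column names `feat_a` and `feat_b`; the helper `flatten_features` is hypothetical.

```python
import numpy as np
import pandas as pd


def flatten_features(features, file_name, label):
    """Make a flat table: file, label, start, end, then the feature columns."""
    flat = features.reset_index()  # index levels become regular columns
    # Store the timestamps as plain seconds so they survive a CSV round trip.
    for col in ("start", "end"):
        flat[col] = flat[col].dt.total_seconds()
    flat.insert(0, "file", file_name)
    flat.insert(1, "label", label)
    return flat


# Synthetic stand-in for one file's frame-level output (3 frames, 2 features).
index = pd.MultiIndex.from_arrays(
    [
        pd.to_timedelta([0.00, 0.01, 0.02], unit="s"),
        pd.to_timedelta([0.02, 0.03, 0.04], unit="s"),
    ],
    names=["start", "end"],
)
demo = pd.DataFrame(np.random.rand(3, 2), index=index, columns=["feat_a", "feat_b"])

table = flatten_features(demo, "audio1.wav", 0)
csv_text = table.to_csv(index=False)  # pass a path instead to write a file
```

The per-file tables can be stacked with `pd.concat` before saving. `DataFrame.to_parquet` would work the same way, but it needs a Parquet engine such as pyarrow or fastparquet installed.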

I tried reshaping the dimensions, but it was of no use, and it messes up the shape of the dataset.
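As for training, an SVM expects one fixed-length vector per sample, so a common workaround (swapped in here as an assumption, not something openSMILE prescribes) is to pool each file's (800x25) frames into per-feature statistics instead of reshaping. The sketch below uses random synthetic data in place of real features and pools with mean and standard deviation, giving 50 values per file. Alternatively, `opensmile.FeatureLevel.Functionals` produces one feature row per file directly.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def pool_frames(frames):
    """Collapse (n_frames, n_features) to one vector of per-feature mean and std."""
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


# Synthetic stand-in: 5 categories x 20 files, each file 800 frames x 25 features.
rng = np.random.default_rng(0)
X, y = [], []
for label in range(5):
    for _ in range(20):
        frames = rng.normal(loc=label, scale=1.0, size=(800, 25))
        X.append(pool_frames(frames))
        y.append(label)
X = np.vstack(X)  # one row per file
y = np.array(y)   # one label per file

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the features, then fit a multiclass SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

Splitting by file, as done here, keeps frames from the same recording out of both the training and test sets at once, which would otherwise inflate the score.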
