colt.exe

Reputation: 728

TensorBoard event file size keeps growing across consecutive model trainings

I'm training 8 models in a for loop and saving each TensorBoard log file into a separate directory. The folder structure is: Graphs is my main directory for graphs, and the subdirectories under it (net01, net02, ..., net08) are where I output the event files. By doing this I can visualize the training logs in TensorBoard in that fancy fashion where every single training process gets its own colour.

My problem is the growing size of the event files. The first event file is approximately 300 KB, but the second has a size of 600 KB, the third 900 KB, and so on. They each reside in their own separate directory and each is a different training session, but somehow TensorBoard appends the earlier sessions to the later ones. In the end I should have a total of 8 * 300 KB = 2400 KB of session files, but I end up with something like 10800 KB. As the nets get deeper I end up with session files of around 600 MB. So clearly I'm missing something.

I tried to visualize the last file (the one with the biggest size) to check whether it includes all the previous training sessions and can draw all 8 nets, but it failed. So a big chunk of irrelevant information is being stored in this session file.
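For reference, here is how one can dump the summary tags stored in an event file to see what it actually contains (a minimal sketch, assuming a TensorFlow 1.x install; the path is a placeholder for one of the oversized event files):

import tensorflow as tf

events_file = "path/to/events.out.tfevents.xxxxx"  # placeholder path

tags = set()
for event in tf.train.summary_iterator(events_file):
    for value in event.summary.value:
        tags.add(value.tag)

# Tags from earlier folds (e.g. conv1-1 showing up in the file for fold 8)
# would confirm that previous sessions leaked into this file.
for tag in sorted(tags):
    print(tag)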

I'm using Anaconda3/Spyder on Win7 64-bit. The database is divided into 8 folds; for each run I leave one fold out for validation and use the rest for training. Here is a simplified version of my code:

from keras.models import Model
from keras.layers import Dense, Dropout, Flatten, Input, Conv2D, MaxPooling2D  # Dropout imported for the commented-out layers below
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import TensorBoard, ModelCheckpoint, CSVLogger
import os.path
import shutil
import numpy
#  ------------------------------------------------------------------
img_width, img_height = 48, 48
num_folds=8
folds_path= "8fold_folds"
nets_path = "8fold_nets_simplenet" 
csv_logpath = 'simplenet_log.csv'
nets_string = "simplenet_nets0"
nb_epoch = 50
batch_size = 512
cvscores = []
#%%
def foldpath(foldnumber):
    pathbase= os.path.join(folds_path,'F')
    train_data_dir = os.path.join(pathbase+str(foldnumber),"train")
    valid_data_dir = os.path.join(pathbase+str(foldnumber),"test")
    return train_data_dir,valid_data_dir

#%%
for i in range(1, num_folds+1):
    modelpath= os.path.join(nets_path,nets_string+str(i))
    if os.path.exists(modelpath):
        shutil.rmtree(modelpath)
    os.makedirs(modelpath)
    train_data_dir, valid_data_dir = foldpath(i)
    img_input = Input(shape=(img_width,img_height,1),name='input')

    x = Conv2D(32, (3,3), activation='relu', padding='same', name='conv1-'+str(i))(img_input)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool1-'+str(i))(x)
    x = Conv2D(64, (3,3), activation='relu', padding='same', name='conv2-'+str(i))(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool2-'+str(i))(x)
    x = Conv2D(128, (3,3), activation='relu', padding='same', name='conv3-'+str(i))(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool3-'+str(i))(x)
    x = Flatten()(x)
    x = Dense(512, name='dense1-'+str(i))(x)
    #x = Dropout(0.5)(x)
    x = Dense(512, name='dense2-'+str(i))(x)
    #x = Dropout(0.5)(x)
    predictions = Dense(6, activation='softmax', name='predictions-'+str(i))(x)
    model = Model(inputs=img_input, outputs=predictions)
    #  compile model-----------------------------------------------------------
    # categorical_crossentropy, not binary, for the 6-class softmax output
    model.compile(optimizer='Adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    #  ----------------------------------------------------------------
    # prepare data augmentation configuration
    # NOTE: featurewise_center/featurewise_std_normalization only take effect
    # after calling train_datagen.fit() on a sample of the training data
    train_datagen = ImageDataGenerator(rescale=1./255,
                                       featurewise_std_normalization=True,
                                       featurewise_center=True)
    valid_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        color_mode='grayscale',
        classes = ['1','3','4','5','6','7'],
        class_mode='categorical',
        shuffle=False  # boolean, not the string 'False' (which is truthy)
    )
    validation_generator = valid_datagen.flow_from_directory(
        valid_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        color_mode='grayscale',
        classes = ['1','3','4','5','6','7'],
        class_mode='categorical',
        shuffle=False  # boolean, not the string 'False' (which is truthy)
    )
    #  --------------------callbacks---------------------------
    csv_logger = CSVLogger(csv_logpath, append=True, separator=';')
    graph_path = os.path.join('Graphs',modelpath)
    os.makedirs(graph_path)
    tensorboard = TensorBoard(log_dir= graph_path, write_graph=True, write_images=False)
    callbacks_list=[csv_logger,tensorboard]

    #  ------------------
    print("Starting to fit the model")

    model.fit_generator(train_generator,
                        # fit_generator expects whole steps, not a float
                        steps_per_epoch = int(numpy.ceil(train_generator.samples/batch_size)),
                        validation_data = validation_generator,
                        validation_steps = int(numpy.ceil(validation_generator.samples/batch_size)),
                        epochs = nb_epoch, verbose=1, callbacks=callbacks_list)

Upvotes: 1

Views: 1259

Answers (2)

George V Jose

Reputation: 312

The problem is that with each model you train, the graph still contains all the elements from the previous trainings. So before training each model, reset the TensorFlow graph and then continue with the training.
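For example (a minimal sketch; with the standalone Keras used in the question, clearing the backend session also resets the underlying TensorFlow graph):

from keras import backend as K

for i in range(1, num_folds+1):
    K.clear_session()  # drop the graph built in the previous iteration
    # ... build, compile and fit the model for fold i as in the question ...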

Upvotes: 0

kluu

Reputation: 2995

Not sure about this one, but my guess would be that it has to do with your graphs being stored after each loop iteration. To check whether the graphs are responsible, you could try write_graph=False and see if you still have the same problem. To make sure the graph is reset, you could clear the TensorFlow graph at the end of each iteration using this:

keras.backend.clear_session()  
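Placed in the loop from the question, that would look roughly like this (a sketch; the rest of the loop body stays as it is):

import keras

for i in range(1, num_folds+1):
    # ... build the model, set up the callbacks, call fit_generator ...
    keras.backend.clear_session()  # discard this fold's graph before the next iteration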

Upvotes: 2
