AutomaKen

Reputation: 17

Keras GPU: Configuration

I'm running simple dense layers, but GPU and CPU load are low all the time. [Screenshots: Windows Task Manager showing low GPU and CPU load.] Here is the output of print(device_lib.list_local_devices()):

2019-02-19 19:06:23.911633: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

2019-02-19 19:06:24.231261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:65:00.0
totalMemory: 8.00GiB freeMemory: 6.55GiB
2019-02-19 19:06:24.237952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-19 19:06:25.765790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-19 19:06:25.769303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-02-19 19:06:25.771334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-02-19 19:06:25.776384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6288 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:65:00.0, compute capability: 7.5)

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality { }
incarnation: 5007262859900510599
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6594058650
locality { bus_id: 1 links { } }
incarnation: 16804701769178738279
physical_device_desc: "device: 0, name: GeForce RTX 2080, pci bus id: 0000:65:00.0, compute capability: 7.5"]

At least it is working on the GPU. But I don't know whether this is the most this GPU can give for this deep learning net or not.
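To double-check where the ops actually run (not just which devices are visible), something like the sketch below turns on device-placement logging. This assumes the TF 1.x / Keras-with-TensorFlow-backend setup suggested by the log above:

import tensorflow as tf
from keras import backend as K

# Sketch, assuming TF 1.x with the Keras TensorFlow backend:
# log_device_placement makes TensorFlow print the device each op is placed on,
# so you can confirm the Dense layers really end up on /device:GPU:0.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))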

EDIT2: dataset

https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant

It's about 10,000 data points with 4 descriptive variables.

EDIT3: Code, it's really simple.

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
from keras.callbacks import EarlyStopping

num_p = 8
model = Sequential()
model.add(Dense(8*num_p, input_dim=input_features, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(8*num_p, activation='relu'))  # input_dim is only needed on the first layer
model.add(BatchNormalization())
model.add(Dense(1, activation='linear'))
model.compile(loss='mae', optimizer='adam')

es = EarlyStopping(monitor='val_loss', min_delta=0.0005, patience=200, verbose=0, mode='min')
his = model.fit(x=X_train_scaled, y=y_train, batch_size=64, epochs=10000, verbose=0,
                validation_split=0.2, callbacks=[es])

EDIT4: input data code

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("dataset")
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, :-1].values, df.iloc[:, -1].values)
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
batch_size = 64
dataset = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
print(dataset)
dataset = dataset.cache()
print(dataset)
dataset = dataset.shuffle(len(X_train_scaled))
print(dataset)
dataset = dataset.repeat()
print(dataset)
dataset = dataset.batch(batch_size)
print(dataset)
dataset = dataset.prefetch(batch_size*10)
print(dataset)

<TensorSliceDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<CacheDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<ShuffleDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<RepeatDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<BatchDataset shapes: ((?, 4), (?,)), types: (tf.float64, tf.float64)> 
<PrefetchDataset shapes: ((?, 4), (?,)), types: (tf.float64, tf.float64)>

Upvotes: 0

Views: 901

Answers (2)

William Heymann

Reputation: 76

You are looking at the wrong display to see GPU usage with TensorFlow. What you are seeing is the 3D activity of the video card.

If you notice, there is a drop-down arrow next to 3D, Video Encode, etc. Set one of them to Cuda and another to Copy. This lets you see the compute usage and the copying time.

I actually have a similar problem I am working on, where I get only about 65% usage under Cuda because the dataset is so small. You can increase the batch size to raise GPU usage, but you also hurt the net as a result, so it is really better to train with a batch size of around 32-128 for most things, even if your GPU memory could handle far more.

The other answer, about using the Dataset API, should work if you can figure out how to get it set up correctly. That is something I am working on now.

Upvotes: 2

Sharky

Reputation: 4533

You can increase GPU utilization by increasing the batch size. However, considering the rather small dataset size, performance can still be improved by using the Dataset API. It's a much more scalable solution, capable of handling large datasets.

dataset = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
dataset = dataset.cache() #caches dataset in memory
dataset = dataset.shuffle(len(X_train_scaled)) #shuffles dataset
dataset = dataset.repeat() #with no parameter, repeats indefinitely
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(batch_size*10) #prefetches data 

Then you just pass the dataset object to model.fit with no batch_size, because it was specified earlier, and with steps_per_epoch to let the model know the size of an epoch.

his = model.fit(dataset, steps_per_epoch=7500, epochs=1000)
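If you prefer to derive steps_per_epoch from the training set size instead of hard-coding it, a rough sketch (reusing X_train_scaled and batch_size from the question's code; the exact value is up to you) would be:

import math

# Rough sketch: one epoch = one full pass over the (repeated) training split.
steps_per_epoch = math.ceil(len(X_train_scaled) / batch_size)
his = model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=1000)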

P.S. With a CSV file of this size it's hard to get a high utilization rate. You can easily pass the whole dataset as one batch and get about 60%. More info here: https://www.tensorflow.org/guide/performance/datasets
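As a sketch of the "whole dataset as one batch" idea (again reusing the arrays from the question, so treat it as an illustration rather than a drop-in):

# Sketch only: every step sees the entire training split as a single batch.
full_batch = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
full_batch = full_batch.cache().repeat().batch(len(X_train_scaled)).prefetch(1)
his = model.fit(full_batch, steps_per_epoch=1, epochs=1000)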

Upvotes: 2
