AutomaKen

Reputation: 17

Keras GPU: Configuration

I'm running simple dense layers, but GPU and CPU load are low all the time. [Screenshots: Windows Task Manager showing low GPU and CPU load.] Here is the output of print(device_lib.list_local_devices()):

2019-02-19 19:06:23.911633: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

2019-02-19 19:06:24.231261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:65:00.0
totalMemory: 8.00GiB freeMemory: 6.55GiB
2019-02-19 19:06:24.237952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-19 19:06:25.765790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-19 19:06:25.769303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-02-19 19:06:25.771334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-02-19 19:06:25.776384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6288 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:65:00.0, compute capability: 7.5)

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality { }
incarnation: 5007262859900510599
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6594058650
locality { bus_id: 1 links { } }
incarnation: 16804701769178738279
physical_device_desc: "device: 0, name: GeForce RTX 2080, pci bus id: 0000:65:00.0, compute capability: 7.5"]

At least it is working on the GPU. But I don't know whether this is the most this GPU can give for this deep learning net or not.
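To double-check where the ops actually run (not just which devices are visible), something like the sketch below turns on device-placement logging. This assumes the TF 1.x / Keras-with-TensorFlow-backend setup suggested by the log above:

import tensorflow as tf
from keras import backend as K

# Sketch, assuming TF 1.x with the Keras TensorFlow backend:
# log_device_placement makes TensorFlow print the device each op is placed on,
# so you can confirm the Dense layers really end up on /device:GPU:0.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))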

EDIT2: dataset

https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant

It's about 10,000 data points with 4 descriptive variables.

EDIT3: Code, it's really simple.

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
from keras.callbacks import EarlyStopping

num_p = 8
model = Sequential()
model.add(Dense(8*num_p, input_dim=input_features, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16*num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(8*num_p, activation='relu'))  # input_dim is only needed on the first layer
model.add(BatchNormalization())
model.add(Dense(1, activation='linear'))
model.compile(loss='mae', optimizer='adam')

es = EarlyStopping(monitor='val_loss', min_delta=0.0005, patience=200, verbose=0, mode='min')
his = model.fit(x=X_train_scaled, y=y_train, batch_size=64, epochs=10000, verbose=0,
                validation_split=0.2, callbacks=[es])

EDIT4: input data code

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("dataset")
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, :-1].values, df.iloc[:, -1].values)
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
batch_size = 64
dataset = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
print(dataset)
dataset = dataset.cache()
print(dataset)
dataset = dataset.shuffle(len(X_train_scaled))
print(dataset)
dataset = dataset.repeat()
print(dataset)
dataset = dataset.batch(batch_size)
print(dataset)
dataset = dataset.prefetch(batch_size*10)
print(dataset)

<TensorSliceDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<CacheDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<ShuffleDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<RepeatDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<BatchDataset shapes: ((?, 4), (?,)), types: (tf.float64, tf.float64)> 
<PrefetchDataset shapes: ((?, 4), (?,)), types: (tf.float64, tf.float64)>

Upvotes: 0

Views: 901

Answers (2)

William Heymann

Reputation: 76

You are looking at the wrong display to see GPU usage with TensorFlow. What you are seeing is the 3D activity of the video card.

If you notice, there is a drop-down arrow next to 3D, Video Encode, etc. Set one of them to Cuda and another to Copy. This lets you see the compute usage and the copying time.

I actually have a similar problem I am working on, where I get only about 65% usage under Cuda because the dataset is so small. You can increase the batch size to raise GPU usage, but you also hurt the net as a result, so it is really better to train with a batch size of around 32-128 for most things, even if your GPU memory could handle far more.

The other answer, about using the Dataset API, should work if you can figure out how to get it set up correctly. That is something I am working on now.

Upvotes: 2

Sharky

Reputation: 4533

You can increase GPU utilization by increasing the batch size. However, considering the rather small dataset size, performance can still be improved by using the Dataset API. It's a much more scalable solution, capable of handling large datasets.

dataset = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
dataset = dataset.cache() #caches dataset in memory
dataset = dataset.shuffle(len(X_train_scaled)) #shuffles dataset
dataset = dataset.repeat() #with no parameter, repeats indefinitely
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(batch_size*10) #prefetches data 

Then you just pass the dataset object to model.fit with no batch_size, because it was specified earlier, and with steps_per_epoch to let the model know the size of an epoch.

his = model.fit(dataset, steps_per_epoch=7500, epochs=1000)
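If you prefer to derive steps_per_epoch from the training set size instead of hard-coding it, a rough sketch (reusing X_train_scaled and batch_size from the question's code; the exact value is up to you) would be:

import math

# Rough sketch: one epoch = one full pass over the (repeated) training split.
steps_per_epoch = math.ceil(len(X_train_scaled) / batch_size)
his = model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=1000)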

P.S. With a CSV file of this size it's hard to get a high utilization rate. You can easily pass the whole dataset as one batch and get about 60%. More info here: https://www.tensorflow.org/guide/performance/datasets
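As a sketch of the "whole dataset as one batch" idea (again reusing the arrays from the question, so treat it as an illustration rather than a drop-in):

# Sketch only: every step sees the entire training split as a single batch.
full_batch = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
full_batch = full_batch.cache().repeat().batch(len(X_train_scaled)).prefetch(1)
his = model.fit(full_batch, steps_per_epoch=1, epochs=1000)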

Upvotes: 2
