Reputation: 3076
When I try to use a DeepSpeed example to finetune an OPT 1.3b model on my local machine, I get an unexpected error related to the following code snippet:
template <typename T>
__global__ void moe_res_matmul(T* residual, T* coef, T* mlp_out, int seq_len, int hidden_dim)
{
    constexpr int granularity = 16;
    constexpr int vals_per_access = granularity / sizeof(T);

    T* residual_seq = residual + blockIdx.x * hidden_dim;
    T* mlp_out_seq = mlp_out + blockIdx.x * hidden_dim;

    for (unsigned tid = threadIdx.x * vals_per_access; tid < hidden_dim;
         tid += blockDim.x * vals_per_access) {
        T mlp[vals_per_access];
        T res[vals_per_access];
        T coef1[vals_per_access];
        T coef2[vals_per_access];

        mem_access::load_global<granularity>(mlp, mlp_out_seq + tid);
        mem_access::load_global<granularity>(res, residual_seq + tid);
        mem_access::load_global<granularity>(coef1, coef + tid);
        mem_access::load_global<granularity>(coef2, coef + tid + hidden_dim);

#pragma unroll
        for (int idx = 0; idx < vals_per_access; idx++) {
            // this is the multiply reported at gelu.cu(529) in the error below
            mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
        }

        mem_access::store_global<granularity>(mlp_out_seq + tid, mlp);
    }
}
The error message in the log looks like this:
/.../python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(529):
error: no operator "*" matches these operands
            operand types are: __half * __half
      mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
                          ^
          detected during:
            instantiation of "void moe_res_matmul(T *, T *, T *, int, int) [with T=__half]" at line 547
            instantiation of "void launch_moe_res_matmul(T *, T *, T *, int, int, cudaStream_t) [with T=__half]" at line 566
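If it helps narrow things down: as far as I understand (my assumption, not something stated by DeepSpeed), the overloaded * operator for __half is only available when compiling for compute capability 5.3 or higher and when __CUDA_NO_HALF_OPERATORS__ is not defined by the build flags. A minimal standalone kernel like the sketch below (half_mul_repro is just a name I made up for illustration) reproduces the same operand error under those conditions:

#include <cuda_fp16.h>

// Sketch: multiplying two __half values with the overloaded operator fails with
// the same "no operator" error when the overloads are unavailable (arch below
// sm_53, or __CUDA_NO_HALF_OPERATORS__ defined by the extension's build flags).
__global__ void half_mul_repro(__half* out, const __half* a, const __half* b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] * b[i];  // the line that fails to compile in that case
        // Possible alternatives:
        // out[i] = __hmul(a[i], b[i]);                                     // intrinsic form, still needs sm_53+
        // out[i] = __float2half(__half2float(a[i]) * __half2float(b[i]));  // float fallback, compiles on any arch
    }
}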
The finetuning environment:
datasets>=2.8.0
sentencepiece>=0.1.97
protobuf==3.20.3
accelerate>=0.15.0
torch>=1.12.0
deepspeed>=0.9.0
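In case the GPU architecture is relevant (again, my assumption), this is the small check I used to query the compute capability of the visible devices; anything below 5.3 would lack native __half arithmetic:

#include <cstdio>
#include <cuda_runtime.h>

// Prints the compute capability of each visible GPU. The __half operator
// overloads used in the kernel above require at least 5.3 (sm_53).
int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        std::printf("device %d: %s, compute capability %d.%d\n",
                    d, prop.name, prop.major, prop.minor);
    }
    return 0;
}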
Any idea how to solve this issue? Thanks!
Upvotes: 1
Views: 65