If we receive Send or Send with Immediate packets but don't have local write access rights, in RC, UC, or UD, what behavior should we expect?
I think the target gets a local protection error, and the initiator will also encounter a failure.
First of all, we need to distinguish QP types from memory access flags.
RC, UC, and UD are QP types. The type is selected through the qp_type field of ibv_qp_init_attr, which is filled in before the QP is created. The structure of ibv_qp_init_attr is:
struct ibv_qp_init_attr {
void *qp_context; /* Associated context of the QP */
struct ibv_cq *send_cq; /* CQ to be associated with the Send Queue (SQ) */
struct ibv_cq *recv_cq; /* CQ to be associated with the Receive Queue (RQ) */
struct ibv_srq *srq; /* SRQ handle if QP is to be associated with an SRQ, otherwise NULL */
struct ibv_qp_cap cap; /* QP capabilities */
enum ibv_qp_type qp_type; /* QP Transport Service Type: IBV_QPT_RC, IBV_QPT_UC, IBV_QPT_UD, IBV_QPT_RAW_PACKET or IBV_QPT_DRIVER */
int sq_sig_all; /* If set, each Work Request (WR) submitted to the SQ generates a completion entry */
};
struct ibv_qp_cap {
uint32_t max_send_wr; /* Requested max number of outstanding WRs in the SQ */
uint32_t max_recv_wr; /* Requested max number of outstanding WRs in the RQ */
uint32_t max_send_sge; /* Requested max number of scatter/gather (s/g) elements in a WR in the SQ */
uint32_t max_recv_sge; /* Requested max number of s/g elements in a WR in the RQ */
uint32_t max_inline_data;/* Requested max number of data (bytes) that can be posted inline to the SQ, otherwise 0 */
};
Memory access flags, on the other hand, are the permissions granted to a piece of memory when it is registered with the RDMA device. They are set through the access parameter of ibv_reg_mr:
struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access);
The argument access describes the desired memory protection attributes; it is either 0 or the bitwise OR of one or more of the following flags:
IBV_ACCESS_LOCAL_WRITE Enable Local Write Access
IBV_ACCESS_REMOTE_WRITE Enable Remote Write Access
IBV_ACCESS_REMOTE_READ Enable Remote Read Access
IBV_ACCESS_REMOTE_ATOMIC Enable Remote Atomic Operation Access (if supported)
IBV_ACCESS_MW_BIND Enable Memory Window Binding
IBV_ACCESS_ZERO_BASED Use byte offset from beginning of MR to access this MR, instead of a pointer address
IBV_ACCESS_ON_DEMAND Create an on-demand paging MR
IBV_ACCESS_HUGETLB Huge pages are guaranteed to be used for this MR, applicable with IBV_ACCESS_ON_DEMAND in explicit mode only
IBV_ACCESS_RELAXED_ORDERING Allow system to reorder accesses to the MR to improve performance
If IBV_ACCESS_REMOTE_WRITE or IBV_ACCESS_REMOTE_ATOMIC is set, then IBV_ACCESS_LOCAL_WRITE must be set too.
Local read access is always enabled for the MR.
... excerpt from `man ibv_reg_mr`
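The two rules that matter for this question can be sketched in plain C. This is a minimal sketch, assuming hypothetical local constants ACC_* whose values mirror the first four IBV_ACCESS_* bits in <infiniband/verbs.h> (real code should include that header and use the IBV_ACCESS_* names). The helpers access_combination_valid() and usable_as_recv_buffer() are illustrative, not part of libibverbs: the first encodes the man-page rule above, the second encodes the fact that an incoming Send is written by the device into a posted receive buffer, which is a local write.

```c
#include <stdbool.h>

/* Hypothetical local copies of IBV_ACCESS_* bits; the values match
 * enum ibv_access_flags in <infiniband/verbs.h>, which real code should
 * include instead of redefining them. */
enum {
    ACC_LOCAL_WRITE   = 1 << 0, /* IBV_ACCESS_LOCAL_WRITE   */
    ACC_REMOTE_WRITE  = 1 << 1, /* IBV_ACCESS_REMOTE_WRITE  */
    ACC_REMOTE_READ   = 1 << 2, /* IBV_ACCESS_REMOTE_READ   */
    ACC_REMOTE_ATOMIC = 1 << 3  /* IBV_ACCESS_REMOTE_ATOMIC */
};

/* Man-page rule: REMOTE_WRITE or REMOTE_ATOMIC requires LOCAL_WRITE.
 * Expect ibv_reg_mr() to fail for combinations that break this rule. */
bool access_combination_valid(int access)
{
    bool needs_local_write = (access & (ACC_REMOTE_WRITE | ACC_REMOTE_ATOMIC)) != 0;
    return !needs_local_write || (access & ACC_LOCAL_WRITE) != 0;
}

/* An incoming Send (with or without immediate) is written by the device
 * into the buffer of a posted receive WR: that is a local write, so the
 * MR behind the buffer needs LOCAL_WRITE. Local read is always enabled,
 * so an MR registered with access == 0 can still be the source of a Send. */
bool usable_as_recv_buffer(int access)
{
    return (access & ACC_LOCAL_WRITE) != 0;
}
```

For example, access_combination_valid(ACC_REMOTE_WRITE) is false, while usable_as_recv_buffer(0) is false even though 0 itself is a valid combination to register.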
So "don't have local write access" means the memory was registered without IBV_ACCESS_LOCAL_WRITE, for example with access equal to 0. The outcome shows up in the work completion, struct ibv_wc:
struct ibv_wc {
uint64_t wr_id; /* ID of the completed Work Request (WR) */
enum ibv_wc_status status; /* Status of the operation */
enum ibv_wc_opcode opcode; /* Operation type specified in the completed WR */
uint32_t vendor_err; /* Vendor error syndrome */
uint32_t byte_len; /* Number of bytes transferred */
union {
__be32 imm_data; /* Immediate data (in network byte order) */
uint32_t invalidated_rkey; /* Local RKey that was invalidated */
};
uint32_t qp_num; /* Local QP number of completed WR */
uint32_t src_qp; /* Source QP number (remote QP number) of completed WR (valid only for UD QPs) */
unsigned int wc_flags; /* Flags of the completed WR */
uint16_t pkey_index; /* P_Key index (valid only for GSI QPs) */
uint16_t slid; /* Source LID */
uint8_t sl; /* Service Level */
uint8_t dlid_path_bits; /* DLID path bits (not applicable for multicast messages) */
};
Printing ibv_wc.status and ibv_wc.vendor_err shows what happened; see this post (https://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) for more information.
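The status field is a portable enum ibv_wc_status, while vendor_err is a vendor-specific syndrome, so only the status can be decoded generically. With libibverbs available, ibv_wc_status_str() turns a status into a readable string. The sketch below instead mirrors the first entries of enum ibv_wc_status as a name table, so it runs without a device; the table and the helpers wc_status_name() and wc_status_from_name() are hypothetical, and the numeric values assume the enum order in <infiniband/verbs.h>.

```c
#include <stddef.h>
#include <string.h>

/* First entries of enum ibv_wc_status, indexed by numeric value
 * (assumed to follow the order in <infiniband/verbs.h>). */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",           /*  0 */
    "IBV_WC_LOC_LEN_ERR",       /*  1 */
    "IBV_WC_LOC_QP_OP_ERR",     /*  2 */
    "IBV_WC_LOC_EEC_OP_ERR",    /*  3 */
    "IBV_WC_LOC_PROT_ERR",      /*  4 */
    "IBV_WC_WR_FLUSH_ERR",      /*  5 */
    "IBV_WC_MW_BIND_ERR",       /*  6 */
    "IBV_WC_BAD_RESP_ERR",      /*  7 */
    "IBV_WC_LOC_ACCESS_ERR",    /*  8 */
    "IBV_WC_REM_INV_REQ_ERR",   /*  9 */
    "IBV_WC_REM_ACCESS_ERR",    /* 10 */
    "IBV_WC_REM_OP_ERR",        /* 11 */
    "IBV_WC_RETRY_EXC_ERR",     /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR", /* 13 */
};

#define WC_STATUS_COUNT (sizeof wc_status_names / sizeof wc_status_names[0])

/* Numeric wc.status (as printed with %d) -> enum name, or "UNKNOWN"
 * for values outside this partial table. */
const char *wc_status_name(int status)
{
    if (status < 0 || (size_t)status >= WC_STATUS_COUNT)
        return "UNKNOWN";
    return wc_status_names[status];
}

/* Enum name -> numeric value, or -1 if the name is not in the table. */
int wc_status_from_name(const char *name)
{
    for (size_t i = 0; i < WC_STATUS_COUNT; i++)
        if (strcmp(wc_status_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

With this table, the two statuses reported in the test below decode as wc_status_name(11), IBV_WC_REM_OP_ERR, and wc_status_name(4), IBV_WC_LOC_PROT_ERR.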
A simple test (QP type = RC):
server: normal
client: memory registered with access equal to 0
The result is:
server: vendor_err = 137, status = 11, which is IBV_WC_REM_OP_ERR. Remote Operation Error: the operation could not be completed successfully by the responder. Possible causes include a responder QP related error that prevented the responder from completing the request or a malformed WQE on the Receive Queue. Relevant for RC QPs.
client: vendor_err = 51, status = 4, which is IBV_WC_LOC_PROT_ERR. Local Protection Error: the locally posted Work Request's buffers in the scatter/gather list do not reference a Memory Region that is valid for the requested operation.