I have what I believed to be a basic need: from the 2D position of the mouse on the screen, get the closest 3D point in the 3D world. It looks like the common ray-casting / picking problem (even if that is not exactly mine).
I googled and read a lot: the topic is messy and things quickly get intricate. My initial problem involves lots of 3D points that I do not know (meshes or point clouds from the internet), so it is impossible to know what result to expect! Thus, I decided to create simple shapes (triangle, quadrangle, cube) with points that I know (each coordinate of each point is 0.f or 0.5f in the local frame), and to check whether I can recover 3D point positions from the mouse cursor when I move it over the screen.
Note: all coordinates of all points of all shapes are known values like 0.f or 0.5f. For example, with the triangle:
float vertices[] = {
    -0.5f, -0.5f, 0.0f,
     0.5f, -0.5f, 0.0f,
     0.0f,  0.5f, 0.0f
};
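For completeness, the triangle is uploaded with the usual VBO/VAO setup (a minimal sketch; VAO and VBO are hypothetical local names, and attribute location 0 is assumed to match aPos in the vertex shader shown further below):

unsigned int VAO = 0, VBO = 0;
glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);
glBindVertexArray(VAO);
glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
// attribute 0 = aPos: 3 floats per vertex, tightly packed
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
glEnableVertexAttribArray(0);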
I have a 3D OpenGL renderer to which I added a GUI with controls on the rendered scene.
Transformations: tx, ty, tz, rx, ry, rz are controls that change the model matrix. In code:
// create transformations: model represents local to world transformation
model = glm::mat4(1.0f); // initialize matrix to identity matrix first
model = glm::translate(model, glm::vec3(tx, ty, tz));
model = glm::rotate(model, glm::radians(rx), glm::vec3(1.0f, 0.0f, 0.0f));
model = glm::rotate(model, glm::radians(ry), glm::vec3(0.0f, 1.0f, 0.0f));
model = glm::rotate(model, glm::radians(rz), glm::vec3(0.0f, 0.0f, 1.0f));
ourShader.setMat4("model", model);
model only changes the position of the shape in the world and has no connection with the position of the camera (that is what I understand from tutorials).
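As a concrete sanity check (a small CPU-side sketch; the vertex comes from the triangle above, and I assume tx = ty = 0, tz = -5, no rotation), applying model to a local vertex should give its world position:

// first local vertex of the triangle
glm::vec4 localPos(-0.5f, -0.5f, 0.0f, 1.0f);
// with tx = ty = 0, tz = -5 and no rotation, model is a pure translation
glm::vec4 worldPos = model * localPos;
// expected: worldPos == (-0.5, -0.5, -5.0, 1.0)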
Camera: from here, I ended up with a camera class that holds the view and proj matrices. In code:
// get view and projection from camera
view = cam.getViewMatrix();
ourShader.setMat4("view", view);
proj = cam.getProjMatrix((float)SCR_WIDTH, (float)SCR_HEIGHT, near, 100.f);
ourShader.setMat4("proj", proj);
The camera is a fly-like camera that can be moved with the mouse or the keyboard arrows; it does not act on model, but only on view and proj (that is what I understand from tutorials).
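For reference, the camera class methods are essentially the usual glm::lookAt / glm::perspective calls (a sketch: the member names position, front, up and fov are hypothetical, not the actual class):

// Hypothetical internals of the fly-like camera class.
glm::mat4 Camera::getViewMatrix() const
{
    // position, front, up: the camera's state
    return glm::lookAt(position, position + front, up);
}

glm::mat4 Camera::getProjMatrix(float width, float height, float near, float far) const
{
    // fov in degrees; standard perspective projection
    return glm::perspective(glm::radians(fov), width / height, near, far);
}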
The shader then uses model, view and proj this way:
#version 330 core
layout (location = 0) in vec3 aPos; // vertex position from the VBO (location 0 assumed)

uniform mat4 model;
uniform mat4 view;
uniform mat4 proj;

void main()
{
    // note that we read the multiplication from right to left
    gl_Position = proj * view * model * vec4(aPos.x, aPos.y, aPos.z, 1.0);
}
Screen to world: as glm::unProject did not always return the results I expected, I added a control to bypass it (back-projecting by hand). In code, first I get the mouse cursor position frame3DPos following this:
// glfw: whenever the mouse moves, this callback is called
// -------------------------------------------------------
void mouseCursorCallback(GLFWwindow* window, double xposIn, double yposIn)
{
    // screen to world transformation
    xposScreen = xposIn;
    yposScreen = yposIn;
    int windowWidth = 0, windowHeight = 0; // size in screen coordinates.
    glfwGetWindowSize(window, &windowWidth, &windowHeight);
    int frameWidth = 0, frameHeight = 0; // size in pixel.
    glfwGetFramebufferSize(window, &frameWidth, &frameHeight);
    glm::vec2 frameWinRatio = glm::vec2(frameWidth, frameHeight) /
                              glm::vec2(windowWidth, windowHeight);
    glm::vec2 screen2DPos = glm::vec2(xposScreen, yposScreen);
    glm::vec2 frame2DPos = screen2DPos * frameWinRatio; // window / frame sizes may be different.
    frame2DPos = frame2DPos + glm::vec2(0.5f, 0.5f); // shift to GL's center convention.
    glm::vec3 frame3DPos = glm::vec3(0.0f, 0.0f, 0.0f);
    frame3DPos.x = frame2DPos.x;
    frame3DPos.y = frameHeight - 1.0f - frame2DPos.y; // GL's window origin is at the bottom left
    frame3DPos.z = 0.f;
    glReadPixels((GLint) frame3DPos.x, (GLint) frame3DPos.y, // CAUTION: cast to GLint.
                 1, 1, GL_DEPTH_COMPONENT,
                 GL_FLOAT, &zbufScreen); // CAUTION: GL_DOUBLE is NOT supported.
    frame3DPos.z = zbufScreen; // z-buffer.
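    // One extra check (not part of my original callback, just a useful sketch): if the
    // depth read back is still the clear value (1.0), the cursor is over the background
    // and there is no rendered 3D point under it to recover.
    if (zbufScreen >= 1.0f) {
        return; // nothing rendered under the cursor
    }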
And then I can call glm::unProject or not (back-projecting by hand), according to a control in the GUI:
glm::vec3 world3DPos = glm::vec3(0.0f, 0.0f, 0.0f);
if (screen2WorldUsingGLM) {
    glm::vec4 viewport(0.0f, 0.0f, (float) frameWidth, (float) frameHeight);
    world3DPos = glm::unProject(frame3DPos, view * model, proj, viewport);
} else {
    glm::mat4 trans = proj * view * model;
    glm::vec4 frame4DPos(frame3DPos, 1.f);
    frame4DPos = glm::inverse(trans) * frame4DPos;
    world3DPos.x = frame4DPos.x / frame4DPos.w;
    world3DPos.y = frame4DPos.y / frame4DPos.w;
    world3DPos.z = frame4DPos.z / frame4DPos.w;
}
Question: the glm::unProject doc says "Map the specified window coordinates (win.x, win.y, win.z) into object coordinates", but I am not sure I understand what object coordinates are. Does "object coordinates" refer to the local, world, view or clip space described here?
Z-buffering is always enabled, whether the shape is 2D (triangle, quadrangle) or 3D (cube). In code:
glEnable(GL_DEPTH_TEST); // Enable z-buffer.
while (!glfwWindowShouldClose(window)) {
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // also clear the z-buffer
Here is what I get with the triangle. The camera is positioned at (0., 0., 0.) and looks ahead (front = -z, as the z-axis points from the screen towards me). The shape is positioned (using tx, ty, tz, rx, ry, rz) in front of the camera with tz = -5 (5 units along the camera's front vector).
I get correct xpos and ypos in the world frame, but an incorrect zpos = 0. (z-buffering is enabled). I expected zpos = -5 (since tz = -5).
Question: why is zpos incorrect?
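As a forward sanity check (a sketch using glm::project, the forward counterpart of glm::unProject; it reuses view, model, proj and the frameWidth / frameHeight from the callback above, with the triangle at tz = -5):

// Project a known local vertex all the way to window coordinates.
glm::vec4 viewport(0.0f, 0.0f, (float) frameWidth, (float) frameHeight);
glm::vec3 winPos = glm::project(glm::vec3(-0.5f, -0.5f, 0.0f), // local vertex
                                view * model,                  // local -> view space
                                proj, viewport);
// winPos.x / winPos.y are pixel coordinates; winPos.z is the depth in [0, 1]
// that glReadPixels(GL_DEPTH_COMPONENT) should return at that pixel.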
If I do not use glm::unProject, I get outer-space results.
Question: why does back-projecting by hand not return results consistent with glm::unProject? Is this logical? Are they different operations? (I believed they should be equivalent, but they are obviously not.)
After a translation of about tx = 0.5, I still get the same coordinates (local frame), where I expected the previous coordinates translated along the x-axis. Not using glm::unProject returns outer-space results here too...
Question: why is the translation (applied by model, not by view nor proj) ignored?
With the cube, I get correct xpos, ypos and zpos?!... So why does this not work the same way with the "2D" triangle (which is a "3D" one to me, so they should behave the same)?
Translating along ty this time seems to have no effect (I still get the same coordinates, in the local frame).
Question: as with the triangle, why is the translation ignored?
The main question is: why is the model transformation ignored? If this is to be expected, I would like to understand why.
If there is a way to recover the true position of the shape in the world (including the model transformation) from the position of the mouse cursor, I would like to understand how.
Answer: the question was about the glm::unProject doc, which says "Map the specified window coordinates (win.x, win.y, win.z) into object coordinates", and about which space "object coordinates" refers to (local, world, view or clip space).
As I am new to OpenGL, I did not get that "object coordinates" in the glm::unProject doc is another way to refer to the local space. Solution: either pass view * model to glm::unProject and apply model again afterwards, or pass only view to glm::unProject, as explained here: Screen Coordinates to World Coordinates.
This fixes all the weird behaviors I observed.
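In code, the fix looks like this (a sketch; view, proj, frame3DPos and frameWidth / frameHeight are the same as in the question). Passing only view makes glm::unProject return a world-space point directly. For reference, the second block shows roughly what glm::unProject does internally: the window-to-NDC conversion before applying the inverse is exactly what the by-hand branch in the question was missing:

glm::vec4 viewport(0.0f, 0.0f, (float) frameWidth, (float) frameHeight);

// Pass only view: unProject undoes proj * view, so the result is in world space.
glm::vec3 world3DPos = glm::unProject(frame3DPos, view, proj, viewport);

// By-hand equivalent (assuming GLM's default [-1, 1] clip-space depth convention):
glm::vec4 ndc((frame3DPos.x - viewport.x) / viewport.z * 2.0f - 1.0f, // pixels -> [-1, 1]
              (frame3DPos.y - viewport.y) / viewport.w * 2.0f - 1.0f, // pixels -> [-1, 1]
              frame3DPos.z * 2.0f - 1.0f,                             // depth [0, 1] -> [-1, 1]
              1.0f);
glm::vec4 world4DPos = glm::inverse(proj * view) * ndc;
world4DPos /= world4DPos.w; // perspective divide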