Rust: how to interact with a window

Question

Introduction

Currently, I'm working for a customer who wants to automate some actions inside their accounting application.

Problem

I searched for a crate that can read the screen of another window and post messages to it, such as key presses or mouse clicks, but I didn't find anything.

Question

Does anyone know a crate for interacting with another window in Rust? I need three kinds of interaction: reading the window's screen contents, posting key messages, and posting click messages to that window.

frankenapps · Accepted Answer

There are various crates that let you simulate user input (e.g. mouse and keyboard input) even in a cross-platform fashion:

enigo
autogui

and crates for taking screenshots like

screenshots

Apart from that, there is also autopilot, which lets you do both.

Here is an example for capturing the main screen using autopilot and image (for actually storing the image):

use image::{GenericImageView, png::PNGEncoder};

fn main() {
    let bitmap = autopilot::bitmap::capture_screen().expect("Failed to capture main screen.");

    let mut buf = Vec::new();
    let encoder = PNGEncoder::new(
        &mut buf
    );
    encoder
        .encode(
            &bitmap.image.as_rgb8().unwrap(),
            bitmap.image.width(),
            bitmap.image.height(),
            image::ColorType::RGB(8),
        )
        .expect("Failed to encode png.");

    std::fs::write("test.png", buf).expect("Failed to write screenshot to disk.");
}

and here is an example for mouse input (move cursor in a circle):

const MARGIN: f64 = 10.0;
const MILLIS: u64 = 10;

struct Center {
    x: f64,
    y: f64,
}

fn main() {
    circle_mouse().expect("Unable to move mouse");
}

fn circle_mouse() -> Result<(), autopilot::mouse::MouseError> {
    let screen_size = autopilot::screen::size();
    let scoped_height = screen_size.height / 2.0 - MARGIN;
    let scoped_width = screen_size.width / 2.0 - MARGIN;
    // Use the smaller half-dimension so the circle stays on screen.
    let scoped_radius = scoped_height.min(scoped_width);

    let center = Center { x: scoped_width, y: scoped_height };

    for i in 0..360 {
        let x = (i as f64 / 180.0 * std::f64::consts::PI).cos() * scoped_radius;
        let y = (i as f64 / 180.0 * std::f64::consts::PI).sin() * scoped_radius;
        autopilot::mouse::move_to(autopilot::geometry::Point::new(
            center.x + x as f64,
            center.y + y as f64,
        ))?;
        std::thread::sleep(std::time::Duration::from_millis(MILLIS));
    }

    Ok(())
}

If you only want to capture the area of the window, you can take a screenshot of the full desktop and then crop it to the window. On Windows, you can get the rect of a certain window using GetWindowRect. Here is a snippet for getting the window rect using its ID or name.
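
The cropping step itself needs no platform API. Below is a minimal sketch of it, assuming the screenshot is available as a tightly packed RGB8 byte buffer; `PixelRect`, `clamp_to_screen` and `crop_rgb8` are names made up for this illustration and are not part of autopilot or image. The rect is clamped to the screen first, so a window that hangs over the screen edge yields only its visible part.

```rust
/// A rectangle in screen pixels, using the same edge convention as the
/// Win32 RECT (right and bottom are exclusive). Hypothetical helper type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PixelRect {
    pub left: i32,
    pub top: i32,
    pub right: i32,
    pub bottom: i32,
}

/// Clamps `rect` to a screen of `width` x `height` pixels.
/// Returns `None` if no part of the rectangle is on screen.
pub fn clamp_to_screen(rect: PixelRect, width: u32, height: u32) -> Option<PixelRect> {
    let clamped = PixelRect {
        left: rect.left.max(0),
        top: rect.top.max(0),
        right: rect.right.min(width as i32),
        bottom: rect.bottom.min(height as i32),
    };
    if clamped.left < clamped.right && clamped.top < clamped.bottom {
        Some(clamped)
    } else {
        None
    }
}

/// Copies the pixels inside `rect` out of a tightly packed RGB8 screenshot.
/// Returns the cropped bytes together with the cropped width and height.
pub fn crop_rgb8(
    pixels: &[u8],
    width: u32,
    height: u32,
    rect: PixelRect,
) -> Option<(Vec<u8>, u32, u32)> {
    const BYTES_PER_PIXEL: usize = 3;
    if pixels.len() != width as usize * height as usize * BYTES_PER_PIXEL {
        return None; // buffer does not match the stated dimensions
    }
    let r = clamp_to_screen(rect, width, height)?;
    let out_w = (r.right - r.left) as usize;
    let out_h = (r.bottom - r.top) as usize;
    let mut out = Vec::with_capacity(out_w * out_h * BYTES_PER_PIXEL);
    // Copy one row of the rectangle at a time.
    for y in r.top as usize..r.bottom as usize {
        let start = (y * width as usize + r.left as usize) * BYTES_PER_PIXEL;
        out.extend_from_slice(&pixels[start..start + out_w * BYTES_PER_PIXEL]);
    }
    Some((out, out_w as u32, out_h as u32))
}
```

The rect returned by GetWindowRect would be copied field by field into `PixelRect` before calling `crop_rgb8`.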

Update due to request in comment

Here is a sample showing how to capture only the specific portion of the screen that contains a given window (works only on Windows, and the window must be fully visible on screen):

use autopilot::geometry::{Point, Rect, Size};
use std::{ffi::OsString, iter::once, os::windows::prelude::OsStrExt, ptr::null};
use windows_sys::Win32::{
    Foundation::{HWND, RECT},
    UI::WindowsAndMessaging::{FindWindowW, GetWindowRect},
};

fn main() {
    // The title of the window (is shown when hovering over the window in the taskbar):
    let window_name = OsString::from("autopilot");
    let window_name: Vec<u16> = window_name
        .as_os_str()
        .encode_wide()
        .chain(once(0))
        .collect();
    let id: HWND = unsafe { FindWindowW(null(), window_name.as_ptr()) };
    let mut rect = RECT {
        left: 0,
        top: 0,
        right: 0,
        bottom: 0,
    };
    if id != 0 && unsafe { GetWindowRect(id, &mut rect) } != 0 {
        /* println!(
            "HWND: {}
Location: {} {}
Size: {} {}",
            id,
            rect.left,
            rect.top,
            rect.right - rect.left,
            rect.bottom - rect.top
        ); */

        let bitmap = autopilot::bitmap::capture_screen_portion(Rect::new(
            Point::new(rect.left as f64, rect.top as f64),
            Size::new(
                (rect.right - rect.left) as f64,
                (rect.bottom - rect.top) as f64,
            ),
        ))
        .expect("Failed to capture screen portion.");
        bitmap
            .image
            .save("screen_portion.png")
            .expect("Failed to write image to disk.");
    }
}

This sample has the following dependencies:

autopilot = "0.4.0"
windows-sys = { version = "0.36.1", features = ["Win32_Foundation", "Win32_UI_WindowsAndMessaging"] }
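
FindWindowW expects the window title as a NUL-terminated UTF-16 buffer, which is what the OsString conversion in the sample produces. When the title is already an ordinary Rust string, the same buffer can be built with `str::encode_utf16`, which also compiles on other platforms. A small sketch (the helper name `to_wide` is my own):

```rust
/// Encodes `s` as UTF-16 and appends the terminating NUL that the
/// wide-character Win32 functions expect. Hypothetical helper name.
pub fn to_wide(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}
```

With this helper, the lookup in the sample would become `FindWindowW(null(), to_wide("autopilot").as_ptr())`. Bind the vector to a variable if the pointer has to outlive the statement.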

Note that this does include the invisible window borders which you probably do not want. You can use DwmGetWindowAttribute to correct for the visual offset like this:

use autopilot::geometry::{Point, Rect, Size};
use std::{
    ffi::{c_void, OsString},
    iter::once,
    mem::size_of,
    os::windows::prelude::OsStrExt,
    ptr::null,
};
use windows_sys::Win32::{
    Foundation::{HWND, RECT},
    Graphics::Dwm::{DwmGetWindowAttribute, DWMWA_EXTENDED_FRAME_BOUNDS},
    UI::WindowsAndMessaging::{FindWindowW, GetWindowRect},
};

fn main() {
    // The title of the window (is shown when hovering over the window in the taskbar):
    let window_name = OsString::from("autopilot");
    let window_name: Vec<u16> = window_name
        .as_os_str()
        .encode_wide()
        .chain(once(0))
        .collect();
    let id: HWND = unsafe { FindWindowW(null(), window_name.as_ptr()) };
    let mut rect = RECT {
        left: 0,
        top: 0,
        right: 0,
        bottom: 0,
    };
    if id != 0 && unsafe { GetWindowRect(id, &mut rect) } != 0 {
        /* println!(
            "Window:
HWND: {}
Location: {} {}
Size: {} {}",
            id,
            rect.left,
            rect.top,
            rect.right - rect.left,
            rect.bottom - rect.top
        ); */

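        // Query the visible frame bounds, which exclude the invisible borders.
        let mut frame = RECT {
            left: 0,
            top: 0,
            right: 0,
            bottom: 0,
        };
        let res = unsafe {
            DwmGetWindowAttribute(
                id,
                DWMWA_EXTENDED_FRAME_BOUNDS,
                &mut frame as *mut RECT as *mut c_void,
                size_of::<RECT>() as u32,
            )
        };
        // Any HRESULT other than S_OK (0) means `frame` was not filled in.
        if res != 0 {
            panic!("DwmGetWindowAttribute failed with HRESULT {:#x}", res);
        }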

        let border = RECT {
            left: frame.left - rect.left,
            top: frame.top - rect.top,
            right: rect.right - frame.right,
            bottom: rect.bottom - frame.bottom,
        };

        let adjusted_rect = RECT {
            left: rect.left + border.left,
            top: rect.top + border.top,
            right: rect.right - border.right,
            bottom: rect.bottom - border.bottom,
        };

        // Window must be fully on screen for capture.
        if rect.left >= 0 && rect.top >= 0 {
            let bitmap = autopilot::bitmap::capture_screen_portion(Rect::new(
                Point::new(adjusted_rect.left as f64, adjusted_rect.top as f64),
                Size::new(
                    (adjusted_rect.right - adjusted_rect.left) as f64,
                    (adjusted_rect.bottom - adjusted_rect.top) as f64,
                ),
            ))
            .expect("Failed to capture screen portion.");
            bitmap
                .image
                .save("screen_portion.png")
                .expect("Failed to write image to disk.");
        }
    }
}

This sample uses the following dependencies:

autopilot = "0.4.0"
windows-sys = { version = "0.36.1", features = ["Win32_Foundation", "Win32_Graphics_Dwm", "Win32_UI_WindowsAndMessaging"] }
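
The border arithmetic in this sample can be checked without any Windows calls. The sketch below repeats it on a plain struct (`Bounds`, `invisible_border` and `shrink_by` are illustrative names, not windows-sys items). It also shows that the adjusted rect is always identical to the frame bounds reported by DWM, so the frame could be passed to the capture call directly if the border widths are not needed for anything else.

```rust
/// Plain stand-in for the Win32 RECT so the arithmetic runs anywhere.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Bounds {
    pub left: i32,
    pub top: i32,
    pub right: i32,
    pub bottom: i32,
}

/// Width of the invisible border on each side: the difference between the
/// window rect (GetWindowRect) and the visible frame (DWM frame bounds).
pub fn invisible_border(window: Bounds, frame: Bounds) -> Bounds {
    Bounds {
        left: frame.left - window.left,
        top: frame.top - window.top,
        right: window.right - frame.right,
        bottom: window.bottom - frame.bottom,
    }
}

/// Shrinks the window rect by the given border widths.
pub fn shrink_by(window: Bounds, border: Bounds) -> Bounds {
    Bounds {
        left: window.left + border.left,
        top: window.top + border.top,
        right: window.right - border.right,
        bottom: window.bottom - border.bottom,
    }
}
```

For any pair of rects, `shrink_by(window, invisible_border(window, frame))` gives back `frame`.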

Update due to another comment

Yes, you can use the PostMessageW function from the WinAPI in Rust, too. Here is a simple sample that contains the basic idea of the linked sample:

use std::{ffi::OsString, iter::once, os::windows::prelude::OsStrExt, ptr::null};
use windows_sys::Win32::{
    Foundation::HWND,
    UI::{
        Input::KeyboardAndMouse::VK_LEFT,
        WindowsAndMessaging::{FindWindowW, PostMessageW},
    },
};

const KEY_DOWN: u32 = 256; // WM_KEYDOWN (0x0100)
const KEY_UP: u32 = 257; // WM_KEYUP (0x0101)

fn main() {
    // The title of the window (is shown when hovering over the window in the taskbar):
    let window_name = OsString::from("autopilot");
    let window_name: Vec<u16> = window_name
        .as_os_str()
        .encode_wide()
        .chain(once(0))
        .collect();
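    let id: HWND = unsafe { FindWindowW(null(), window_name.as_ptr()) };
    if id == 0 {
        // With a null HWND, PostMessageW would post to the current thread instead.
        panic!("Window not found.");
    }
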
    let res = unsafe { PostMessageW(id, KEY_DOWN, VK_LEFT.into(), 0) };
    if res == 0 {
        panic!("Failed to post message to window.");
    }

    std::thread::sleep(std::time::Duration::from_millis(500));
    let res = unsafe { PostMessageW(id, KEY_UP, VK_LEFT.into(), 0) };
    if res == 0 {
        panic!("Failed to post message to window.");
    }
}

It depends on:

windows-sys = { version = "0.36.1", features = ["Win32_Foundation", "Win32_UI_Input_KeyboardAndMouse", "Win32_UI_WindowsAndMessaging"] }
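
The sample passes 0 as the last argument (lParam). Many applications accept that, but some inspect the value, and click messages carry the cursor position in it. Here is a hedged sketch of how those values are packed, following the documented layout of WM_KEYDOWN/WM_KEYUP and of MAKELPARAM. The function names are my own, and whether the target application actually reacts to posted messages is something you have to test.

```rust
/// Window message identifiers and flags as defined in the Win32 headers.
pub const WM_KEYDOWN: u32 = 0x0100;
pub const WM_KEYUP: u32 = 0x0101;
pub const WM_LBUTTONDOWN: u32 = 0x0201;
pub const WM_LBUTTONUP: u32 = 0x0202;
/// wParam flag meaning "the left mouse button is down".
pub const MK_LBUTTON: usize = 0x0001;

/// Packs client-area coordinates the way MAKELPARAM does:
/// x in the low 16 bits, y in the high 16 bits.
pub fn click_lparam(x: i16, y: i16) -> isize {
    let packed = ((y as u16 as u32) << 16) | (x as u16 as u32);
    packed as isize
}

/// Builds the lParam of a key message with a repeat count of 1.
/// `scan_code` is the hardware scan code of the key, `extended` marks
/// extended keys such as the arrow keys, `key_up` selects the WM_KEYUP layout.
pub fn key_lparam(scan_code: u8, extended: bool, key_up: bool) -> isize {
    let mut bits: u32 = 1; // bits 0-15: repeat count
    bits |= (scan_code as u32) << 16; // bits 16-23: scan code
    if extended {
        bits |= 1 << 24; // bit 24: extended-key flag
    }
    if key_up {
        bits |= 1 << 30; // bit 30: key was down before this message
        bits |= 1 << 31; // bit 31: key is being released
    }
    bits as isize
}
```

A left click at client position (10, 20) would then be `PostMessageW(id, WM_LBUTTONDOWN, MK_LBUTTON, click_lparam(10, 20))` followed by `PostMessageW(id, WM_LBUTTONUP, 0, click_lparam(10, 20))`. The coordinates are relative to the client area of the target window, not to the screen.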

If you want to detect certain UI elements on the screen and get their position, you would probably need to implement this yourself with pattern matching or computer vision, using something like OpenCV and the screenshot taken beforehand as input.
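
To give an idea of the simplest form of such pattern matching, here is a brute-force sketch that looks for an exact copy of a small reference image inside a screenshot. It assumes both images are tightly packed 8-bit grayscale buffers, and `find_exact` is a name made up for this example. It only works when the pixels are identical (same scaling, theme and rendering). For anything fuzzier you would want a tolerance-based method such as the template matching that OpenCV provides.

```rust
/// Finds the first position (x, y) at which `needle` occurs pixel for pixel
/// inside `haystack`. Both images are tightly packed, row-major, 8-bit
/// grayscale. Rows are scanned top to bottom, columns left to right.
pub fn find_exact(
    haystack: &[u8],
    hay_w: usize,
    hay_h: usize,
    needle: &[u8],
    ndl_w: usize,
    ndl_h: usize,
) -> Option<(usize, usize)> {
    // Reject empty or oversized needles and buffers of the wrong length.
    if ndl_w == 0 || ndl_h == 0 || ndl_w > hay_w || ndl_h > hay_h {
        return None;
    }
    if haystack.len() != hay_w * hay_h || needle.len() != ndl_w * ndl_h {
        return None;
    }
    for y in 0..=hay_h - ndl_h {
        for x in 0..=hay_w - ndl_w {
            // Compare the needle row by row against the candidate position.
            let matches = (0..ndl_h).all(|row| {
                let hay_start = (y + row) * hay_w + x;
                let ndl_start = row * ndl_w;
                haystack[hay_start..hay_start + ndl_w]
                    == needle[ndl_start..ndl_start + ndl_w]
            });
            if matches {
                return Some((x, y));
            }
        }
    }
    None
}
```

The returned position is the top-left corner of the match in screenshot coordinates. Adding half the needle size gives a point in the middle of the element, for example as the target of a click.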