Reputation: 435
I will outline the specific characteristics of my request and then elaborate:
A good example is online poker.
P1---------P2---------P3
c1 c2    c3 c4    c5 c6
|                      |
|    s1 s2 s3 s4 s5    |
|                      |
c7 c8    c9 c10  c11 c12
P4---------P5---------P6
Rules:
Description:
The example sums up the desired characteristics well as I may not properly articulate them.
Can machine learning be applied to this problem? Is this a particularly difficult task I am discussing? Any other advice?
Upvotes: 1
Views: 1824
Reputation: 4236
No, it is not difficult, just a bit demanding. Accessibility APIs and/or automation libraries are better suited to the task you wish to achieve than full-blown machine-learned detection. Each modern OS provides at least one API for reaching the GUI elements on the screen. You can even dig around in the OS's DLLs/SOs/dylibs that are used in GUI generation (e.g. user32.dll on Windows) and pull out info about the generated GUI elements directly.
Where image objects are concerned, just use a fingerprint or checksum to identify which image is at a given position or inside a given GUI element.
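A minimal sketch of the checksum idea, assuming you can already get each image's raw bytes from the GUI element. The card names and byte strings below are hypothetical stand-ins for real bitmaps, and SHA-256 is just one possible choice of checksum. An exact checksum only matches when the image is rendered byte for byte the same every time.

```python
import hashlib


def fingerprint(pixels: bytes) -> str:
    """Return a SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(pixels).hexdigest()


def build_index(known_images: dict) -> dict:
    """Map each fingerprint to the label of the image that produced it."""
    return {fingerprint(data): label for label, data in known_images.items()}


def identify(pixels: bytes, index: dict):
    """Return the label of a known image, or None if it is not in the index."""
    return index.get(fingerprint(pixels))


# Hypothetical stand-ins for card bitmaps grabbed from the GUI.
KNOWN_CARDS = {
    "ace_of_spades": bytes([0, 0, 255, 255] * 4),
    "king_of_hearts": bytes([255, 0, 0, 255] * 4),
}
CARD_INDEX = build_index(KNOWN_CARDS)
```

You build the index once from the set of images that can appear, then call identify() on whatever you grab from a given position.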
Accessibility APIs are almost always available. They allow you to access elements on the screen or in the window of a particular app. Automation tools usually use them along with some direct access to the OS. If the library in which the app's GUI is written makes little use of native OS GUI elements (like Java Swing), then that language/library usually provides its own accessibility API (e.g. Java Access Bridge). Even if images are in use, you can isolate them with the accessibility library or automation tool, then use each image's checksum to see which one it is (if you know which ones can appear, of course).
The only situation where they won't give you any result is when the whole GUI is drawn by the application itself, as when SDL is used. Then you can run OCR and blob detection on a screenshot to grab the text and separate the objects on the screen, and proceed with checksum detection.
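To give an idea of what blob detection on a screenshot involves, here is a rough sketch that assumes the screenshot has already been thresholded into a grid of 0s and 1s (1 = part of some object). It uses a plain flood fill over 4-connected cells and returns a bounding box per blob. The SCREEN grid is made up for illustration. In practice you would probably use an image-processing library instead.

```python
from collections import deque


def find_blobs(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions of truthy cells, in row-major order of discovery."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill this blob, growing its bounding box as we go.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Made-up thresholded "screenshot" with two separate objects.
SCREEN = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
```

Each bounding box can then be cropped out of the screenshot and handed to the checksum lookup.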
If you want or have to be really rude, you can connect to a screen reader such as VoiceOver on Mac or NVDA on Windows and query it for the info you want.
If you need to automate an internet game written in Flash, then you will probably have to use the accessibility API of the browser you wish to use (Firefox does provide one), and perhaps even write some code in Flash that can grab the info for you. Or simply go for a screenshot plus some image processing and checksum detection. If it's in JavaScript, the whole thing will be a lot easier: you will probably be able to identify cards just by their DOM IDs. That is, of course, provided the game builds its GUI from DOM elements rather than drawing everything itself.
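For the DOM case, a small sketch of what reading cards by ID might look like once you have the page's HTML. Everything about the markup here is an assumption for illustration: the "card-" ID prefix, the class names and the sample page are hypothetical, and a real game will have its own naming.

```python
from html.parser import HTMLParser


class CardCollector(HTMLParser):
    """Collect elements whose id starts with a given prefix."""

    def __init__(self, id_prefix="card-"):
        super().__init__()
        self.id_prefix = id_prefix  # hypothetical naming scheme
        self.cards = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        element_id = attributes.get("id") or ""
        if element_id.startswith(self.id_prefix):
            # Assume the class attribute says which card this is.
            self.cards[element_id] = attributes.get("class") or ""


def read_cards(html):
    """Return {element id: class} for every card-like element in the HTML."""
    collector = CardCollector()
    collector.feed(html)
    return collector.cards


# Hypothetical page fragment; a real game page will look different.
SAMPLE_PAGE = """
<div id="table">
  <img id="card-1" class="ace-spades" src="a.png">
  <img id="card-2" class="king-hearts" src="b.png">
  <span id="pot">120</span>
</div>
"""
```

Inside the browser you would query the live DOM directly rather than parse HTML text, but the lookup is the same idea.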
Machine learning is very useful, but I avoid it where there is a much simpler solution to the problem. Training a neural network just to make it play a game is simply too much work for, sorry, next to nothing. Even then, you would simplify the work by using some API to get individual images instead of pushing everything to the NN or SVM as is.
My friends developed and trained a NN for recognizing hand-written digits. It took a PC one month to train the NN, and even then it made some mistakes. Granted, it took so long because they were working on a psychology project, so the NN needed to work as much like a human brain as possible. But you get the idea.
Upvotes: 1