kuniqs

Reputation: 308

Game playing with itself

I'm going to write a roguelike borg - an AI that will play and possibly win Rogue. My approach looks like this:

- Decisions are made with a state machine, so the actions the borg takes are somewhat predictable and can be checked at runtime.
- State inputs are fed through neural nets. Changing the nets is the main way the borg learns.
- Nets are changed when the AI takes a bad enough action. Each action's immediate effects get assigned a score: 1 means purely good (like healing outside combat), -1 purely bad (dying). The score threshold starts at -1, so the net will only change its behavior after deaths for the first n iterations.
- 'Teaching the net' means negative reinforcement learning - the borg is taught not to do this thing, i.e. to increase the likelihood of doing anything else next time in that situation.
- The borg predicts the future by simulating its own actions N steps forward, predicting outputs and training its own predictive net when it makes big enough errors.
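To make the negative-reinforcement idea concrete, here is a minimal sketch of the punish-and-avoid loop. All names (`Borg`, `ACTIONS`, the situation strings) are hypothetical placeholders I made up for illustration, and a weight table stands in for the neural net:

```python
# Sketch of the negative-reinforcement loop described above.
# A (situation, action) -> weight table stands in for the neural net.

ACTIONS = ["attack", "flee", "heal", "wait"]

class Borg:
    def __init__(self):
        # weight per (situation, action) pair; starts neutral at 0.0
        self.weights = {}

    def choose(self, situation):
        # pick the action with the highest learned weight for this situation
        return max(ACTIONS, key=lambda a: self.weights.get((situation, a), 0.0))

    def punish(self, situation, action, score, threshold=-0.5, lr=0.1):
        # negative reinforcement: if the action scored badly enough,
        # lower its weight so any other action becomes more likely next time
        if score <= threshold:
            key = (situation, action)
            self.weights[key] = self.weights.get(key, 0.0) - lr * abs(score)

borg = Borg()
borg.punish("low_hp_vs_troll", "attack", score=-1.0)
assert borg.choose("low_hp_vs_troll") != "attack"
```

The point of the sketch is the shape of the update: nothing is rewarded, only sufficiently bad actions are pushed down, which matches the "increase the likelihood of doing anything else" framing.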

1) How to do deductive reasoning? To do thing C, we can do thing B. To do thing B, we can do thing A. Therefore, to do C we can do A. We cannot directly do B. How do I make a computer figure this out? 
For a 'real' world example, to reliably kill an Ice Beast in Rogue, borg can cast Fire Bolt, which it can learn from a spellbook. So, to kill Ice Beast borg has to find the book (or a wand of firebolts, or..). 

My idea is to represent each 'action' that happens in the borg's world as a neural net, so casting a fire spell and using a fire wand look similar to it. The borg remembers each distinct action it took (let's assume we have unlimited resources). Each action the borg wants to accomplish has a 'key' attached to it, which is a net trained to give a perfect score to a perfect input (fire for Ice, etc.). Next, the borg picks past actions whose inputs are at least X% similar to the inputs of the perfect action. Then it feeds those inputs in and chooses the action with the best score. This algorithm recurses until it has evaluated all actions. The action chain with the best overall score is assumed to be the A->B->C chain mentioned above. What's wrong with this picture?
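The "to do C, do B; to do B, do A" reasoning in question 1 is classic backward chaining, and it can be shown without any neural nets at all. The rule and goal names below are illustrative assumptions based on the Ice Beast example, not an actual Rogue borg:

```python
# Tiny backward-chaining planner for the A -> B -> C example.
# "To achieve X, you can do Y": goal -> list of alternative subgoals.
rules = {
    "kill_ice_beast": ["cast_fire_bolt"],
    "cast_fire_bolt": ["read_fire_spellbook"],
}
# actions the borg can perform directly right now
primitive = {"read_fire_spellbook"}

def plan(goal, seen=None):
    """Return a list of steps achieving `goal`, or None if impossible."""
    seen = seen or set()
    if goal in primitive:
        return [goal]
    if goal in seen:                 # guard against cyclic rules
        return None
    for sub in rules.get(goal, []):
        subplan = plan(sub, seen | {goal})
        if subplan is not None:
            return subplan + [goal]  # achieve the subgoal, then this step
    return None

print(plan("kill_ice_beast"))
# ['read_fire_spellbook', 'cast_fire_bolt', 'kill_ice_beast']
```

A neural-net similarity measure could replace the exact dictionary lookup (so "wand of fire" matches the same rule as "fire bolt"), but the chaining itself is just this recursion.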

2) How to do long-term memory about things that happen, and about patterns? Say the borg caught itself in a bad situation, so it wants to remember the circumstances that led to it.
My guess is to represent each notable situation as inputs to a Hopfield net, and each step the borg feeds the current world state to every net it has. This has the obvious problem that the number of nets can't grow without bound. Can you see a better way?
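For reference, a single Hopfield net can already store several patterns and recall one from a corrupted cue, which is the recall behavior question 2 wants. This is a pure-Python toy with +1/-1 pattern vectors, not a design proposal:

```python
# Minimal Hopfield net: store situation patterns, recall from a noisy cue.

def train(patterns):
    """Hebbian learning: build a weight matrix from the stored patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def recall(w, state, steps=10):
    """Synchronously update units until the state settles on a stored pattern."""
    n = len(state)
    s = list(state)
    for _ in range(steps):
        s = [1 if sum(w[i][j] * s[j] for j in range(n)) >= 0 else -1
             for i in range(n)]
    return s

# store one "bad situation" pattern and recall it from a corrupted version
bad = [1, -1, 1, 1, -1, -1, 1, -1]
w = train([bad])
noisy = list(bad)
noisy[0] = -noisy[0]        # flip one bit of the cue
assert recall(w, noisy) == bad
```

Capacity is the catch: a Hopfield net of n units reliably stores only about 0.14n patterns, so one net per situation (or a small pool of nets) runs into exactly the growth problem described above.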

Upvotes: 0

Views: 223

Answers (1)

Houshalter

Reputation: 2788

General game playing is an immensely difficult area of AI, and your approach is likely to suffer from combinatorial explosion.

There has recently been some success with teaching neural networks to play games with reinforcement learning and temporal difference learning. Basically the NN is trained to predict the future "reward" of every possible action, and then takes the action with the highest predicted reward.
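In tabular form (a dictionary standing in for the neural net), that predict-reward-then-act-greedily idea looks like the Q-learning sketch below. The tiny four-state chain "environment" is my own illustrative assumption, not Rogue:

```python
import random

# Tabular Q-learning toy: learn Q(state, action), the predicted future
# reward of each action, then act greedily on it. A real game player
# would replace the table with a neural net.

random.seed(0)
Q = {}                    # (state, action) -> predicted future reward
ALPHA, GAMMA = 0.5, 0.9   # learning rate, discount factor
ACTIONS = [0, 1]          # 0 = move left, 1 = move right

def step(state, action):
    """Four-state chain; a reward of 1 waits at the rightmost state."""
    nxt = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == 3 else 0.0)

def best(state):
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

# Q-learning is off-policy, so we can learn from purely random exploration
for _ in range(300):
    s = 0
    for _ in range(10):
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        # TD update: move Q(s,a) toward reward + discounted best next value
        target = r + GAMMA * max(Q.get((s2, b), 0.0) for b in ACTIONS)
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + ALPHA * (target - q)
        s = s2

assert best(1) == 1 and best(2) == 1   # greedy policy heads toward the reward
```

The combinatorial-explosion caveat shows up immediately: a Rogue screen has astronomically more states than this chain, which is why the table must become a function approximator, and why even that struggles.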

But even that is unlikely to work very well on a complicated game like Rogue.

Upvotes: 1
