The human mind at birth is like a blank sheet of paper. - J. Locke

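  • Original author: Nick Palmer
  • Original article: http://ai-depot.com/GameAI/Learning.html

  • Translator's note: Phew. Why does this text have so many relative clauses? Even a free translation does not come out smoothly. :( In particular, my rendering of the "Mark" part in the introduction is odd.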

Contents

1 Introduction
2 The Games Industry
3 Varieties of Learning
4 Creating a Learning Agent
5 Problems with Learning
6 Conclusion
7 References

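1 Introduction #

In this article I outline the current state of machine learning in the games industry, survey some of the techniques and implementations used in current and future games, and then explain how to begin designing a learning agent of your own.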

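2 The Games Industry #

Machine learning has attracted a certain amount of interest from game developers, yet until recently it has not been used in major game releases. Why is this? Surely there must be potential demand for games that can learn, games that can adjust strategy to adapt to different opponents. There are several important reasons for the lack of enthusiasm shown for so long. Another question that has been raised is just how important it is for a game to "learn". Will the average player appreciate a significant advance in gameplay, or will all the effort be a waste of time and money? This, of course, depends entirely on the game. Would "Black & White" have been even half as successful without its creature, which learns to mimic its master, even in kindness and cruelty?

Many game companies are looking at the possibility of games that match the player's ability by altering tactics and strategy rather than by simply raising the opponents' stats. This may sound no different from the ordinary "difficulty level" feature, which is hardly rare. Yet there are almost no games on the market that can read a player's tactics and adapt to them. Even on the hardest setting of most games (first-person shooters in particular), most players have a repetitive routine which, when applied successfully, lets them win more often. However, if the AI could work out their favourite hiding places, or read their winning strategy and adapt to it, players would certainly not play so complacently! This could become a very important feature in future releases, and it may considerably lengthen the life of a game. (Translator: This really sounds like a dream. :o )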

3 Varieties of Learning #

One of the greatest temptations for a game designer is to fake the impression that the game learns. Building cheating AI systems has become commonplace in the games industry, and I suspect there is little moral objection to doing so (because it simplifies a great many things). An "impression of learning" can easily be implemented by setting the rate at which the AI's tactical decisions go wrong and reducing that rate as its "experience" of playing the game grows. This creates a realistic illusion of an intelligent learning process, but it can only be used if the desired behaviour is already known; in other words, this is useless for learning to counter player strategy.
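
The faked "impression of learning" described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the starting error rate, the floor, and the decay factor are all hypothetical and do not come from the original article.

```python
import random


def error_rate(games_played, start=0.5, floor=0.05, decay=0.9):
    """Chance of a deliberate mistake; it shrinks as 'experience' grows.

    start, floor and decay are illustrative values, not tuned numbers.
    """
    return max(floor, start * decay ** games_played)


def choose_action(best_action, other_actions, games_played, rng=random):
    """Return the scripted best action, or occasionally a deliberate blunder."""
    if other_actions and rng.random() < error_rate(games_played):
        return rng.choice(other_actions)
    return best_action
```

Note that `best_action` has to be supplied by the designer. Nothing here discovers a new tactic, which is exactly why this trick cannot learn to counter a player's strategy.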

The essence of the learning process is adapting behaviour in order to improve performance. Fundamentally, there are two ways to implement this: directly (changing behaviour by testing modifications to it) and indirectly (making alterations to certain aspects of behaviour based on observations). There are positive and negative sides to each, but direct adaptation does have the advantage of not limiting behaviour, which means that ultimately, a better goal may be achievable.
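
To make the two approaches concrete, here is a minimal Python sketch. It assumes behaviour is represented as a list of numeric parameters and that a scoring function `evaluate` exists; both are assumptions made for illustration. The names `adapt_directly`, `adapt_indirectly`, `guard_left_weight`, and `observed_left_attacks` are hypothetical. Simple hill climbing stands in for direct adaptation here, as one possible technique among many.

```python
import random


def adapt_directly(params, evaluate, steps=200, step_size=0.1, rng=None):
    """Direct adaptation: test a random modification, keep it only if it scores better."""
    rng = rng or random.Random()
    best, best_score = list(params), evaluate(params)
    for _ in range(steps):
        candidate = [p + rng.uniform(-step_size, step_size) for p in best]
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def adapt_indirectly(guard_left_weight, observed_left_attacks, rate=0.2):
    """Indirect adaptation: nudge one hand-picked aspect of behaviour
    towards what was observed (here, the fraction of attacks from the left)."""
    return guard_left_weight + rate * (observed_left_attacks - guard_left_weight)
```

The direct version can change any parameter it is given, whereas the indirect version can only ever alter the one aspect the designer wired to an observation.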

4 Creating a Learning Agent #

Having seen why it is desirable to create adaptable games, I shall now demonstrate how a learning system can be implemented in a game. As a basis for this subject, I shall use the example of a team-strategy based paintball game, for which I am designing the AI as part of the final year of my degree. The aim of the program is for a team of seven agents to capture the opponent's flag and bring it back to their starting position. They must do this without being hit by the opposing team's paintballs. So, what elements are involved in the tactics behind this type of game? Well, for a start, I shall exclude the team strategy from my discussion and concentrate on the individual agents: it is no good having a perfect strategy if the agents aren't smart enough to survive on their own!

We must consider the factors which will influence an agent's performance in the game. Terrain is an obvious starting point, as this is all that stands between the two teams, so it must be used to the agent's advantage. Secondly, there must be an element of stealth behind the behaviour of each agent, as otherwise it will be simple to undermine any tactics used during the game by simply picking off the naive agents one by one.

A learning agent is composed of a few fundamental parts: a learning element, a performance element, a curiosity element (or 'problem generator'), and a performance analyser (or 'critic'). The learning element is the part of the agent which modifies the agent's behaviour and creates improvements. The performance element is responsible for choosing external actions based on the percepts it has received (percepts being information that is known by the agent about its environment). To illustrate this, consider that one of our agents is in the woods playing paintball. He is aware of an opposing paintballer nearby. This would be the percept that the agent responds to, by selecting an action, such as moving behind a tree. This choice of action is made by the performance element.

The performance analyser judges the performance of the agent against some suitable performance measure (which in this case could be how close the agent is to being hit by the enemy, or how many enemies have been hit). The performance must be judged on the same percepts as those received by the performance element - the state of affairs 'known' to the agent. When the analysis of performance has been made, the agent must decide whether or not a better performance could be made in the future, under the same circumstances. This decision is then passed to the learning element, which decides on the appropriate alteration to future behaviour, and modifies the performance element accordingly.

[Figure: learning.png]
So far, so good. But then how do we make sure that the agent advances in its learning, and doesn't merely confine itself to previously observed behaviour? (See the section on Set Behaviour, below). This is dealt with by the curiosity element (so-called because it searches for a better solution) which has a knowledge of the desirable behaviour of the agent (i.e. it knows that being shot is not desirable, and that finding the opponent's flag is!). To achieve optimal performance, this element will pose new challenges to the agent in an attempt to prevent (bad) habits developing. To understand the benefits of this, consider a paintballer who is hiding behind a tree. From his past experience, he knows that he is safe to stay where he is, and this would result in an adequate performance. However, the curiosity element kicks in, and suggests that he makes a break from his cover and heads to a nearby tree which is closer to the enemy flag. This may result in the agent ultimately being shot at, but could also achieve a more desirable goal. It is then up to the performance analyser and the learning element to consider whether there is a benefit to this change in strategy.
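
The four elements described above can be wired together roughly as follows. This is a minimal sketch, not the author's paintball implementation: the class layout, the outcome keys `enemies_hit` and `was_hit`, the penalty of 5.0, and the table of (percept, action) values are all assumptions chosen for illustration.

```python
import random


class LearningAgent:
    def __init__(self, actions, curiosity=0.1, learning_rate=0.5, rng=None):
        self.actions = list(actions)
        self.curiosity = curiosity          # chance of trying something new
        self.learning_rate = learning_rate
        self.rng = rng or random.Random()
        self.values = {}                    # (percept, action) -> estimated value

    def performance_element(self, percept):
        """Choose the action currently believed best for this percept."""
        return max(self.actions, key=lambda a: self.values.get((percept, a), 0.0))

    def curiosity_element(self, chosen):
        """Occasionally propose a random action to stop habits from forming."""
        if self.rng.random() < self.curiosity:
            return self.rng.choice(self.actions)
        return chosen

    def critic(self, outcome):
        """Judge an outcome against a (hypothetical) performance measure."""
        return outcome["enemies_hit"] - (5.0 if outcome["was_hit"] else 0.0)

    def learning_element(self, percept, action, score):
        """Move the stored value for (percept, action) towards the new score."""
        key = (percept, action)
        old = self.values.get(key, 0.0)
        self.values[key] = old + self.learning_rate * (score - old)

    def act(self, percept):
        return self.curiosity_element(self.performance_element(percept))

    def learn(self, percept, action, outcome):
        self.learning_element(percept, action, self.critic(outcome))
```

A game loop would call `act` with the current percept, carry out the returned action, and then pass the observed outcome to `learn`. The agent is never told which action would have been correct, only how the chosen one turned out.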

At this point, it is worth mentioning that this style of learning is known as reinforcement learning, which means that the agent can see the result of its actions, but is not told directly what it should have done instead. The agent must therefore use what is really trial and error to evaluate its performance and learn from its mistakes. The advantage of this is that there is no limitation on the behaviour, other than the limit on alterations suggested by the curiosity element. If, after each action, the agent were told what its mistake was and how it should correct its behaviour, then the desired behaviour would already have to be understood, and the learning would, in effect, be redundant.

As the learning agent is ultimately part of a game, it must not be left simply to work out for itself how to play. The agents must be imparted with a fair degree of prior knowledge about the way to behave. In the case of paintball, this could include methods for avoiding fire by using cover, which may later be adapted during the learning process. Many games developers use learning algorithms in their games to create better computer player AI, but the resulting AI is then 'frozen' before shipping.
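
One way to picture both ideas in the paragraph above (seeding the agent with prior knowledge, then freezing what it has learnt before shipping) is the following sketch. The `ValueTable` class, its method names, and the example percepts and actions are hypothetical, chosen only to illustrate the idea.

```python
class ValueTable:
    """Action values seeded with designer knowledge; can be frozen for release."""

    def __init__(self, prior_knowledge):
        # prior_knowledge maps (percept, action) -> initial value
        self.values = dict(prior_knowledge)
        self.frozen = False

    def update(self, key, score, learning_rate=0.5):
        """Learn from a score unless the table has been frozen."""
        if self.frozen:
            return
        old = self.values.get(key, 0.0)
        self.values[key] = old + learning_rate * (score - old)

    def freeze(self):
        """Stop all further learning, e.g. just before shipping."""
        self.frozen = True

    def best_action(self, percept, actions):
        return max(actions, key=lambda a: self.values.get((percept, a), 0.0))
```

For example, starting from `{("under_fire", "use_cover"): 1.0}` makes the agent use cover from its very first game, while later updates are still free to revise that belief until `freeze()` is called.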

I shall not go into the many algorithms which can be used to implement learning here, but they are well documented elsewhere, and can be found on the web. Decision trees are widely believed to be a good method of 'reasoning' - as are belief networks and neural networks, but these are beyond the scope of this article.

5 Problems with Learning #

Despite the obvious potential that learning has to offer the gaming world, it must be used carefully to avoid certain pitfalls. Here are but a few of the problems commonly encountered when constructing a Learning AI:

  • Mimicking Stupidity - When teaching an AI by copying a human player's strategy, you may find that the computer is taught badly. This is more than likely when the player is unfamiliar with a game. In this situation, a reset function may be required to bring the AI player back to its initial state, or else a minimum level must be imposed on the computer player to prevent its performance dropping below a predetermined standard.

  • Overfitting - This can occur if an AI agent is taught on a certain section of a game, and is then expected to display intelligent behaviour elsewhere based on that experience. Using an FPS as an example, an agent which has learnt from its experience over one level will encounter problems when attempting a new level, as it may not have learnt the correct 'lessons' from its performance. If it has found that when opening doors, it has been able to escape the line of fire by diving behind a wall to its left, it will assume that this is a generalized tactic. As you can imagine, this could lead to amusing behavioural defects if not monitored in the correct way...

  • Local Optimality - When choosing a parameter on which the agent is to base its learning, be sure to choose one which has no dependency on earlier actions. As an example, take a snowboarding game. The agent learns, through the use of an optimization algorithm, the best course to take down the ski slope, using its rotation as a parameter. This may mean that a non-optimal solution is reached, in which no small change can improve performance. Think about the data being stored: a sequence of rotations clockwise and anticlockwise. An alteration to a rotation in the first half of the run may lead to a better time over the course in the long run, but in the short run it could cause a horrific crash further down the slope, as the rest of the rotations are now slightly off course!

  • Set Behaviour - Once an agent has a record of its past behaviour and the resulting performance analysis, does it stick to the behaviour which has been successful in the past, or does it try new methods in an attempt to improve? This is a problem which must be addressed or else an agent may either try to evaluate every possible behaviour, or else stick to one without finding the optimal solution.
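
As an illustration of the safeguards suggested under Mimicking Stupidity above (a reset function and a minimum performance level), here is a small Python sketch. The class name `SkillFloorGuard`, the averaging window, and the idea of scoring each game with a single number are assumptions made for the example.

```python
import copy
from collections import deque


class SkillFloorGuard:
    """Reset learned state when recent performance falls below a minimum."""

    def __init__(self, initial_state, minimum_score, window=5):
        self._initial = copy.deepcopy(initial_state)  # pristine snapshot
        self.state = copy.deepcopy(initial_state)     # state the AI may modify
        self.minimum_score = minimum_score
        self.recent = deque(maxlen=window)            # last few game scores

    def record(self, score):
        """Record one game's score; return True if a reset was triggered."""
        self.recent.append(score)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) < self.minimum_score:
            self.state = copy.deepcopy(self._initial)
            self.recent.clear()
            return True
        return False
```

Averaging over a window rather than reacting to a single bad game is a design choice: it avoids throwing away useful learning because of one unlucky round.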

6 Conclusion #

Having looked at possible applications for learning, and seen some of the problems associated with it, it seems that there is great potential for learning in games, but it must be used with caution. The majority of games have little use for any kind of learning techniques - except in the development and testing stages. Despite the price of developing this kind of software, it looks as if learning will have a large part to play in the next generation of games.

7 References #


last modified 2010-10-28 12:42:52