


Contents

1 Part I: An Introduction
1.1 Introduction
1.2 Basic Format of this Series
1.3 An Overview
1.4 Starting from Familiar Ground
1.5 Conclusion
2 Part II: Data Manipulation
2.1 Continuing On
2.2 The Benefit of Data
2.3 Another Form of Data
2.4 A Helpful Tool
2.5 Manipulating the Data
2.6 Another Little Test
2.7 Reflecting
2.8 Conclusion

1 Part I: An Introduction #

1.1 Introduction #

This series is intended to give the reader the information necessary to create a scripting system of his/her own from the ground up. The reasons one would choose to create such a system from scratch are many, most of which are analogous to reasons one would create anything else from scratch, such as a 3D engine. Most important, in my opinion, is that it's a valuable learning experience. After all, who doesn't want to learn? Certainly nobody who is taking the time to read this article!

Many of the articles devoted to scripting that I've seen in the past do not do enough to cater to the practical-minded programmers. These are the programmers who wish to learn about how the entire process of bringing a script from a high-level language down to some procedural format relates to their own programming efforts. They want to design a system that suits their needs, without being clouded by complexity. As such, this series will be geared towards enabling a programmer to fashion his/her own system, and not be dependent on handout code. Hopefully this series will be helpful to those who have found these other articles lacking as described. (No offense intended to anyone who may have written an article on scripting. Please don't take this personally.)

In addition, this series will include example code snippets written in C++. It's recommended that you be at least familiar with the basics of C++ classes.

1.2 Basic Format of this Series #


The format of this series will be somewhat reversed with respect to the seemingly "normal" approach. I will not begin with the high-level language and end with the low-level implementation. Rather, I will be using a bottom-up approach, as it is more natural to develop a scripting system in this manner. A significant advantage of this approach is that the code immediately produces results, allowing problems to be found much more easily, and before they become serious. This is in direct contrast to tutorials which would begin with language theory, and ask that the programmer maintain faith that eventually somewhere down the line, everything will work itself out, and be free of bugs.

This first article will provide a simple overview to gain some perspective on what the purpose of a scripting system is, the problems it is usually intended to solve, and possible implementations. A simple example will be provided, and built upon in future articles.

A rough outline of future articles is as follows:

  • First, they will describe more fully the low-level characteristics of the form of implementation that I will be writing about.
  • They will then move on to mechanisms for "embedding" this system into an already existing game or application system.
  • The topics involving language theory, parsing, and compiling will possibly conclude the series.

This outline is considered rough to leave options open for new things, depending mostly on feedback to this first article. So please, let me know what you think.

1.3 An Overview #

Most useful programs, not just games, are not completely isolated systems; without some form of input, a program's capabilities are normally quite static (limited). Think of the difference between a "Hello World!" program, and a program that asks for your name, certain personal traits (which it may then process in some manner), and then spits out some kind of analysis. You could not achieve the same effect without taking some form of input unless you went to some ridiculous effort, such as creating a different program to suit each user's needs.

This would be insane.

This is why most applications are designed as a structure of rules and pipelines through which information flows and is processed, from the input data to the resulting output. It is akin to a machine. This is why the term "engine" is thrown around so often.

For many purposes, this is enough to obtain the functionality you desire from your application or game. But what happens if you want to be able to modify the rules? From a development standpoint, modifying rules which are hard-coded into an application can be very annoying, and in some cases bug-inviting. The annoyance can come in many forms, not the least of which is the need to recompile all components dependent on the source of the changes. This is where scripting comes in.

The main purpose of scripting from a development standpoint is to provide a way to make your application's "rules structure" as dynamic as possible. Game-dependent logic and data, therefore, become prime candidates for scripting.

However, a script has to run on top of code itself. Every procedure executed in a script carries an additional processing cost, on top of the cost of the procedure itself. Because of this, scripted instructions inherently run more slowly than the hard-coded kind. This currently makes multimedia components better candidates for remaining hard-coded, although scripts can still be appropriately used to perform some kind of initialization of such components.

1.4 Starting from Familiar Ground #

This first example is designed with simplicity in mind, so as not to distract from getting the system up and running. You will want to create a console application to use this example code as provided. This example will be object-based, but not necessarily object-oriented; the classes can therefore easily be replaced by structures for those dealing with a pure C mentality.

Let's say you have a very basic desire to see your computer speak on command. You may request that it talk a specified number of times for each execution of a particular script. In its simplest form, you would write such a script in an unrolled form. For example a script that talks twice, and then knows it has finished its execution:

talk
talk
end

Pretty basic for now, but it's enough to see some results and know you're on track. We will enumerate these operations:

enum opcode
{
  op_talk,
  op_end
};

We may choose to pair opcodes with data to make them more useful later on. It would be in our interest to make an abstraction now, so that we don't have to change a lot of code later on when we decide to encapsulate the pairing as an instruction:

// the basic instruction, currently just encapsulating an opcode
class Instruction
{
public:
  Instruction(opcode code) : _code(code)	{}
  opcode Code() const         { return _code; }
private:
  opcode	_code;
  //char*	_data;  // additional data, currently not used
};

Reasonably, a script is then a collection of these instructions. Because the list of instructions generally will be formed during an initialization process, it's ok to use an arrayed form for implementation, such as a vector. The arrayed form is also useful in later optimizations, and for random access:

// the basic script, currently just encapsulating an arrayed list of instructions
class Script
{
public:
  Script(const std::vector<Instruction>& instrList)
    : _instrList(instrList) {}
  const Instruction* InstrPtr() const { return &_instrList[0]; }
private:
  std::vector<Instruction>	_instrList;
};

Given a pointer to the beginning of a list of instructions, all that remains necessary is a procedure for iterating through the list and executing each instruction:

// note that _instrPtr must point to a valid list of instructions
const Instruction* _instr = _instrPtr;	// set our iterator to the beginning (const: InstrPtr() returns a pointer to const)
while (_instr)	// the end operation will set _instr to 0
{
  switch(_instr->Code())
  {
  case op_talk:
    std::cout << "I am talking." << std::endl;
    ++_instr;    // iterate
    break;
  case op_end:
    _instr = 0;  // discontinue the loop
    break;
  }
}

For the sake of convenience, you will probably want to encapsulate this functionality into its own class, and allow it to internally manage the instruction lists (as scripts). This would be the virtual machine, provided with useful management utilities for loading and selecting scripts:

// rudimentary virtual machine with methods inlined for convenience
// (needs <cassert> for assert, <iostream> for std::cout, and <vector>)
class VirtualMachine
{
public:
  VirtualMachine()
    : _scriptPtr(0), _instrPtr(0), _instr(0), _scriptCount(0) {}
  // a very basic interface
  inline void Execute(size_t scriptId);
  size_t Load(const Script& script)   { return AddScript(script); }
private:  // useful abstractions
  // pointers used as non-modifying dynamic references
  typedef const Script*       ScriptRef;
  typedef const Instruction*  InstrRef;
private:  // utilities
  size_t AddScript(const Script& script)  // add script to list and retrieve id
  {
    _scriptList.push_back(script);
    return _scriptCount++;
  }
  void SelectScript(size_t index)         // set current script by id
  {
    assert(index < _scriptCount);         // make sure the id is valid
    _scriptPtr = &_scriptList[index];
    _instrPtr = _scriptPtr->InstrPtr();
  }
private:  // data members
  std::vector<Script> _scriptList;
  ScriptRef           _scriptPtr;    // current script
  InstrRef            _instrPtr;     // root instruction
  InstrRef            _instr;        // current instruction
  size_t              _scriptCount;  // track the loaded scripts
};

The virtual machine maintains a list of scripts that have been loaded as a vector. It also internally maintains a count of the number of scripts so that an offset (id) into the vector can be returned upon loading a script, allowing it to be stored. This makes it very easy to execute a pre-loaded script by simply passing that offset to the machine.

Although currently unnecessary, it also keeps track of the current script executing. This can be useful if the script contains more than just a list of instructions, as it will in a future article.

Its Execute() method uses the procedure previously described:

void VirtualMachine::Execute(size_t scriptId)
{
  SelectScript(scriptId);  // select our _instrPtr by script ID
  _instr = _instrPtr;      // set our iterator to the beginning
  while (_instr)
  {
    switch(_instr->Code())
    {
    case op_talk:
      std::cout << "I am talking." << std::endl;
      ++_instr;  // iterate
      break;
    case op_end:
      _instr = 0;  // discontinue the loop
      break;
    }
  }
}

A side note about OOP:

Using an Object Oriented approach, you could eliminate this switch statement and derive specific instruction types from a base instruction type with some kind of virtual Process() command. To add support for a new instruction, you would simply inherit from a base instruction class, and isolate its specific processing to that class. Lists of these instructions would of course have to support polymorphism; a vector of pointers to instructions, or some equivalent.

This extensible approach can be very convenient, and is worthy of some investigation. In my own toy experiment, however, it ran at roughly 1/3rd the speed of my non-OO VM system, which is a pretty significant performance hit. Later on in development, you will probably want to optimize the heck out of your VM's processing loop. The overhead introduced with the OO version, at least in my own experience, is not worth it. I encourage the curious to explore this some more, however, as I probably did not perform the best test possible. And please give me some feedback if you do! That's it for the side note.
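For the curious, here is a minimal sketch of what such an object-oriented variant might look like. The names InstructionBase, TalkInstruction, EndInstruction, RunScript and RunTalkTwiceOO are hypothetical and do not appear elsewhere in this series. The output stream parameter is also an assumption, added only so the result can be inspected. The sketch uses std::unique_ptr, so it needs a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical base class: every instruction type knows how to process itself.
class InstructionBase
{
public:
    virtual ~InstructionBase() {}
    // Performs the instruction; returns false when the script should stop.
    virtual bool Process(std::ostream& out) const = 0;
};

class TalkInstruction : public InstructionBase
{
public:
    bool Process(std::ostream& out) const override
    {
        out << "I am talking." << std::endl;
        return true;
    }
};

class EndInstruction : public InstructionBase
{
public:
    bool Process(std::ostream&) const override { return false; }
};

// Polymorphism requires storing pointers rather than instruction values.
typedef std::vector<std::unique_ptr<InstructionBase> > InstrList;

// Runs the list in order; returns how many instructions were processed.
std::size_t RunScript(const InstrList& script, std::ostream& out)
{
    std::size_t processed = 0;
    for (std::size_t i = 0; i < script.size(); ++i)
    {
        assert(script[i] != nullptr);
        ++processed;
        if (!script[i]->Process(out))   // virtual dispatch replaces the switch
            break;
    }
    return processed;
}

// Builds the "talk, talk, end" script and returns everything it printed.
std::string RunTalkTwiceOO()
{
    InstrList script;
    script.push_back(std::unique_ptr<InstructionBase>(new TalkInstruction));
    script.push_back(std::unique_ptr<InstructionBase>(new TalkInstruction));
    script.push_back(std::unique_ptr<InstructionBase>(new EndInstruction));
    std::ostringstream out;
    RunScript(script, out);
    return out.str();
}
```

Adding a new instruction here means adding a new derived class and leaving RunScript() untouched. The cost is one heap allocation per instruction and one virtual call per step, which is one plausible source of the slowdown described above.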

Now, let's see how we would use these components to create and execute a script which talks twice, and then ends:

VirtualMachine vm;

// build the script
std::vector<Instruction> InstrList;
InstrList.push_back(Instruction(op_talk)); // talk twice
InstrList.push_back(Instruction(op_talk));
InstrList.push_back(Instruction(op_end));  // then end
Script script(InstrList);

// load the script and save the id
size_t scriptID = vm.Load(script);

// execute the script by its id
vm.Execute(scriptID);
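For reference, the pieces of this article can be assembled into a single translation unit roughly as follows. This is a condensed sketch, not the exact listing above. The virtual machine is trimmed to the members the example needs, Execute() takes an output stream (an assumption, so the output can be captured), and RunTalkTwice() is a hypothetical helper standing in for the body of main().

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

enum opcode { op_talk, op_end };

class Instruction
{
public:
    Instruction(opcode code) : _code(code) {}
    opcode Code() const { return _code; }
private:
    opcode _code;
};

class Script
{
public:
    // the list must be non-empty and must finish with op_end
    Script(const std::vector<Instruction>& instrList) : _instrList(instrList) {}
    const Instruction* InstrPtr() const { return &_instrList[0]; }
private:
    std::vector<Instruction> _instrList;
};

class VirtualMachine
{
public:
    VirtualMachine() : _instr(0) {}

    // stores a copy of the script and returns its offset as the id
    std::size_t Load(const Script& script)
    {
        _scriptList.push_back(script);
        return _scriptList.size() - 1;
    }

    // Assumption: the stream parameter replaces the hard-coded std::cout.
    void Execute(std::size_t scriptId, std::ostream& out)
    {
        assert(scriptId < _scriptList.size());      // make sure the id is valid
        _instr = _scriptList[scriptId].InstrPtr();  // start at the first instruction
        while (_instr)
        {
            switch (_instr->Code())
            {
            case op_talk:
                out << "I am talking." << std::endl;
                ++_instr;       // iterate
                break;
            case op_end:
                _instr = 0;     // discontinue the loop
                break;
            }
        }
    }
private:
    std::vector<Script> _scriptList;
    const Instruction*  _instr;     // current instruction
};

// Hypothetical helper: builds, loads and runs the script, returning its output.
std::string RunTalkTwice()
{
    VirtualMachine vm;
    std::vector<Instruction> instrList;
    instrList.push_back(Instruction(op_talk));  // talk twice
    instrList.push_back(Instruction(op_talk));
    instrList.push_back(Instruction(op_end));   // then end
    std::size_t scriptID = vm.Load(Script(instrList));

    std::ostringstream out;
    vm.Execute(scriptID, out);
    return out.str();
}
```

Passing std::cout instead of the string stream gives the console behavior described in this article.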

1.5 Conclusion #

In the next article, I will probably get into some more interesting topics regarding instruction data, and some form of registered data (variables). This will lead into some simple mathematical functionality at the very least.

I didn't want to get too buried in example code this time around as the introduction was quite lengthy. Hopefully this was at least enough to be of some inspirational value until a more comprehensive second article. The main thing to keep in mind is that you want an efficient implementation, or else you'll end up with a system that drains all the processing power needed for your game. You will want algorithms that allow for optimizations later on for this purpose, but of course don't sacrifice the clarity of your code too early.

I'd like to hear any questions, criticism, preferences, advice, mistakes I made, or scolding (if deserved) you'd like to express. I'll keep an eye on the forum discussion, but you can always email me: Mglr9940@rit.edu

2 Part II: Data Manipulation #

2.1 Continuing On #

I last left off with a very simple example of a machine capable of outputting some text. That was all it could do, and it was always the same text. If you remember, last time I spoke about the difference between programs built in this static manner, and programs that are able to handle more dynamic situations. If it were really necessary to create a different type of instruction for every type of message you wanted to output, it could end up being a nightmare.

2.2 The Benefit of Data #

The simplest remedy to this situation is to create a new style of instruction that makes use of optional data to dictate the message you would like printed. With this type of instruction, all that would be necessary to print a custom message would be to assign it the proper data. No need for hordes of specialized instruction types.

So now we will add support in our Instruction class for using additional data:

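// the basic instruction
class Instruction
{
public:
    Instruction(opcode code) : _code(code), _data(0), _dataSize(0) {}
    Instruction(opcode code, const char* data, size_t dataSize)
        : _code(code), _data(new char[dataSize]), _dataSize(dataSize)
    { memcpy(_data, data, dataSize); }
    // copies get their own buffer, so the destructor never frees memory twice
    Instruction(const Instruction& other)
        : _code(other._code), _data(0), _dataSize(other._dataSize)
    {
        if (other._data)
        {
            _data = new char[_dataSize];
            memcpy(_data, other._data, _dataSize);
        }
    }
    Instruction& operator=(Instruction other)   // copy-and-swap assignment
    {
        std::swap(_code, other._code);
        std::swap(_data, other._data);
        std::swap(_dataSize, other._dataSize);
        return *this;
    }
    ~Instruction()  { delete[] _data; }

    opcode Code() const         { return _code; }
    const char* Data() const    { return _data; }   // read the data
private:
    opcode  _code;
    char*   _data;      // additional data
    size_t  _dataSize;  // size of _data in bytes, needed when copying
};

While creating an instruction, additional data can be paired with an opcode by using the second form of constructor. This constructor allocates memory of the correct length to store this data and then copies the source data into its own private storage. This data can be read, but will never be changed again, according to the current interface. A destructor has been added to handle deletion of the data. Because a std::vector copies the instructions it stores, a copy constructor and an assignment operator are provided as well; without them, a copy would share the original's buffer and both destructors would try to delete it.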

If you're asking why the constructor creates a copy of the data provided when it seems simple enough just to assign the internal pointer to the address of the data provided, consider this: What would happen if the source data were to leave scope? You would be left with a dangling pointer. This is why the class owns its data buffer.
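The point about scope can be demonstrated with a small sketch. OwnedData, MakeMessage and ReadMessage are hypothetical names used only for this illustration. Note that this stand-in stores its copy in a std::vector<char> rather than a manually allocated buffer, which is a swapped-in technique and not the approach used by the Instruction class.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction's data: it copies what it is given.
class OwnedData
{
public:
    OwnedData(const char* data, std::size_t size) : _bytes(data, data + size) {}
    const char* Data() const { return _bytes.empty() ? 0 : &_bytes[0]; }
private:
    std::vector<char> _bytes;   // private copy; the vector manages the memory
};

// The source buffer is a local array that is destroyed when the function returns.
OwnedData MakeMessage()
{
    char local[] = "temporary buffer";
    return OwnedData(local, std::strlen(local) + 1);    // +1 for the terminating null
}

// Reads the data after the source buffer has already left scope.
std::string ReadMessage()
{
    OwnedData msg = MakeMessage();
    return std::string(msg.Data());     // safe: msg reads its own copy
}
```

Had OwnedData stored only the pointer it was given, Data() would point into the dead local array by the time ReadMessage() used it.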

Now, we would like to add a new opcode to designate the new functionality we require:

enum opcode
{
    op_talk,
    op_print,    // our new printing code
    op_end
};

The last new inclusion to make is in the virtual machine's processing loop. In the case of our new opcode, it must print the message described by the data, and then go to the next instruction:

void VirtualMachine::Execute(size_t scriptId)
{
    SelectScript(scriptId);   // select our _instrPtr by script ID
    _instr = _instrPtr;       // set our iterator to the beginning
    while (_instr)
    {
        switch(_instr->Code())
        {
        case op_talk:
            std::cout << "I am talking." << std::endl;
            ++_instr;         // iterate
            break;
        case op_print:
            std::cout << _instr->Data() << std::endl;    // print data
            ++_instr;         // iterate
            break;
        case op_end:
            _instr = 0;       // discontinue the loop
            break;
        }
    }
}

It would be a good idea to make sure things work correctly. In our main source, we will test the new instruction. All we need is some data to print, which we then pass to the printing instruction's constructor, along with its proper length (the string length + 1 for the terminating null character):

VirtualMachine vm;

// simulate some external data
const char* buffer = "this is printed data";  // string literals are const

// build the script
std::vector<Instruction> InstrList;
InstrList.push_back(Instruction(op_talk));  // talk still works the same way
InstrList.push_back(Instruction(op_print, buffer, strlen(buffer)+1));  // print
InstrList.push_back(Instruction(op_end));   // then end
Script script(InstrList);

// load the script and save the id
size_t scriptID = vm.Load(script);

// execute the script by its id
vm.Execute(scriptID);

If all is in working order, this code should talk, and then print the message provided by the data.

2.3 Another Form of Data #

Data paired with an instruction is all well and good for allowing flexibility on a per-instruction basis. But what about flexibility between instructions? In order to achieve this, we need data that is accessible by all instructions, for reading and possibly writing. This data is therefore reasonably placed at the level of a running script.

The ownership of this data should be dealt with carefully. Unlike an Instruction's data, we would like this new data to be writable in addition to being readable. If the ownership is carelessly placed in the hands of a script, then issues may arise when trying to enhance the features your system is capable of, such as when implementing some type of pseudo-multi-processing (parallel execution of scripts). This is because any changes to the script data in one "process" will affect any other "processes" running this same script.

For this reason, we would like to abstract a script's executional state. If and when we do implement such a feature, we can safely create executional states for each process being run. This script state will own the variable data we'd like to use, while the script itself will merely store a count describing how much data it needs when executing. The script state should also include some utilities for manipulating this data, otherwise what's the point of having it?

Our class may look something like this:

// a script's executional state
class ScriptState
{
public:
    // initialization
    void SetDataSize(size_t varCount)   { _varData.resize(varCount); }

    // data access
    void SetVar(size_t i, char val) { _varData[i] = val; }
    char GetVar(size_t i) const     { return _varData[i]; }
    const std::vector<char>& DataArray() const  { return _varData; }
private:
    std::vector<char>   _varData;
};

For current demonstrative purposes, char variables will be sufficient. Variables can be set or retrieved by index. If you'd like, you can even retrieve the data in a semi-string form. Keep in mind that it isn't necessarily null-terminated, however.
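As a small illustration of the "semi-string" remark, the sketch below builds a std::string from the variable array using the vector's own length, so no terminating null is required. StateAsString and DemoStateString are hypothetical helpers. The ScriptState class is repeated only so the example compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ScriptState as described in this article, repeated for a standalone example.
class ScriptState
{
public:
    void SetDataSize(std::size_t varCount)  { _varData.resize(varCount); }
    void SetVar(std::size_t i, char val)    { _varData[i] = val; }
    char GetVar(std::size_t i) const        { return _varData[i]; }
    const std::vector<char>& DataArray() const  { return _varData; }
private:
    std::vector<char>   _varData;
};

// Hypothetical helper: the length comes from the vector, not from a null byte.
std::string StateAsString(const ScriptState& state)
{
    const std::vector<char>& data = state.DataArray();
    return std::string(data.begin(), data.end());
}

// Stores 'a', 'b', 'c' in three variables and reads them back as text.
std::string DemoStateString()
{
    ScriptState state;
    state.SetDataSize(3);
    state.SetVar(0, 'a');
    state.SetVar(1, 'b');
    state.SetVar(2, 'c');
    return StateAsString(state);
}
```

Passing &DataArray()[0] to a function that expects a C string would instead read past the end of the array unless the last variable happened to hold zero.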

An aside regarding organization: At the moment, all of our classes are residing at the same namespace level. While this is ok for the limited number of classes we're working with, the organization could be improved somewhat, possibly through nesting. Instruction would make the most sense nested in Script, with Script and ScriptState nested in VirtualMachine. This is something to keep in mind, and I may make this organizational change in the future.

Now, to make use of this in our VirtualMachine class, we will simply add a ScriptState as a data member. At the moment, since we aren't dealing with parallel executions of scripts, we can get away with this. Later, when implementing this parallel script execution, we will have to relocate this member.

For now, to make use of it, we simply initialize its data size at the start of execution:

void VirtualMachine::Execute(size_t scriptId)
{
    SelectScript(scriptId);  // select our _instrPtr by script ID

    // initialize variable data
    _curState.SetDataSize(_scriptPtr->VarCount());

    _instr = _instrPtr;      // set our iterator to the beginning
    ...
}
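The call to VarCount() implies that Script now records how many variables it needs, although that change is not listed here. One way it might look is sketched below. The second constructor parameter, its default of zero, and the _varCount member are assumptions. The Instruction class is reduced to an opcode-only stand-in so the example compiles on its own.

```cpp
#include <cstddef>
#include <vector>

enum opcode { op_talk, op_print, op_end };

// Opcode-only stand-in for Instruction, just enough for this sketch.
class Instruction
{
public:
    Instruction(opcode code) : _code(code) {}
    opcode Code() const { return _code; }
private:
    opcode _code;
};

// Script extended with a variable count (assumed interface).
class Script
{
public:
    // varCount defaults to 0 so scripts without variables still build as before
    Script(const std::vector<Instruction>& instrList, std::size_t varCount = 0)
        : _instrList(instrList), _varCount(varCount) {}
    const Instruction* InstrPtr() const { return &_instrList[0]; }
    std::size_t VarCount() const        { return _varCount; }  // variables needed at run time
private:
    std::vector<Instruction>    _instrList;
    std::size_t                 _varCount;
};
```

The script stores only the count. The variable values themselves stay in the ScriptState, which is what keeps separate executions of the same script from interfering with each other.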

2.4 A Helpful Tool #

Before we go on to make any new instructions to play around with this variable data, we should take care of one minor, yet very crucial thing. As anyone who has ever had to debug his or her code should know, the debugging process can be a real pain. Utilities to aid in debugging can help a great deal, so we should definitely have a utility built to view the data values stored in a ScriptState at any given time.

Something like this, written as a member function of VirtualMachine, should suffice for now:

void ExposeVariableState(const ScriptState& state) const
{
    std::vector<char>::const_iterator itr;
    int n = 0;  // used to denote indexed position of value
    for (itr = state.DataArray().begin(); itr != state.DataArray().end(); ++itr, ++n)
    {
        std::cout << n << ": ";
        std::cout << static_cast<int>(*itr);   // cast for numeric value
        std::cout << std::endl;
    }
}

Little things like these can save you a lot of trouble later on when you just can't seem to get a script to work correctly.

2.5 Manipulating the Data #

Now let's add some pretty basic instructions just to prove that we can manipulate this data predictably.

op_set, // char, char : destination index, value to set
op_inc, // char : index to increment
op_dec, // char : index to decrement
op_add, // char, char, char : dest index, srce index1, srce index2

The commenting here describes the instructional data format, followed by a description of what each value represents to the instruction. For instance, the set op will set the variable at the specified index to the specified value, while the add op will set the variable at the destination index to the result of adding the values at source indices 1 and 2.

We are ready to add proper handlers for these opcodes in the virtual machine:

...
case op_set:
    _curState.SetVar(_instr->Data()[0], _instr->Data()[1]);
    ++_instr;
    break;
case op_inc:
    _curState.SetVar(_instr->Data()[0], _curState.GetVar(_instr->Data()[0])+1);
    ++_instr;
    break;
case op_dec:
    _curState.SetVar(_instr->Data()[0], _curState.GetVar(_instr->Data()[0])-1);
    ++_instr;
    break;
case op_add:
    _curState.SetVar(_instr->Data()[0],
                     _curState.GetVar(_instr->Data()[1])
                     + _curState.GetVar(_instr->Data()[2]));
    ++_instr;
    break;
...

If you trace through each handler carefully, you will see that, although a bit circuitous, each instruction is handled as we have described. Because of this circuitous nature, the handlers are certainly not optimized to their fullest extent, partly because they lack direct write access to the ScriptState's data. At the moment, however, individual instruction handlers are not critical; they are merely filler to make sure the key components of the virtual machine system are operating. You will certainly want to rewrite them later on. Right now we are more concerned with the design of the overall system, and with using efficient methods that do not deal directly with handlers.

2.6 Another Little Test #

We will test this out with another little script. Lacking any creativity at the moment, you may simply put a few pseudo-random manipulation instructions into the script. We will use 4 variables, set the first 3 to a value of 7, then increment the 2nd variable (index 1), decrement the 3rd (index 2), and finally add the 1st and 3rd variables, placing the result in the 4th slot (index 3).

It should resemble the following, in opcode-with-data format:

set 0, 7
set 1, 7
set 2, 7
inc 1
dec 2
add 3, 0, 2

With this deterministic script, we are able to predict the final states of each of the 4 variables. If you follow closely, you will see that they should be as follows, in index-value format:

0: 7
1: 8
2: 6
3: 13

So let's try out our enhancements with the virtual machine. We will create a second script, load it into the machine, and then execute it using the ID returned from loading. In addition, we will use our new debugging tool to check out the variable states after execution.

To create the instructions for this script, we are going to need to simulate some external data (as was done for the previous data example) for reading into the proper instructions:

// create variable manipulation data
char setData1[] = {0, 7}; char setData2[] = {1, 7}; char setData3[] = {2, 7};
char incData = 1;
char decData = 2;
char addData[] = {3, 0, 2};// add 1st and 3rd var, and store in 4th

// proper instruction data size constants (temporary for safety)
const int SET_SIZE  = 2*sizeof(char);
const int INC_SIZE  = sizeof(char);
const int DEC_SIZE  = sizeof(char);
const int ADD_SIZE  = 3*sizeof(char);

Loading the data looks something like this. Notice that we have to use a different syntax for passing single chars than for passing char arrays:

// build the variable manipulation script
std::vector<Instruction> varInstrList;
varInstrList.push_back(Instruction(op_set, setData1, SET_SIZE));   // set first 3 vars to 7
varInstrList.push_back(Instruction(op_set, setData2, SET_SIZE));
varInstrList.push_back(Instruction(op_set, setData3, SET_SIZE));
varInstrList.push_back(Instruction(op_inc, &incData, INC_SIZE));   // inc 2nd var
varInstrList.push_back(Instruction(op_dec, &decData, DEC_SIZE));   // dec 3rd var
varInstrList.push_back(Instruction(op_add, addData, ADD_SIZE));
varInstrList.push_back(Instruction(op_end));                       // then end

Finish by passing the instruction list, and our variable requirement. Then we can load and execute the script:

Script varScript(varInstrList, 4);  // we need 4 variables

size_t varManipID = vm.Load(varScript);

vm.Execute(varManipID);
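// check out the variable states (ShowVariableState() is presumably a public
// wrapper that passes the machine's current state to ExposeVariableState())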
vm.ShowVariableState();

If all goes well, you should see the previously mentioned values at the appropriate indices.

2.7 Reflecting #

If our testing methods are beginning to seem like glorious hacks to you, you're probably right. Things are beginning to get messy in main(). We seem to be following sloppy, if not outright dangerous, practices to properly load the necessary data into particular instructions. What we are lacking is a centralized procedure for the proper handling and loading of instructions and their data.

While all of this may be fine for our small examples right now, if we are ever to go into larger things, we certainly want the centralization described to localize all possible bugs to one section of the code. That way, if we find we screwed up somewhere, we know exactly where to look while debugging. If you've not heard this before, the idea of localizing functionality is certainly something that is applicable in most, if not all, programming practices.

A mechanism to handle loading in a localized manner is definitely needed soon.
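As a rough idea of what such centralization could look like, the sketch below moves the knowledge of each opcode's data layout into a handful of helper functions, so that main() no longer packs char arrays and size constants by hand. All of the Make... functions and BuildVarScript are hypothetical. The Instruction stand-in stores its data in a std::vector<char> purely to keep the example short. This is only one possible direction and not necessarily the mechanism this series will adopt.

```cpp
#include <cstddef>
#include <vector>

enum opcode { op_set, op_inc, op_dec, op_add, op_end };

// Stand-in Instruction for this sketch; data is kept in a std::vector<char>.
class Instruction
{
public:
    explicit Instruction(opcode code) : _code(code) {}
    Instruction(opcode code, const std::vector<char>& data)
        : _code(code), _data(data) {}
    opcode Code() const                     { return _code; }
    const std::vector<char>& Data() const   { return _data; }
private:
    opcode              _code;
    std::vector<char>   _data;
};

// Hypothetical helpers: the only code that knows each opcode's data layout.
Instruction MakeSet(char dest, char value)
{
    std::vector<char> data;
    data.push_back(dest);       // destination index
    data.push_back(value);      // value to set
    return Instruction(op_set, data);
}

Instruction MakeInc(char index) { return Instruction(op_inc, std::vector<char>(1, index)); }
Instruction MakeDec(char index) { return Instruction(op_dec, std::vector<char>(1, index)); }

Instruction MakeAdd(char dest, char src1, char src2)
{
    std::vector<char> data;
    data.push_back(dest);       // destination index
    data.push_back(src1);       // first source index
    data.push_back(src2);       // second source index
    return Instruction(op_add, data);
}

Instruction MakeEnd() { return Instruction(op_end); }

// The test script from this article, built without any hand-packed buffers.
std::vector<Instruction> BuildVarScript()
{
    std::vector<Instruction> script;
    script.push_back(MakeSet(0, 7));    // set first 3 vars to 7
    script.push_back(MakeSet(1, 7));
    script.push_back(MakeSet(2, 7));
    script.push_back(MakeInc(1));       // inc 2nd var
    script.push_back(MakeDec(2));       // dec 3rd var
    script.push_back(MakeAdd(3, 0, 2)); // 4th = 1st + 3rd
    script.push_back(MakeEnd());        // then end
    return script;
}
```

With helpers like these, a mistake in an instruction's data format can only live in one place, which is the localization argued for above.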

2.8 Conclusion #

Quite a bit was covered in this article, even though the underlying concept was pretty simple. As basic as it may seem, the inclusion of data increases the flexibility of our instructions a great deal. What would have required hordes of different instructions now requires only a small handful, with some additional data. The virtual machine is also now capable of retaining some kind of "state" during execution, which definitely has beneficial consequences.

At this point, there is a lot of metaphorical territory to be explored on your own. As easy as it may have seemed, we already have laid out much of our foundation. There is a lot to be discovered, and the possibilities are quickly becoming endless.

I am not yet exactly sure what I will be covering in the next article, though it will still be in accordance with my original outline. I am open to suggestions. Please make use of the forum discussion, or email me: Mglr9940@rit.edu



last modified 2010-10-28 12:42:52