Larry Isaacs
For those who want to experiment with the machine, write utility aids, or just tinker around...
This article will present information on how ATARI BASIC stores programs in memory. If you are new to the field of microcomputer programming, this information should help increase your awareness of what your ATARI is doing.
The following information is based solely on what I have been able to observe while working with an ATARI 800. I believe the information to be accurate. However, it is hard to know how complete the information is.
Also, for those new to microcomputer programming, the next section gives some preliminary information which should help make the rest of the article more understandable.
Preliminary Information
One very important term in the field of microcomputing is the term "byte." For purposes of this article, it can be considered a number which can have a value ranging from 0 to 255. The memory in your ATARI consists of groups of bytes, each byte of which can be referenced by a unique address. The part of memory which is changeable, called RAM, starts with a byte at address 0 and continues with bytes at increasing sequential addresses until the top of RAM is reached. The top of RAM is determined by the type and number of memory modules you have in your ATARI.
Bytes, or combinations of bytes, can be used to represent anything you want. Some common uses for bytes include representing memory addresses, characters, numbers, and instructions for the CPU in your ATARI. You will be exposed to several different uses for bytes in this article. Some of these uses will make reference to two byte binary numbers. This is where two bytes are used to represent a number whose value ranges from 0 to 65535. The decimal value of a two byte binary number can be computed using the formula: FIRST BYTE+(SECOND BYTE *256).
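The two byte formula above can be tried out in a few lines of Python. This is only a sketch for checking the arithmetic; the function names are made up for illustration and are not part of ATARI BASIC.

```python
def two_byte_value(first_byte, second_byte):
    """Combine two bytes (first byte is the low-order one) into 0..65535."""
    for b in (first_byte, second_byte):
        if not 0 <= b <= 255:
            raise ValueError("each byte must be in the range 0 to 255")
    return first_byte + second_byte * 256


def split_two_bytes(value):
    """Inverse operation: return (first byte, second byte) for a value."""
    if not 0 <= value <= 65535:
        raise ValueError("value must be in the range 0 to 65535")
    return value % 256, value // 256


print(two_byte_value(0, 8))    # 0 + 8 * 256 = 2048
print(split_two_bytes(32768))  # (0, 128)
```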
Also in this article, reference will be made to page zero. Page zero simply is the first 256 bytes of memory, i.e. addresses 0 through 255. This part of memory differs from the rest of memory in that these bytes can be referenced using a single byte address. The rest of memory requires two byte addresses.
The Conversion
After typing in a BASIC line, hitting RETURN causes the line to be passed to the programs found in the ATARI BASIC cartridge. Here the line will undergo a certain amount of conversion before it is stored in memory. One part of this conversion involves converting all of the BASIC reserved words and symbols to a one byte number called a token.
Another part of the conversion involves replacing each variable name in the line with an assigned number which will range from 128 to 255. If a variable name has been previously used, it will be replaced by the number previously assigned. If it hasn't been used before, it will be assigned the lowest unused number, starting with 128 for the first variable name. Also, numbers in the BASIC line must be converted into the form which ATARI BASIC uses internally before they can be stored in memory.
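The numbering rule for variable names can be modeled with a short Python sketch. The class and method names here are hypothetical, and the model only mimics the rule as described (first name gets 128, a repeated name gets its earlier number); it says nothing about how the cartridge actually implements it.

```python
class VariableTable:
    """Toy model of the numbering rule: first name gets 128, next 129, ..."""

    FIRST_NUMBER = 128
    LAST_NUMBER = 255

    def __init__(self):
        # Position in this list + 128 = number assigned to the name.
        self.names = []

    def number_for(self, name):
        """Return the number for a name, assigning a new one if needed."""
        if name in self.names:
            return self.FIRST_NUMBER + self.names.index(name)
        if len(self.names) > self.LAST_NUMBER - self.FIRST_NUMBER:
            raise OverflowError("all 128 variable numbers are in use")
        self.names.append(name)
        return self.FIRST_NUMBER + len(self.names) - 1


table = VariableTable()
print([table.number_for(n) for n in ["A", "B$", "A", "COUNT"]])
# [128, 129, 128, 130]
```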
After the conversion is finished, the line is stored in memory. If the BASIC line does not have a line number, it will be stored after the last statement of your BASIC program, and executed immediately. If it does contain a line number, the converted line will be inserted in the proper place in your program. After the line has been executed or stored, your ATARI will wait for you to type in another line. Even though the line undergoes this conversion, the order in which the reserved words, variables, and symbols occur in the line isn't changed when it is stored in memory.
The Memory Format For A Basic Line
Let's begin with the general format of how a BASIC line is stored. Once a BASIC line has been converted and stored, the line number is found in the first two bytes of the memory containing the BASIC line. These bytes form a two byte binary number which has the value of the line number. The value of this number can range from 0 to 32767.
The third byte contains the total number of bytes in this BASIC line. This means you can find the first byte of the next line using the following formula: ADDRESS OF FIRST BYTE OF NEXT LINE = ADDRESS OF FIRST BYTE OF CURRENT LINE + NUMBER IN THIRD BYTE OF CURRENT LINE.
The fourth byte contains the number of bytes in the first statement in the line, including the first four bytes. If the BASIC line contained only one statement, the third and fourth bytes will contain the same value. If the line had more than one statement, these bytes will be different.
Next come the bytes which represent the first statement in the line. If there is more than one statement, the next byte following the first statement contains the number of bytes in the first two statements. Naturally, if there is another statement after the second one, the first byte after the end of the second statement contains the number of bytes in the first three statements, etc.
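The line format just described can be illustrated with a Python sketch that splits one stored line into its statements. The sample bytes are made up for illustration: 54 (invisible LET), 45 (arithmetic assignment), and 22 (line terminator) are token values mentioned in this article, while the 20 used to end the first statement is only an assumed placeholder.

```python
def split_statements(line):
    """Return (line number, list of statement byte strings) for one line."""
    line_number = line[0] + line[1] * 256  # bytes 1 and 2: line number
    total_length = line[2]                 # byte 3: bytes in the whole line
    statements = []
    pos = 3                                # byte 4: end of first statement
    while pos < total_length:
        end = line[pos]                    # byte count up to end of statement
        if end <= pos or end > total_length:
            raise ValueError("inconsistent statement length byte")
        statements.append(bytes(line[pos + 1:end]))
        pos = end                          # next length byte follows directly
    return line_number, statements


# Hypothetical two-statement line numbered 10, 15 bytes in total.
sample = bytes([10, 0, 15, 9, 54, 128, 45, 129, 20,
                15, 54, 130, 45, 128, 22])
number, parts = split_statements(sample)
print(number)                    # 10
print([list(p) for p in parts])  # [[54, 128, 45, 129, 20], [54, 130, 45, 128, 22]]
```

For a single-statement line, the third and fourth bytes are equal, so the loop runs exactly once.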
This completes the format of a BASIC line as it is found in memory. Before going on, let's put this information to use in a short program which lists out its own line numbers along with the beginning address of each line. To do this we must first find out where the first byte of the first line is found. It turns out there is a two byte binary number found in page zero which contains the beginning address of the first line. This number is contained in bytes 136 and 137. Also, we will know we've reached the end of the program when we find a line number of 32768, which is one more than the maximum allowed by ATARI BASIC. The program to print the line numbers and their beginning addresses is shown in Listing 1.
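The same walk through the program can be sketched in Python against a simulated memory image. This is not the ATARI BASIC listing; the starting address 2100, the helper names, and the zero-filled statement bodies are assumptions chosen only to make the example run.

```python
def read_word(memory, address):
    """Two byte binary number: first byte + second byte * 256."""
    return memory[address] + memory[address + 1] * 256


def list_line_addresses(memory):
    """Follow the line-length bytes from the address held in 136 and 137."""
    address = read_word(memory, 136)
    found = []
    while True:
        line_number = read_word(memory, address)
        if line_number == 32768:           # end-of-program marker
            break
        found.append((line_number, address))
        length = memory[address + 2]       # third byte: bytes in this line
        if length == 0:
            raise ValueError("zero line length; memory image is malformed")
        address += length
    return found


def make_line(number, body):
    """Build a hypothetical one-statement line with placeholder body bytes."""
    total = 4 + len(body)
    return bytes([number % 256, number // 256, total, total]) + bytes(body)


memory = bytearray(65536)
start = 2100                               # assumed program start address
memory[136], memory[137] = start % 256, start // 256
image = make_line(10, [0] * 3) + make_line(20, [0] * 5) + make_line(32768, [0])
memory[start:start + len(image)] = image

print(list_line_addresses(memory))         # [(10, 2100), (20, 2107)]
```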
Listing 1: Program to print line numbers and their addresses
Tokens
In order to conserve memory, all of the BASIC reserved words, operators, and various punctuation symbols are converted into a one byte number called a token. This conversion also makes execution simpler and faster. The tokens can be divided into two groups. One group contains the tokens which occur only at the beginning of a BASIC statement and the other group contains the tokens which occur elsewhere in a BASIC statement.
Let's first take a look at the tokens which occur at the beginning of a BASIC statement. It turns out that all statements will begin with one of these tokens. After some investigation, I found that these tokens will range in value from 0 to 54.
The procedure for listing the tokens is fairly simple, though the actual implementation is a bit more involved than the brief explanation which follows. The idea is to put "1 REM" as the first statement of the program. Then use POKEs to change the line number and token of this REM statement. By setting the line number and token to the same number, listing the line will print the token and corresponding BASIC reserved word. Fortunately the programs in the BASIC cartridge which do the listing tolerate the incomplete BASIC statements. The program for displaying these tokens is shown in Listing 2. Notice when you run this program, no reserved word is printed for token 54. This is the invisible LET token which is used for assignment statements which don't begin with LET.
Listing 2: Program to print the tokens which begin BASIC statements
A similar procedure can be used to list the other tokens as well. The main differences are to make the first statement "1 REM A", POKE 54 (the invisible LET token) into the first byte of the statement, and make the changes for the token to the second byte of the statement. The values for the tokens which occur after the beginning of a statement range from 20 to 84. The program for printing these tokens is given in Listing 3.
Listing 3: Program to print the tokens which don't begin BASIC statements
After running this program, you will notice there is no reserved word or symbol printed for token 22. Token 22 is the terminator token found at the end of each BASIC line, except those whose last statement is a REM or DATA statement. Also, tokens 56 and 57 didn't print a reserved word or symbol. Both of these tokens represent the "(" symbol. The "(" doesn't print because these two tokens are associated with array names, and the "(" symbol is kept with the associated variable name, as will be seen in the next section.
Of course you noticed that most of the symbols occur more than once. There is a different token for each of the different uses of the symbol. For example, the symbol "=" has four different tokens. Token 45 calls for an arithmetic assignment operation as in A=A+1. Token 46 calls for a string assignment as in A$="ABC". Token 34 is used in arithmetic testing as in IF A=I THEN STOP. And finally, token 52 is the same as token 34 except that it's for testing strings.
There is one more token in addition to the ones listed by the previous program. This is token 14, which indicates that a constant is stored in the six bytes which follow it.
Variable Names And Constants
As each new variable is encountered, it is assigned a number. These numbers begin with 128 and are assigned sequentially up to 255. Notice these numbers will fit into one byte. Also, as each new variable is encountered, the variable name is added to a variable name list, and 8 bytes of memory are reserved for that variable. In the case of undimensioned variables, these 8 bytes will contain the value of the variable. For strings and arrays, these 8 bytes will contain parameters, with the actual values and characters stored elsewhere.
This method of handling variables has some advantages. One advantage is that it keeps memory usage to a minimum. The variable name is only stored once, and each time that name is referenced in a BASIC statement, it occupies only one byte in the stored program. Another advantage is that the address where the value for a variable is stored can be computed from the assigned number. This isn't true of the BASIC found in some other microcomputers where values must be searched for.
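As a rough illustration of computing an address from the assigned number, the Python sketch below assumes that the 8 byte entries sit one after another, in order of assigned number, starting at the beginning of the variable storage area. The article does not spell out this layout, so treat the formula and the sample start address 2061 as assumptions.

```python
ENTRY_SIZE = 8                 # bytes reserved per variable
FIRST_VARIABLE_NUMBER = 128    # number given to the first variable name


def variable_entry_address(storage_start, number):
    """Address of a variable's 8 byte entry, under the assumed layout."""
    if not 128 <= number <= 255:
        raise ValueError("variable numbers range from 128 to 255")
    return storage_start + ENTRY_SIZE * (number - FIRST_VARIABLE_NUMBER)


# Hypothetical start of the variable storage area.
print(variable_entry_address(2061, 128))   # 2061
print(variable_entry_address(2061, 130))   # 2077
```

No search is needed: one multiplication and one addition locate the entry.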
There are some disadvantages as well. First, it limits you to 128 different variable names. However, the great majority of programs won't need more than 128 variable names. One other disadvantage is that, should a variable name be no longer needed, or accidentally entered due to a typo, there is no quick way to remove that variable from the variable name list and reuse the 8 bytes reserved for it.
Apparently, the only way to get rid of unwanted variables is to LIST the program to cassette or disk. For example, LIST "C:" will list the program to cassette. Once the program is saved, use the NEW command to clear the old program. Then use the ENTER command to reload the program. For cassette this would be ENTER "C:". Using the LIST command saves the program in character form. ENTERing the program then causes each line to be converted again as was done when you first typed it in. Now only the variables found in the program will be placed in the variable name list, and space reserved for their values. Using CSAVE and CLOAD won't do this because these save and load a copy of the memory where the program is stored. Unwanted variables are saved and loaded with the rest of the program.
Constants are stored in the BASIC statements along with the rest of the line. The constant will be preceded by a "14" token as mentioned previously. Explaining how ATARI BASIC represents the numbers used as constants and as variable values will require some explanation about BCD (Binary Coded Decimal) numbers. I will save this information for a later article.
To give an example of using the information in this section, let's take a look at the variable name list. Fortunately bytes 130 and 131 contain the address of the beginning of the variable name list. The list will consist of a string of characters, each character occupying one byte of memory. To indicate the last character of a name, ATARI BASIC adds 128 to the value representing that character. Since the values representing the characters won't exceed 127, the new value will still fit into one byte. To indicate the end of the list, a 0 is placed in the byte following the last character of the last name. The program which prints the variable name list is given in Listing 4. Notice, when you run this program, that the "(" is saved as part of an array name, and the "$" as part of a string name.
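The decoding of the variable name list can be sketched in Python over a simulated memory image. The helper names and the sample names are hypothetical, and the sketch assumes the character values match ASCII, which holds for letters, digits, "$", and "(".

```python
def read_variable_names(memory):
    """Decode the name list whose address is held in bytes 130 and 131."""
    address = memory[130] + memory[131] * 256
    names = []
    current = ""
    while memory[address] != 0:            # a 0 byte marks the end of the list
        value = memory[address]
        if value >= 128:                   # last character of a name: value + 128
            names.append(current + chr(value - 128))
            current = ""
        else:
            current += chr(value)
        address += 1
    return names


def encode_name(name):
    """Store a name the way the list does: add 128 to the last character."""
    return bytes(name[:-1], "ascii") + bytes([ord(name[-1]) + 128])


memory = bytearray(65536)
memory[130], memory[131] = 0, 8            # 0 + 8 * 256 = 2048
data = b"".join(encode_name(n) for n in ["A", "B$", "TOTAL("])
memory[2048:2048 + len(data)] = data       # the following byte is already 0

print(read_variable_names(memory))         # ['A', 'B$', 'TOTAL(']
```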
Listing 4: Program to print the variable name list
Memory Organization
Finally, let's look at how the memory is organized for a BASIC program. The order in which the various parts of a program are found in memory is shown in Figure 1. The only part whose beginning is fixed is the variable name list which begins at address 2048. The beginning of the other parts will move appropriately, as the program grows. There are addresses in page zero which can be used to find each of the parts shown in Figure 1. These addresses, usually called pointers, are shown in Table 1. This table includes the two pointers which were used in the previous programs.
Figure 1. MEMORY ORGANIZATION
Increasing Addresses
    ????  End of Array Storage Area
      .
    ????  Beginning of Array Storage Area
    ????  End of Program
      .
    ????  Beginning of Program
    ????  End of Variable Storage Area
      .
    ????  Beginning of Variable Storage Area
    ????  End of Variable Name List
      .
    2048  Beginning of Variable Name List
TABLE 1
ADDRESSES    NAME   CONTENTS POINT TO
130 & 131    BON    Beginning Of variable Names list
132 & 133    EON    End Of variable Names list
134 & 135    BOV    Beginning Of Variable storage area
136 & 137    BOP    Beginning Of Program
138 & 139    CEL    Beginning Of Currently Executing Line
140 & 141    BOA    Beginning Of Array storage area
142 & 143    EOA    End of Array storage area
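The pointers in Table 1 can be read in one pass, as in the Python sketch below. The memory image and the pointer values poked into it are invented for the example. The variable count assumes that the variable storage area runs from BOV up to BOP with 8 bytes per variable, as described earlier.

```python
# First address of each two byte pointer, taken from Table 1.
POINTERS = {"BON": 130, "EON": 132, "BOV": 134, "BOP": 136,
            "CEL": 138, "BOA": 140, "EOA": 142}


def read_pointers(memory):
    """Return a dict mapping each pointer name to the address it holds."""
    return {name: memory[a] + memory[a + 1] * 256
            for name, a in POINTERS.items()}


def write_word(memory, address, value):
    """Store a two byte binary number, first byte low."""
    memory[address] = value % 256
    memory[address + 1] = value // 256


# Hypothetical pointer values for a tiny program with three variables.
memory = bytearray(65536)
sample = {"BON": 2048, "EON": 2060, "BOV": 2061, "BOP": 2085,
          "CEL": 2085, "BOA": 2200, "EOA": 2200}
for name, value in sample.items():
    write_word(memory, POINTERS[name], value)

pointers = read_pointers(memory)
print(pointers["BOP"])                              # 2085
print((pointers["BOP"] - pointers["BOV"]) // 8)     # 3 variables
print(pointers["EOA"] - pointers["BOA"])            # 0 bytes of array storage
```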
Application
For those who are interested in putting this information to use, I will present one example here. I will try to give more examples in future issues of COMPUTE!.
At some time you may find it useful to be able to "undimension" some arrays or strings and reuse the memory for some other arrays and strings. It turns out that the CLR statement only clears the variables found between the BOV (Beginning Of Variables) pointer and the BOP (Beginning Of Program) pointer. By temporarily changing the BOP pointer, we can keep some of the variables from being cleared. The array storage area is cleared by setting the EOA (End Of Arrays) pointer equal to the BOA (Beginning Of Arrays) pointer. We can save some of the array storage area by temporarily changing the BOA pointer.
The listing for this UNDIMENSION routine is shown in Listing 5. The listing also includes a small demo program to illustrate its use. Note that all of the names of variables which are to be cleared should occur in the program prior to any of the names of variables which are to be saved. This puts the storage for the variables to be cleared at the beginning of the variable storage area. Also note that a dummy string which can be cleared is needed by the UNDIMENSION routine. In your main program, this dummy string should be dimensioned just before dimensioning the strings and arrays that you will later clear, as was done in statements 120 and 150. This allows the use of the ADR function to find the end of the array area to be saved.
The reason the UNDIMENSION routine is not executed by a GOSUB is that the return line number is lost in the clearing process. Loop parameters will also be lost, so the routine shouldn't be executed while in a FOR...NEXT loop.
Listing 5: Undimension routine
Conclusion
I hope that you found the information in this article understandable and will find it useful at some point in the future. The information does show that ATARI BASIC is fairly efficient at using memory to store programs. Also, there is very little penalty in memory usage when using long variable names. If you have any questions please send them to COMPUTE!.