Beyond the Basics


Atari BASIC Structure

W. A. Bell

By now you probably have had your Atari® Computer for a few months, and have had a chance to put in some fairly large programs and tinker with and embellish them. You may have even written some programs of that type. If so, then you have undoubtedly wished for a renumber command. In fact, if you have used BASIC on other systems, then you have probably roundly cursed those programmers who left that facility out. Or you may have wanted to change the name of a variable to make it more self-documenting, but didn't know everywhere it occurred. This article will explore, in tutorial fashion, the structure of Atari BASIC programs as they are stored in memory. It will provide you with some tools for doing more of your own exploring, and then show how you can put this type of information to use.

To begin our exploration inside BASIC, the program shown in Listing 1 is useful. It lets us peek around in memory to find things that are of interest. It will search memory from a specified starting address and tell you where it finds a string of characters or data you have specified, or it will find address pointers to a specified memory location. It will also let you dump memory in two formats, decimal or hexadecimal, and character. If your Atari is plugged in, it may help your understanding to follow along on your keyboard.

Do the following steps in direct mode:

NEW
TESTVAR1=999
TESTVAR2=123456
TESTVAR3=98765432

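Now enter the memory analysis utility program in Listing 1 (you may want to save it for future investigations). As an initial objective, let's try to find the following:

1 - Where the lines of a BASIC program are stored

2 - Where the variable names are stored

3 - Where the variable values are stored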

Let's start our search by seeing if we can find where the actual lines of the program are stored in memory. To do that, we RUN the memory analysis utility program and request that it find the character string in the first REM statement (Line 10). Specify "5" for function required and enter the character search mode by responding with a "C". Then enter the character string "MEMORY ANALYSIS UTILITY". Be sure to request the dump in decimal this time. After the appropriate pause, a match should be found at address 2264 and you should see the first lines of comment.

At this point it should be explained that the article assumes throughout that you have a system without disk. For those of you with disk systems most of the addresses will be different, and there may be some variation in some of the commands, but the fundamental concepts remain the same. If you have trouble reproducing these results with a cassette system, it probably is because of differences in the sequence in which the program was entered, or errors in variable names. To resolve this you can do a LIST "C, a NEW, enter the variables again in direct mode, and do an ENTER "C.

Examining this more carefully, you will note that there are a few bytes in between the comments of each of the REM lines. After some study, you may note that the line numbers appear to start five bytes before each comment, at addresses 2259, 2288, etc. At this point you may wish to request another search, again with a decimal dump, looking for the character string "A DUMMY LINE" as listed in Line 256. The search will find a match at address 2847, and you will find that the value at address 2842 is now zero, but the next byte now has a value of one, where it previously was always zero. In fact the line number occupies two bytes, with the first byte containing the low order eight bits, and the second byte containing the high order eight bits. Thus the line number is 256 times the second byte plus the first byte, or 256*1+0=256. All binary 16-bit numbers in the Atari (and most 6502-based systems) are stored in this fashion, including addresses. You may want to study lines 650 through 700 of Listing 1 to see how this type of number is manipulated.
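The low byte, high byte arithmetic described above can be sketched in a few lines. The example is written in Python rather than Atari BASIC purely as an illustration, and the function names are made up for this sketch; they do not come from Listing 1.

```python
def word_from_bytes(low, high):
    """Combine a low-order byte and a high-order byte into a 16-bit value."""
    return low + 256 * high


def bytes_from_word(value):
    """Split a 16-bit value into (low, high), the order in which it sits in memory."""
    return value % 256, value // 256


# Line 256 shows up in the dump as the two bytes 0 and 1.
print(word_from_bytes(0, 1))    # 256
# The address 2259 would be stored as 211 followed by 8.
print(bytes_from_word(2259))    # (211, 8)
```

The same pair of operations applies to every address pointer discussed later in the article.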

To understand a little more of how this structure is laid out, try adding the following line to Listing 1.

1 REM

Now request the dump function starting at address 2259. You will see that we now have Line Number one, followed by five bytes, and then Line Number 10. Looking at the Line one dump, we see the first two bytes represent the line number, while the next two bytes each contain the value six. Byte number five contains a zero, and byte number six contains a 155, which from Appendix C of the Atari BASIC Reference Manual is a RETURN or EOL character. You will note that the rest of the REM statements follow a similar format.

In fact we can now deduce that the third byte gives the length of the line in bytes, and by adding that to the address of the present line, we can find the next line. (Let's reserve study of the fourth byte until later.) Similarly we can deduce that the fifth byte contains the equivalent of an opcode for the REM statement, while the EOL character signifies the end of the character string following the REM. This also conforms to the information in Chapter 11 of the BASIC Reference Manual under Item 2, where it states that each logical line requires six bytes of overhead.
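As a rough illustration of hopping from line to line with the length byte, the Python sketch below builds a small byte image by hand and walks it. The helper names (rem_line, walk_lines) are hypothetical, and the image only imitates what the dump shows for two REM lines; it is not read from a real Atari.

```python
EOL = 155  # the RETURN/EOL character that closes a REM line


def rem_line(number, comment):
    """Build the bytes of a REM line the way the dump shows them (illustrative)."""
    body = [0] + [ord(c) for c in comment] + [EOL]   # 0 is the REM opcode in byte five
    length = 4 + len(body)                           # line number (2) + two length bytes
    return [number % 256, number // 256, length, length] + body


def walk_lines(memory, start=0):
    """Yield (offset, line_number, length), hopping from one line to the next."""
    offset = start
    while offset + 2 < len(memory):
        number = memory[offset] + 256 * memory[offset + 1]
        length = memory[offset + 2]
        if length == 0:           # guard against running off into empty memory
            break
        yield offset, number, length
        offset += length          # byte three added to the address of this line


program = rem_line(1, "") + rem_line(10, "MEMORY ANALYSIS UTILITY")
for offset, number, length in walk_lines(program):
    print(offset, number, length)
```

The empty REM comes out as the six bytes 1, 0, 6, 6, 0, 155, and Line 10 is 29 bytes long, which agrees with the spacing between addresses 2259 and 2288 noted earlier.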

With these facts in hand, let's leave the subject of BASIC statements for a moment, and see what we can observe about the other things we want to find.

Note that the second and third items are alluded to in the BASIC Reference Manual in Chapter 11, Item 3. The statement is made that a variable takes eight bytes plus the number of characters in the variable name the first time it is used, but that each subsequent reference takes only one byte. Thus the variable name and value cannot be stored in the BASIC statement.

Let's start the search for variable names by looking for the variable TESTVAR1 that we entered before we keyed in Listing 1. After typing RUN, specify a string search for the characters "TESTVAR". After an appropriate wait for the computer to find it, it should respond with an address of 2048 (decimal), and a dump of the surrounding area.

Examining the dump received, you will see the characters TESTVAR1 starting at the indicated address. However, note that the last character is in inverse video, or more precisely, that the high bit of the last character in the name has been set to a one. Following TESTVAR1, you will see the variable names TESTVAR2 and TESTVAR3, each with the last character in inverse video. You will also see the variables used in the program displayed in the same manner, each with the last character in inverse video.

Now specify an address pointer search for the address where the variable name table was found (2048). In this case several will probably be found, but the one of interest is the one found on memory Page 0 at address 130 and 131. (For those of you not familiar with the 6502 architecture and the significance of Page 0, you may want to refer to one of the excellent references on this subject.) One more problem with the variable name table remains. Since it is of variable length, depending on how many variables have been defined, and the length of each variable name, how do we know where the table ends?

A little deductive reasoning is in order. Remember that variable names can only contain alphanumeric characters. Thus any non-alphanumeric character could be used as a flag for the end of the variable table. Looking at a dump starting at 2048, sure enough after the variable BYTE0 we see the value 0 (address 2122). Now doing an address pointer search for address 2122, we find such a pointer at 132 and 133 on memory Page 0. We can also do a search for an address pointer to the beginning of the program lines by specifying a search for an address pointer to address 2259 where we found the first line of the program. Again a reference will be found on Page 0, this time at address 136 and 137.
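The name table format observed so far (high bit set on the last character of each name, a zero byte after the last name) can be modeled as follows. This is a Python sketch with hypothetical function names, and it covers only the plain numeric variable names examined up to this point.

```python
def encode_name_table(names):
    """Build a name table: last character of each name has its high bit set."""
    table = []
    for name in names:
        codes = [ord(c) for c in name]
        codes[-1] |= 0x80          # inverse video marks the end of the name
        table.extend(codes)
    table.append(0)                # the zero byte found after the last name
    return table


def decode_name_table(table):
    """Recover the names by splitting after every byte with the high bit set."""
    names, current = [], ""
    for byte in table:
        if byte == 0:              # end-of-table flag
            break
        current += chr(byte & 0x7F)
        if byte & 0x80:
            names.append(current)
            current = ""
    return names


table = encode_name_table(["TESTVAR1", "TESTVAR2", "TESTVAR3"])
print(table[7])                    # 177, which is "1" (49) plus 128
print(decode_name_table(table))    # ['TESTVAR1', 'TESTVAR2', 'TESTVAR3']
```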

Let's review what we found so far. We have a variable name table stored from address 2048 to 2122, with a pointer to the beginning of the table stored at addresses 130 and 131, and a pointer to the end of the table at 132 and 133. We also have the program lines stored beginning at address 2259, and an address pointer at 136 and 137. So what do you suppose is stored in between the end of the variable name table and the beginning of the program lines?

To find out, let's do a dump starting with the byte after the end of the variable name table, or address 2123, in decimal. After doing so, nothing much jumps out at you - right! So let's try a dump in hex starting at the same address. This time, with some study you will find in order the hex characters 09 99, 12 34 56, and 98 76 54 32 interspersed with other data. Looks like we may have found the variable value table, doesn't it?

Let's study this dump a little closer. Looking at the other bytes, and remembering what Chapter 11 said about 8 bytes per variable, study the value of TESTVAR1. What you should see is:

00 00 41 09 99 00 00 00

Similarly for TESTVAR2 and TESTVAR3 we see:

00 01 42 12 34 56 00 00 and
00 02 43 98 76 54 32 00

Thus the structure of the variable value is such that it is stored in binary coded decimal (BCD) as a floating point number. The digits are stored left-justified in bytes four through eight of the 8-byte block, with the exponent stored in byte three. The exponent is defined such that for numbers greater than one, the exponent is from hex 40 to hex 7F, while for numbers less than one it will have a value from 00 to 3F. For negative numbers the high order bit will be set to one, or the exponent will range from 80 to FF. At this point you may want to end the dump program, change line 50 to assign a different set of values to the three variables, and then run a dump of this same area to see the changes.
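To check this reading of the format, here is a small Python decoder. It assumes the exponent byte counts powers of 100 with an offset of hex 40, which is consistent with the three values shown above but is an assumption of this sketch; decode_bcd is a hypothetical name. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def decode_bcd(six_bytes):
    """Decode an exponent byte followed by five BCD mantissa bytes."""
    exponent_byte = six_bytes[0]
    sign = -1 if exponent_byte & 0x80 else 1      # high bit flags a negative number
    power = (exponent_byte & 0x7F) - 0x40         # assumed: power of 100, offset hex 40
    mantissa = 0
    for byte in six_bytes[1:6]:
        mantissa = mantissa * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    # Ten mantissa digits, with the decimal point after the first pair.
    return sign * mantissa * Fraction(100) ** (power - 4)


testvar1 = [0x00, 0x00, 0x41, 0x09, 0x99, 0x00, 0x00, 0x00]
testvar2 = [0x00, 0x01, 0x42, 0x12, 0x34, 0x56, 0x00, 0x00]
testvar3 = [0x00, 0x02, 0x43, 0x98, 0x76, 0x54, 0x32, 0x00]

for entry in (testvar1, testvar2, testvar3):
    print(decode_bcd(entry[2:8]))     # bytes three through eight of the block
```

The three entries decode to 999, 123456 and 98765432, the values assigned in direct mode.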

Now that you have convinced yourself of the way numbers are stored, we still have a mystery or two to solve. What about byte two? Do you suppose that might be the variable number? Remember the statement in Chapter 11 about how additional references to a variable only take one byte. It seems that the only way to do that would be to assign a variable number. Also note that you are allowed a maximum of 127 different variables in a given BASIC program (see Chapter 1 of the Reference Manual). So the deduction that byte two of the 8-byte block is the variable number seems logical. Furthermore it gives a method of finding the variable name for such purposes as listing the program or operating in the direct mode.

Let's leave the use of the high order bit of byte two and the use of byte one of the 8-byte block to your investigation, with a couple of hints. Try examining the variables A$, B$ and HEX$. You may also want to define a numeric array in the direct mode and assign a set of values to it, and then dump its 8-byte block. One final step in this investigation is to try to find an address pointer to the variable value table. Specify a pointer to the address 2123, and we find that such an address pointer exists at 134 and 135 on Page 0 of memory.

Let's stop and summarize what we have learned at this point. FIGURE 1 is a visual depiction of the layout in memory of the address pointers on memory Page 0, the variable name table, the variable value table, and the program storage area.
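The layout can also be restated as a few lines of Python that store the four pointers in a simulated memory and read them back. The addresses are the ones found on the cassette system above. The sketch assumes the variable value table runs right up to the first program line, and the helper names are invented for illustration.

```python
memory = bytearray(65536)        # stand-in for the 6502 address space


def poke_word(address, value):
    """Store a 16-bit value low byte first, as the Atari does."""
    memory[address] = value % 256
    memory[address + 1] = value // 256


def peek_word(address):
    """Read a 16-bit value stored low byte first."""
    return memory[address] + 256 * memory[address + 1]


# Pointer locations on Page 0 and the values found on the cassette system.
for location, target in [(130, 2048), (132, 2122), (134, 2123), (136, 2259)]:
    poke_word(location, target)

name_table = peek_word(130)
name_table_end = peek_word(132)
value_table = peek_word(134)
program_start = peek_word(136)

print(name_table_end - name_table + 1)        # 75 bytes of names, end flag included
print((program_start - value_table) // 8)     # 17 eight-byte entries
```

Under that assumption the value table holds 17 entries: the three TESTVAR variables plus those defined by the utility itself.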

At this point let's set our objective to create a full-featured renumber utility. We have the fundamental information regarding memory layout and usage. The only additional data needed is to determine how line numbers are used in a program line. To investigate this, LISTING 2 has been developed. You can enter it at this point, either clearing the old program out, or leaving it at your option (if you have adequate memory).

The program in Listing 2 has been designed to let us dump a specific BASIC line. It will give us a decimal, hex, and character dump of any line we want. To digress for a moment, what we will get is a picture of the tokenized version of the BASIC line. This is the form used to store a program in the save mode. The list mode on the other hand stores the program just as you see it when you do a list to the screen or printer. Also note that a save operation will save the variable name table and the variable value table as well.

The intention is to decipher the internal structure of a BASIC line; since we want to generate a renumber utility, more specifically we want to see what those lines with line number references look like. Let's start with one of the most common line referencing statements, the GOTO. When the program in Listing 2 has been entered, add the line

10 GOTO 10

Then in direct mode type

GOTO 20000

Now request that the program find and dump Line 10. What you will see as a dump is:

DEC  10   0  13  13  10  14  64  16   0   0
HEX  0A  00  0D  0D  0A  0E  40  10  00  00
DEC   0   0  22
HEX  00  00  16

Now change Line 10 to read

10 GOTO 123456

and with another GOTO 20000, the dump will read:

DEC  10   0  13  13  10  14  66  18  52  86
HEX  0A  00  0D  0D  0A  0E  42  12  34  56
DEC   0   0  22
HEX  00  00  16

From the change that takes place, it is obvious that the referenced line number is stored in bytes seven through 12 of the line. Not only that, but it is also stored in exactly the same format as variable values are stored. You may want to try a few other values for the referenced line number to convince yourself.

We can also speculate that the opcode for the GOTO must be either byte five or byte six, or a combination of the two. Now let's see how BASIC lines with multiple statements are formatted. Again modify Line 10 as follows:

10 GOTO 999:GOTO 999:GOTO 999

and doing a GOTO 20000 we get the following dump:

DEC  10   0  33  13  10  14  65   9 153   0
HEX  0A  00  21  0D  0A  0E  41  09  99  00
DEC   0   0  20  23  10  14  65   9 153   0
HEX  00  00  14  17  0A  0E  41  09  99  00
DEC   0   0  20  33  10  14  65   9 153   0
HEX  00  00  14  21  0A  0E  41  09  99  00
DEC   0   0  22
HEX  00  00  16

From this we can conclude that bytes four, 14 and 24 are used to describe the length of a given statement in the line. More precisely, they are used to give the offset from the address of the line number to the next statement, and the last of these in a multi-statement line will always be the same as byte three of the line.
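The offset chain can be followed mechanically. The Python sketch below takes the 33 bytes of the three-statement Line 10 from the dump and uses the offset bytes to split it into statements. The function name is hypothetical, and offsets are counted from zero, so "byte four" in the text is offset 3 here.

```python
LINE_10 = bytes.fromhex(
    "0A00210D"                 # line number 10, line length 33, first offset 13
    "0A0E410999000000" "14"    # GOTO 999, then the hex 14 seen between statements
    "17"                       # offset 23
    "0A0E410999000000" "14"
    "21"                       # offset 33, the same value as byte three
    "0A0E410999000000" "16"    # the last statement ends with hex 16
)


def split_statements(line):
    """Return (start, end) offsets of each statement's tokens, counting from 0."""
    spans = []
    pos = 3                          # byte four, counting from one
    while pos < line[2]:             # byte three holds the line length
        next_pos = line[pos]         # offset of the next statement
        if next_pos <= pos:          # malformed data; stop rather than loop forever
            break
        spans.append((pos + 1, next_pos))
        pos = next_pos
    return spans


print(split_statements(LINE_10))     # [(4, 13), (14, 23), (24, 33)]
```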

At this point we need to establish what statements use line number references. After studying the BASIC Reference Manual, the following types of statements can have a line number reference:

GOTO   GOSUB     ON () GOTO   ON () GOSUB      TRAP
LIST   RESTORE   IF () THEN   IF () THEN GOTO 
IF () THEN GOSUB

Taking each of these statements in order (entering the line number as shown, and then dumping it) we get the following results:

1 GOTO 999
 DEC   1   0  13  13  10  14  65   9 153   0
 HEX  01  00  0D  0D  0A  0E  41  09  99  00
 DEC   0   0  22
 HEX  00  00  16

2 GOSUB 999
 DEC   2   0  13  13  12  14  65   9 153   0
 HEX  02  00  0D  0D  0C  0E  41  09  99  00
 DEC   0   0  22
 HEX  00  00  16

3 ON Z GOTO 997,998,999
 DEC   3   0  31  31  30 133  23  14  65   9
 HEX  03  00  1F  1F  1E  85  17  0E  41  09
 DEC 151   0   0   0  18  14  65   9 152   0
 HEX  97  00  00  00  12  0E  41  09  98  00
 DEC   0   0  18  14  65   9 153   0   0   0
 HEX  00  00  12  0E  41  09  99  00  00  00

4 ON Z GOSUB 997,998,999
 DEC   4   0  31  31  30 133  24  14  65   9
 HEX  04  00  1F  1F  1E  85  18  0E  41  09
 DEC 151   0   0   0  18  14  65   9 152   0
 HEX  97  00  00  00  12  0E  41  09  98  00
 DEC   0   0  18  14  65   9 153   0   0   0
 HEX  00  00  12  0E  41  09  99  00  00  00

5 TRAP 999
 DEC   5   0  13  13  13  14  65   9 153   0
 HEX  05  00  0D  0D  0D  0E  41  09  99  00
 DEC   0   0  22
 HEX  00  00  16

6 LIST 999
 DEC   6   0  13  13   4  14  65   9 153   0
 HEX  06  00  0D  0D  04  0E  41  09  99  00
 DEC   0   0  22
 HEX  00  00  16

7 RESTORE 999
 DEC   7   0  13  13  35  14  65   9 153   0
 HEX  07  00  0D  0D  23  0E  41  09  99  00
 DEC   0   0  22
 HEX  00  00  16

8 IF Z THEN 999
 DEC   8   0  15  15   7 133  27  14  65   9
 HEX  08  00  0F  0F  07  85  1B  0E  41  09
 DEC 153   0   0   0  22
 HEX  99  00  00  00  16

9 IF Z THEN GOTO 999
 DEC   9   0  17   7   7 133  27  17  10  14
 HEX  09  00  11  07  07  85  1B  11  0A  0E
 DEC  65   9 153   0   0   0  22
 HEX  41  09  99  00  00  00  16

10 IF Z THEN GOSUB 999
 DEC  10   0  17   7   7 133  27  17  12  14
 HEX  0A  00  11  07  07  85  1B  11  0C  0E
 DEC  65   9 153   0   0   0  22
 HEX  41  09  99  00  00  00  16

From these dumps we now deduce that all line number references are preceded by a byte having the decimal value 14. Furthermore, the byte preceding the byte with a value of 14 will have one of the following values if a line number reference follows:

OPCODE  STATEMENT
   4    LIST
  10    GOTO
  12    GOSUB
  13    TRAP
  18    ON () 2nd, 3rd, etc. line references
  23    ON () GOTO 1st line reference
  24    ON () GOSUB 1st line reference
  27    IF () THEN
  35    RESTORE

In fact, it appears that the actual usage of the value 14 in a BASIC statement is to indicate that a BCD floating point constant follows. To see this, you may want to reload the program in Listing 1 and search for the decimal value 14. You should find that any occurrences in the program storage area, aside from line or statement lengths, precede a numeric constant.
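One way to turn these observations into a scanner is sketched below in Python. It is not the method used in Listing 3. It walks the statements by their offset bytes, so that a length byte equal to 14 is never mistaken for a constant, and it accepts the values 18, 23, 24 and 27 only inside ON and IF statements (opcodes 30 and 7 in byte five of the dumps), since the same values could precede a constant elsewhere. String constants and DATA text are not examined in this article and are not handled. The trailing hex 16 on the ON line is inferred from its length byte of 31.

```python
NUMERIC_CONSTANT = 14
STATEMENT_OPCODES = {4, 10, 12, 13, 35}      # LIST, GOTO, GOSUB, TRAP, RESTORE
ON_OPCODE, IF_OPCODE = 30, 7                 # byte five of the ON and IF dumps
INNER_OPCODES = {ON_OPCODE: {23, 24, 18}, IF_OPCODE: {27}}


def find_line_references(line):
    """Return the offsets (from 0) of each 6-byte line number reference."""
    refs = []
    pos = 3                                  # first statement offset byte
    while pos < line[2]:
        next_pos = line[pos]
        if next_pos <= pos:                  # malformed data; stop
            break
        opcode = line[pos + 1]
        i = pos + 2
        while i < next_pos:
            if line[i] == NUMERIC_CONSTANT:
                directly_after_opcode = (i == pos + 2 and opcode in STATEMENT_OPCODES)
                after_inner_opcode = line[i - 1] in INNER_OPCODES.get(opcode, ())
                if directly_after_opcode or after_inner_opcode:
                    refs.append(i + 1)
                i += 7                       # skip the marker and its six bytes
            else:
                i += 1
        pos = next_pos
    return refs


GOTO_LINE = bytes.fromhex("01000D0D0A0E41099900000016")
ON_LINE = bytes.fromhex(
    "03001F1F1E85170E410997000000120E410998000000120E41099900000016")
IF_GOTO_LINE = bytes.fromhex("0900110707851B110A0E41099900000016")

print(find_line_references(GOTO_LINE))       # [6]
print(find_line_references(ON_LINE))         # [8, 16, 24]
print(find_line_references(IF_GOTO_LINE))    # [10]
```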

With this information in hand, we now know enough to construct a Renumber utility. The basic algorithm is as follows:

1 - Find each line number reference

2 - Find the line that is referenced, and count the number of lines from the beginning

3 - Compute what the new line number will be

4 - Store that value as the new referenced line number

5 - When all line references have been set to their new value then do the actual renumbering of lines.
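The five steps can be tried out on a simplified model before worrying about bytes. In the Python sketch below a program is just a list of (line number, referenced line numbers) pairs. That representation and the function name are assumptions made for illustration and do not reflect how Listing 3 stores anything.

```python
def renumber(program, start=10, step=10):
    """Renumber a list of (line_number, [referenced line numbers]) pairs.

    The lines are assumed to be in ascending order, as they are in memory.
    """
    # Steps 2 and 3: a line's position from the beginning decides its new number.
    new_number = {old: start + step * position
                  for position, (old, _) in enumerate(program)}

    renumbered = []
    for old, refs in program:
        # Steps 1 and 4: rewrite each reference while the old numbers still exist.
        # A reference to a line that does not exist is left unchanged.
        new_refs = [new_number.get(ref, ref) for ref in refs]
        # Step 5: only now does the line itself receive its new number.
        renumbered.append((new_number[old], new_refs))
    return renumbered


program = [(5, [100]), (7, []), (100, [5, 7])]
print(renumber(program))    # [(10, [30]), (20, []), (30, [10, 20])]
```

Computing every new number from the old numbering before any line is changed is what makes the order of the steps matter: once a line has been renumbered, references to its old number can no longer be resolved.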

There remains a sticky implementation problem, since line numbers are stored as floating point numbers. (Why this approach was chosen by Atari remains a mystery - a binary format would have required two bytes instead of six, and no internal conversion.) Listing 3 demonstrates one technique for solving this problem, using the variable value table we found earlier. In this case, the location of the value for a specific variable (REFLINE) is established. That variable is used to store the new referenced line number when it is computed. Then that value is POKEd into the location for the line number reference.
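For comparison, the conversion can also be done directly. The Python sketch below packs an integer line number into the six-byte form seen in the dumps. This is a different approach from the REFLINE technique of Listing 3, and it again assumes the exponent byte is hex 40 plus a power of 100; the function name is hypothetical.

```python
def encode_line_number(number):
    """Pack a non-negative integer into an exponent byte plus five BCD bytes."""
    if number == 0:
        return [0, 0, 0, 0, 0, 0]
    digits = str(number)
    if len(digits) % 2:                       # pad to whole pairs of digits
        digits = "0" + digits
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    if len(pairs) > 5:
        raise ValueError("more digits than five BCD bytes can hold")
    exponent = 0x40 + len(pairs) - 1          # assumed: hex 40 plus the power of 100
    mantissa = [int(pair[0]) * 16 + int(pair[1]) for pair in pairs]
    mantissa += [0] * (5 - len(mantissa))     # left-justified, zero-filled
    return [exponent] + mantissa


print([format(b, "02X") for b in encode_line_number(999)])
# ['41', '09', '99', '00', '00', '00']
print([format(b, "02X") for b in encode_line_number(10)])
# ['40', '10', '00', '00', '00', '00']
```

Both results match the bytes that follow the value 14 in the GOTO 999 and GOTO 10 dumps.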

Other more elegant solutions, requiring fewer statements, are possible, but they generally require some additional exploration of the structure of BASIC. At this point you will probably want to study Listing 3 along with its comments, and then enter it into your Atari. You should also note that this implementation of a renumber utility is not capable of renumbering itself. One other limitation is that the program will not deal with situations where variables are used as the line number reference. In such cases, you will have to follow the computational routines used to set the value of the line number reference, and either alter them appropriately, or else restore those line numbers to their original value after renumber has done its thing.

So how is such a program used? After the program has been entered, ready the tape recorder and, in the direct mode, type:

LIST "C

This will store the renumber utility on tape in a form so that it can be merged with other programs already in memory. (A CSAVE would be advisable, just for backup purposes.) First CLOAD a program you want to test the utility on. When that has finished, position the tape at the location where you started the LIST "C, and type:

ENTER "C

When the renumber utility has been loaded, a list command will show that it has been merged in at the end of the program previously loaded.

Now type GOTO 32000 and watch the results. One more step, of course, is saving the program once it has been renumbered. If you simply do a CSAVE, you will also store the renumber utility with your original program. To avoid doing that (gobbling up all that precious memory, not to mention space on your tape), do the following:

LIST "C1",0,31999

Rewind the tape to where the list started, type NEW to clear the utility from memory, and then type

ENTER "C

You now have just the original program in its renumbered form, and it can be CSAVEd in the conventional manner.

We have been able to develop a utility to renumber BASIC programs using the information we have uncovered. We have also found several techniques for conserving memory, such as not using the IF THEN GOTO statement, as it uses two more bytes than IF THEN. Using a variable will also save over using a constant if it is used more than twice. And, of course, every statement put into a multiple-statement line saves three bytes. There are several other functions that could be implemented, such as changing variable names; finding all references to a given variable; the deletion of blocks of lines; and renumbering selected lines of a program. Some of these ideas require additional digging to find all of the data necessary; others can be implemented with the things we know at this point.

Two problems exist at this point. The first is that utilities such as that in Listing 3 require a good deal of memory - a precious commodity for most of us. The second is that, for programs of any significant size, the use of such a utility will take a considerable period of time. A future article will take what has been developed to date and convert some of the more complex functions to machine language subroutines. These subroutines will be general purpose in nature, so that they can also be used in implementing some of the functions in the previous paragraph. Happy PEEKing!

FIGURE 1

Memory Layout for Atari Basic Tables


Program 1: Memory analysis utility


Comments for Program 1

General: The underscore (_) is used to indicate that characters are to be entered in inverse video

Lines        Comments
60           Required since a RUN command resets all variables to zero
90-190       Determine the function to be performed
210-610      Search memory for specified data
210-260      Determine if data input as character or decimal
270-350      Input of decimal data
360-380      Input of character data
410          Required to prevent match on BASIC input buffer
420-590      Actual search of memory
490-540      Match was found, dump memory at that point
630-750      Search for an address pointer
650-660      Convert to internal address format
680-730      Conduct the search, noting that addresses are stored low order byte, then high order byte
770-890      Dump specified area of memory
810-830      Dump a full screen of memory
920-1280     Subroutines
920-1150     Subroutine to dump memory
950-1050     Dump one line (10 bytes) in hex or decimal
980-1000     Hex dump after converting to hex
1020-1040    Decimal dump with appropriate spacing
1050-1130    One line of character dump for same memory
1100-1110    Check for cursor control characters and substitute inverse video space
1170-1180    Subroutine to print patience message
1200-1250    Subroutine to determine if dump is in hex or decimal
1270-1280    Subroutine for input error

Program 2: Basic Line Dump Utility


Comments for Program 2

General: The underscore (_) is used to indicate that characters are to be entered in inverse video

Lines        Comments
20400        Constants used in hex conversion
20500-21100  Find the line the dump was requested for
20500        Find starting address of first line
20700        Compute line number of current line
21000        Compute address of next line
21300-21400  Set up to dump line
21500-23500  Dump one screen of memory
21500        Z is how many bytes to dump on this line
             Y is vertical position on screen
             MAXADR is start of next line
21700-23300  Dump Z bytes of memory
22700-22800  Dump byte in decimal
22900-23000  Dump byte in hex
23100-23200  Print character representation of byte - using POSITION avoids most of the problems with cursor movement except clear screen (Q=125)
23500        Test for full screen of dump
23600-23800  For lines that exceed a full screen
23900        Check for end of line

Program 3: Renumber Utility


Comments for Program 3.

General: The underscore (_) is used to indicate that characters are to be entered in inverse video

The program requires 2319 bytes of memory in this form. To conserve memory, a number of lines could be deleted, eliminating some displays and error checking. These lines should be considered: 32095, 32180 through 32195, 32220, 32225, 32240, 32245, 32340 through 32350, and 32510. Smaller gains can also be made by converting the computation of line addresses and line numbers to subroutines, and by using shorter variable names.

Lines        Comments
32025-32110  Find the address of the variable REFLINE, used to store the referenced line number
32030        Beginning of the variable name table
32045-32055  Is this the correct variable?
32070-32080  Yes, compute the address in the variable value table
32090-32110  No, search for the end of this variable (inverse video) and increment the variable number
32120-32165  Initialize other variables
32120-32145  Set up the array of opcodes which use line numbers
32170-32225  Count the number of lines and check to make sure they are in ascending order
32235-32245  Input the renumber parameters and check to see if they will exceed the first line number of this program
32260-32460  Find each line number reference, and replace with the new line number
32260-32280  Compute address of line, line number, address of end of line, start of statement and end of statement
32285-32430  Process each BASIC statement in the line
32290        Test for a BCD constant
32300-32310  Check for line referencing opcode
32325-32335  Store referenced line number in variable REFLINE
32345        Check for nonsense line numbers (just in case)
32355-32385  Scan program to locate referenced line
32410-32425  Referenced line found so compute what the new line number will be and store in line
32435-32460  Check for end of line and update address pointers accordingly
32470-32505  Now compute the new line number for each line and store in the first two bytes of the line

