Thursday, 10 August 2017

Xenophobia on the Virgin Express

I've heard a lot about the growing Brexit xenophobia being nurtured by the likes of the Daily Mail, The Sun and Express, but prior to our holiday trip to the lovely North East town of Seahouses a couple of weeks ago I'd not really witnessed any.

I mean we've seen the video of anti-immigrant or racist abuse on the Manchester Metrolink right?

So, let's get down to the specifics. On Saturday, July 22 we were on the (I believe) Virgin Intercity (though it might have been Cross County) to Newcastle from Birmingham New Street. I think we would be arriving at 14:46 and then catching the next train to Berwick upon Tweed.

Me and my wife had an allocated pair of aisle seats at a table as it happened and there was a couple in the window seats too. I was travelling backwards. Slightly behind me and on the opposite side window seat was a lady - clearly very pregnant - enjoying the ride.

That is, until the conductor came along and asked for her ticket. Instead she presented him with a printed A4 sheet with her booking details including a code for the ticket. She didn't know this wasn't a proper ticket, because she was from Hungary and hadn't been in the country that long.

Instead of being gracious and explaining how she could get her ticket, the ticket inspector started to get really bolshy with her; threatening her with a fine or throwing her off the train - but she didn't have any real money with her either, her partner was the one who had actually bought the ticket online.

So, I said to the inspector it was obviously an innocent mistake so couldn't she get it sorted out at the end of the journey? To which he spouted "They wouldn't allow that where she comes from!" and stormed off down the carriage like a true Daily Express reader!

Me and my wife did a double-take. So, did the couple next to us at the window side. Then between us we all did a double-double take. Then we chatted with the woman for a few minutes and explained how she should be able to sort it out when she gets off the train by going to Information.

And the post-script for any xenophobe blog readers: I've no idea if they'd immediately chuck you off a train in Hungary for only having the booking information instead of an actual ticket, but who cares? Why should we define our behaviour by the worst we might imagine about another country's? To me it just sounds like a mirror image of your own inner thoughts and you can't fix that by targeting Eastern Europeans.

Saturday, 1 April 2017

Sign up to Signed Timeouts

I'm Julian Skidmore, the developer of the DIY 8-bit computer FIGnition. Most of my career has been spent doing embedded software, and timeouts have become a major bugbear of mine.

As I see it, the typical timeout mechanisms used on pretty much every low-end embedded system I've seen are terrible and most people probably don't even realise they are, so let's swallow the red pill and embrace a new way of doing them.

Let's start with typical timeout parameters for low-end systems.

On 8-bit and 16-bit embedded systems 16-bit jiffy clocks are often used, because we still want to economise on resources even in the 21st century. They typically have a 1KHz frequency (a millisecond timer) which means they wrap around every 64s, and this has to be carefully considered when calculating timeouts.

On ARMs and the arduino library and undoubtably many other systems, 32-bit jiffy clocks are often used which extends the range to 4096000 seconds, which is roughly 1.5 months; and so for long-running applications; wrap-around problems apply to them as well.

The Evolution Of Timeouts

Embedded programmers usually approach timeouts in a progressive way, from naive implementations to ones that make better use of system resources over time. A programmer might start off by implementing countdown timeouts in the polled part of the mainloop.

However, they then find out that the main loop isn't polled deterministically, so they move the countdowns to an interrupt routine, which does the same thing.

A timeout of this kind is a form of Signal. The timeout starts at 0, which signals to the consumer (the main loop) that a timeout has taken place (or not yet started) and to the producer (the interrupt routine), that it can't decrement it. The consumer then sets the timeout to the number of jiffy ticks desired and this signals to the consumer that it's not going to write to it until it's complete, and signals to the producer that it should decrement the countdown until it gets to 0 (so now the producer has possession of the countdown). The consumer then monitors the countdown until it reaches 0; whereupon it regains possession.

These forms of timeouts have two main strong points. Firstly, they're really simple to implement (just an assignment, a decrement, and a couple of if statements), and secondly, they have an indefinite polling window (the consumer can poll a 1ms countdown after an hour and it'll still report that it's complete). Unfortunately, they soak up CPU time for every countdown timer, even if the timer isn't running and they make it harder to write modular code (because you have to hardcode in the countdown to the interrupt, unless you use a data structure for your set of countdowns).

So, programmers then generally move onto timeouts based on simple comparisons. For example, they'll set the timeout expiry to the current time in milliseconds and then keep polling the milliseconds to see if it's passed the timeout:

... // The timeout is set up,


... // then later in a main loop, we keep testing it:

if(millis()>=gMyTimeout) {
    // The timeout has expired.

The first timeout code is always constructed like this, because we see time as being linear. But of course it's buggy, because eventually millis() will get to its maximum, or near it, and when you add aPeriod, it wraps around to 0, so the first test for millis()>=gMyTimeout will think it's expired when it hasn't.

So, the programmer does the logical thing and incorporates the 'wrapping' issue into the algorithm by adding a flag to determine if the timeout wrapped around. Then we get something like:


if(millis()>=gMyTimeout && gMyTimeoutWrap==FALSE ||

    millis()<=gMyTimeout && gMyTimeoutWrap==TRUE) {
    // The Timeout flag has expired.

Except this doesn't work properly either, because if, for example, gMyTimeout is close to the maximum value, then there's only a small window where millis()>=gMyTimeout even though there was no wrap and similarly, in the wrapping case, there's only a small window where millis()<=gMyTimeout.

There are similar problems if, for example we just have the first test, and a different test for when the timeout wrap is TRUE, e.g:

if(gMyTimeoutWrap==TRUE) {

        gMyTimeoutWrap=FALSE; // we've wrapped!
else if(millis()>=gMyTimeout){
   // We've expired!

The code must be polled often enough for the condition to be met to reset the wrap flag.

A this point, assuming that the embedded programmer actually becomes aware of these issues, they'll realise that a simple concept like a timeout is far more complex than they'd imagined.

They may then try another common pattern. Instead of adding aPeriod to the current time to obtain the timeout time, both the aPeriod and the original time are maintained. We then compute a timeout as follows:


.../// and in the main loop
if(millis()-gMyTimeout>=gMyTimeoutPeriod) {
  .. // We've timed out.

In this code we solve the major issues with the previous code: by subtracting the initial timeout set time from the current time we end up calculating the relative time since gMyTimeout. It will always start out at 0 and will eventually become >= gMyTimeoutPeriod. At last we have code that isn't actually buggy!

But it's poor, very poor, because the timeout must be polled every gMyTimeoutPeriod milliseconds or the window can be missed. For example, a 1ms period means the polling must take place within a millisecond or it will look like the timeout won't expire for another 65535ms.

We ought to find a better way, and this is how:

GROT: The Golden Rules Of Timeouts

The First Golden Rule Of Timeouts is that for any circular jiffy time of n bits you always need at least n+1 bits of storage to maintain a timeout.

This is true in the first case, we had n-bits for millis() and the flag represented an extra bit of storage (assuming we managed to correct the issue). It's also true for the second case; we had n-bits for millis() and since both a timeout and the aPeriod were represented, we needed 2n bits of storage.

This rule is a direct consequence of the requirement we must have for timeouts, if a jiffy clock has a period of x before wrapping, then we need to be able to represent a time period of up to 2*x to determine a timeout:

This also means that because we have at least n+1 bits of storage for the timeout, we must perform at least n+1 bit arithmetic to calculate timeouts. Again, we can see that's true in both cases: in the first case we have some explicit code to extend the arithmetic by an extra bit and in the second case we must perform an initial subtraction (millis()-gMyTimeout) and then another subtraction will be performed for the comparison.

The second golden rule is this: timeouts must be computed as relative quantities if the jiffy clock wraps around. The primary reason why timeout calculations are buggy in the first case even when the extra bit is added, is because absolute comparisons are made instead of relative calculations. In the second algorithm, when we compute millis()-gMyTimeout, we actually compute a relative value. For example, when gMyTimeout is initially a high value, e.g. 0xff01, then when millis()==0, the calculation will be 0-0xff01 which is 0xfe, i.e. 254 milliseconds later.

Now the interesting thing about that calculation is that we're no longer strictly using unsigned arithmetic here, because 0 is much less than 0xff01 so the result is a negative number whose lower 16-bits result in a small positive number. So, what you have is a maths error, which happens to deliver the right results. In some (often older) architectures it's possible to trap on overflow; and this is what would happen here, a timeout would lead to a system error.

Signed Timeouts Are Sweet Timeouts

The observation that timeouts are relative leads neatly to this insight: timeouts should be signed, as should the jiffy clock.

It's counter-intuitive, but simply by defining, say the jiffy clock as:

int16_t gJiffy;

Incrementing it in the timer tick routine (let's assume it's 1KHz):

ISR(TimerTick) {
    // ClearTimerTickInterruptFlag

Let's say millis() is a int16_t:

int16_t millis() AtomicRead(gJiffy)

Defining a timeout as:

int16_t gMyTimeout=millis()+aPeriod;

And then to test for a timeout with:

if(millis()-gMyTimeout<0) {
    ..// We've timed out.

Solves every problem we've found with timeouts so far; does it neatly and cleanly; and maximises the timeout period, in this case to n-1 bits, i.e. a range of 32s when the whole jiffy clock period will be 64s.

Math Madness

Why is it that the timeout calculation is expressed as millis()-gMyTimeout<0 instead of millis()<gMyTimeout when surely they're the same thing? In basic algebra we're taught that:

a<b <=> a-b<0

You simply subtract (or add) b at both sides to derive the equivalent equation. The answer is that it's only true on an infinite number line: for modulo arithmetic, both unsigned and signed, there are cases where they are not equivalent.

a b a-b signed(a<b) a-b<0 unsigned(a<b)
0x00000x0001 0xffff True True True
0xfffe0x0001 0xfffd True True False
0x80000x7fff 0x0001 True False False
0x7fff0x8000 0xffff False True True
0x00010xfffe 0x0003 False False True
0x00010x0000 0xffff False False False

This is because when you take two quantities in the range 0..x and subtract one from the other, the total range of values you can get on an infinite number line is -x to x; and therefore if you can only represent x different values, then the same values must be re-used. This explains why, when you use unsigned values for timeouts, you get a large polling window if a is very low and b is really high, but as a gets higher, the polling window becomes increasingly squeezed.

It also explains why if we want to maximise the efficiency for timeouts, we should maximise (and standardise) the polling interval, and the way to do that is to limit the timeout range to maxTimeout=x/2, and then the range of results will be -x/2 to x/2.

The Intuitive Angle

The upshot is this: using signed timeouts is the cleanest solution out. It works, because when we subtract the current time from the timeout, we obtain the timeout relative to now: a positive number if it's in the future (it hasn't timed out) or a negative number if it's in the past (it has timed out). The window is the maximum theoretically possible, because given two arbitrary times, the only way to tell if a value has timed out is if you guarantee your timeouts and polling period are within half the maximum range, 180ยบ of your data type.


There are many ways in which software engineers perform timeouts, but only signed timeouts are clean and efficient in storage, computation and code size. We should use them as standard.

Saturday, 18 March 2017

New Record Low Arctic SIE Maximum Reached

The Arctic Sea Ice is dying. We've known this since the mid-1990s from satellite measurements available since the end of the 1970s and there is some pre-satellite evidence to show that it has been in decline since the 1960s via the early environmentalist Rachel Carson.

However, since the mid-2000s it's been accelerating. Normally the big news has been with the Sea Ice Extent minimum reached in September, but recently the decline in the Sea Ice Extent maximum in March is becoming increasingly concerning.

This year we have reached a new record low Arctic SIE Maximum, about 40,000Km2 lower than the previous maximum reached in 2015. This is after 6 months in 2016 where the Arctic SIE was at record low levels and even this year it has spent about 30% of the time in record low territory over and above (or should it be under and below) the record lows over that period in 2016.

The record itself was reached near the beginning of March (March 06), but because the extent can vary quite significantly up and down at the maximum point, it's not safe to call the maximum until it can be reasonably known that it's peak will be exceeded.

That point has been reached, the current extent reached 13.61mKm2 as of March 16 and there is no year from the year 2000 to 2016 where SIE has risen by more than the 270,000Km2 that would be required for 2017 to break its current peak.

Here's the graphic.

Wednesday, 8 March 2017

uxForth: Unexpanded forth for a standard VIC-20. Part 3, the memory map

I'm the developer of the DIY 8-bit computer FIGnition, but it doesn't mean I'm not interested in other retro computers and the idea of developing a minimal Forth for the ancient, but cute Commodore VIC-20 is irresistable!

Part 1 talks about the appeal of the VIC-20 and what a squeeze it will be to fit Forth into it's meagre RAM.

In Part 2 I discussed choices for the inner interpreter and found out that a token Forth model could be both compact and about as fast as DTC.

Now I'm going to allocate the various parts of the Forth system to VIC-20 memory to make the best of what's there. Some of it will be fairly conventional and some somewhat unorthodox.

(An Aside, The Slow uxForth Development Process)

From the presentation of the blog entries it looks like I'm working these things out as I'm going along. For example, it's worthwhile asking why it looks like I can leap to fairly concrete decisions about the inner interpreter or even that I think I'll be able to fit the entire system into the available space.

The simple answer is that I've already done much of the work to make this possible. I've already written the code that implements the primitives (in fact I've written, modified and rewritten it a few times as I've improved it). I've made use of the wonderful resources at 6502 org, particularly the idea of splitting the instruction pointer (called gIp in my implementation) into a page offset and using the Y register to hold the byte offset: it really does improve the performance of the core Next function.

Similarly, I've written the non-primitive code and accounted for the space. It's written in Forth with a home-brew meta-forth compiler written in 'C'. So, there will be a future blog on that too!

However, it's not a cheat as such. The code is not tested yet; nor even loaded into a real VIC-20 nor emulator (I don't have a real VIC-20 :-( ). I have real decisions to make as the blog continues, which means I can make real mistakes too and have to correct them. What I've done, really, is basically a feasibility study, so that you don't waste your time reading the blog. And of course, the whole of uxForth will be released publicly, on a GPL licence via my GitHub account.

Admittedly, it's being released slowly, a 2.75Kb program I hope to release over the course of 2017!

The Memory Map

Page 0

Page 0 is the gold dust of every 6502 system: versatile and in short supply. BASIC uses the first 0x90 bytes and the KERNAL uses the rest. We'll use all 0x90 bytes for the data stack and some key system variables:

Addr Size Name Comment
$00 2 gIp Instruction pointer, lower byte always 0.
$02 1 gTmpLo Temporary byte
$03 1 gTmpHi Temporary byte used for indirect access.
$04 2 gILimit The limit for the inner-most do.. loop. uxForth (and FIGnition Forth) differ from most Forths in that the inner most loops values, the limit and the current value are held in global locations. do causes the previous gILimit and gCurrent to be pushed to the stack; thus r is equivalent to j on other forths.
$06 2 gICount The current loop count for the inner-most do.. loop.
$08 1 gUpState The current compilation state.
$09 1 gUpBase The current number base
$0a 2 gUpDp The current dictionary pointer.
$0c 2 gUpLast A pointer to the header of the most recent dictionary entry compiled
$0e 2 gUpTib The pointer to the input buffer (I'm not sure if we need this)
$10 128 gDs The data stack
$fb 2 gTmpPtr0 Spare pointer 0
$fd 2 gTmpPtr1 Spare pointer 1

Page 1

Page 1 is the return stack as you might expect. Oddly enough, we only get 192b, because the KERNAL uses $100 to $13F.

Page 2

There are 89 bytes available here, because they're used by BASIC. I plan to use them for the byte code vectors which are:

# Name # Name # Name # Name
$00 (nop) $0b (+loop) $16 u/ $21 rp!
$01 ;s $0c 0< $17 @ $22 drop
$02 exec $0d 0= $18 c@ $23 dup
$03 (native) $0e + $19 ! $24 over
$04 (lit8) $0f neg $1a c! $25 swap
$05 (lit16) $10 and $1b r> $26 (vardoes)
$06 0 $11 or $1c >r $27 (constdoes)
$07 (0branch) $12 xor $1d r $28 inkey
$08 (branch) $13 >> $1e sp@ $29 emit
$09 (do) $14 << $1f sp! $2a at
$0a (loop) $15 * $20 rp@ $2b

The codes that are greyed out have no names in the dictionary to save space; the way you'd insert them into code would be with [ nn c, ] sequences.

Page 3 and Page 4

There are a total of 116 bytes free from $2A0 to $313, I'll fill that area with some of the actual native definitions.

The cassette buffer is at $33c to $3fb. We'll be using the cassette for storage so we can't use it for code. 

Pages 16 to 31 ish ($1000 to $1dff)

This is the area of RAM reserved for BASIC. It will contain the rest of the Forth system.

The screen RAM ($1e00 to $1ff9)

The end of RAM for an unexpanded VIC-20 is used for the screen. The plan here is to use that area for the editing space.  Instead of implementing a line editor (ACCEPT in FIG-forth and early FIGnition Forth), we use key to call the KERNAL editor and allow it to manage the editing of a line including cursor movement. Pressing Return doesn't execute the command line, instead, pressing F1 exits the editor and sets the interpretation point to the current cursor position. The end of the interpretation point is set to the end of the screen and emit is turned off until interpretation gets to the end of the screen. Importantly, pressing return doesn't start interpretation.

In addition, pressing F2 saves the screen bytes onto cassette.

This is how I'll implement storage in a fairly minimal way. By implementing save via F2 I can save a block (actually the 506 screen bytes are roughly half a traditional block), but LOAD is a normal word, so multiple blocks can be loaded (you just add load to the end of the block).

So, this is how you'd do normal editing operations. For normal words you would place the cursor near the end of the screen and edit to the end of the screen; cursor to return to the first character you want to interpret and then press F1. In a sense this is easy, because you can just press Return and then cursor up until you get there. The same method would also work if you wanted to compile a whole screen's worth of code. Load itself would reset the cursor position to [home] and then return to the interpreter, so placing a load at the end of the screen would load the next screen without any recursion. That way you'd be able to develop programs that were longer than just one screen without manual reloading.


In the memory allocation of uxForth, we've squirrelled away about 1053 bytes of RAM, embedding the line buffer in the screen and a number of system variables in page 0. We've also included 212 bytes of what we'd use for the program proper. It won't get much better than this!

In the next post I hope to talk in more detail about the implementation of the primitive words and the code used to test them.

Sunday, 12 February 2017

Arctic Brain Freeze

Another Geo-engineering project is doing the rounds.

In my opinion, it's not such a good idea, so sorry to dump on the scientist. There's grave concern for the arctic, I track it via the Jaxa arctic sea-ice extent vishop web page on a daily basis. It's scary.

Steven Desch's plan is still a bit like pumping harder on the gas to provide the power to apply some brakes, because the 1 million wind pumps would have to be currently made with fossil fuel industry power. I imagine, that because they're not generating electricity, just pumping cold, salty water up; they won't deplete the rare earth metals used to make electric wind-turbines. So, it's not as bad as it might be.

But, 1 million wind turbines: for electricity turbines that's enough to power 2 billion homes at a cost of £2000bn (though economy of scale would bring that down).

It's disappointing that he says: "“Our only strategy at present seems to be to tell people to stop burning fossil fuels.. It’s a good idea..." What he should be saying is "that is of course essential." Because it's not just a good idea, we have to get our emissions down to zero (and rapidly, and then go negative!), because CO2 will hang in the atmosphere for 100s to 1000s of years. Prioritising CO2 over arctic geoengineering is wiser, because it'll provide centuries of better breathing space (quite literally); whereas this, like all geoengineering projects encourages BAU and creates a maintenance issue: you're going to have to replace 5% of them every year: that's 50,000 wind pumps per year.

In addition, because global warming would get worse, the effort to keep the arctic frozen would increase. That's because of warm waters entering from the Atlantic and Pacific as well as warm air from the rest of the planet, pressures on the jetstream and higher radiative forcing due to greater CO2 in the atmosphere. These things wouldn't change and all of them are a product of burning FF.

The question though is that prioritising this would probably deplete renewable energy efforts, because the companies used to build turbine blades would first be diverted to build these things.

The definitive lay perspective on Geo-engineering, I think is the chapter on it in Naomi Klein's book "This Changes Everything."

Sunday, 6 November 2016

uxForth: Unexpanded forth for a standard VIC-20. Part 2, the inner interpreter.

I'm the developer of the DIY 8-bit computer FIGnition, but it doesn't mean I'm not interested in other retro computers and the idea of developing a minimal Forth for the ancient, but cute Commodore VIC-20 is irresistable!

In the first part I talked about the appeal of the VIC-20 and how much usable RAM I thought I could squeeze out of it.

That turned out to be between 3947 bytes and 4646 bytes depending on whether we count the screen and the CPU stack. And this sounded more credible, except that I want at least 1Kb of RAM for user programs which brings me back to 2923 to 3622 bytes. A terrible squeeze after all.

There's one obvious way to tackle that: use the Token Forth model. A definitive articles covering all the trade-offs with developing Forth are in the series "Moving Forth" by Brad Rodriguez, but here, we just need to recap on the most popular Forth models.

Forth Execution Models

Forth normally represents its programs as lists of addresses which eventually point to machine code. The mechanism is handled by the inner Forth interpreter called "Next". The traditional Forth model implements what's called Indirect Threaded Code.

Here, each forth command (in blue) points to an indirect address (in green) which points to some machine code (in pink). Primitive commands in Forth (like DUP, >R, SWAP and R> here) have an indirect address which points to the next RAM location where the machine code starts. But commands written in Forth itself (like ROT) start with an indirect address which points to ENTER which implements a function call and is then followed by more Forth commands (in blue). A Forth command like this then ends in EXIT, which returns Forth execution to the next calling function (MYFUNC).

The next Forth model to consider is Direct Threaded Code. Here's the same thing:

Here, every forth command (in blue) points directly to machine code (in pink). Primitive commands are executed directly, but commands written in Forth itself (like ROT) start with a "JSR Enter" machine code instruction which saves the return address (to Forth code) on the normal stack and in the DTC Forth, this return address is used as the new Forth Instruction Pointer after pushing the old IP. We can see that DTC will normally be faster than ITC because there's less indirection.

Token Threaded Forth is essentially a byte-coded Forth, except that in the case of commands written in Forth itself, the NEXT routine uses the top bit of the token to denote an address. Thus, only a maximum of 128 tokens can be supported and only 32Kb of Forth code.

In this example, we can see that the Forth code has been reduced from 14 bytes to 8b, but there is a jump table of addresses which is the same size as the indirect entries in ITC (10b used for these entries). DTC used an additional JSR (3 bytes) for the ':' defined word, but TTC didn't need any extra bytes for the ':' definition (it uses a single bit, encoded in the $93A0 address). Here, the overhead of ITC weighs in at 24 bytes, TTC weighs in at 18 bytes and DTC weighs in at 17 bytes.

We can see that TTC could significantly reduce the size of Forth code if the forth tokens are used often enough, but traditionally a byte-coded interpreter is slower than a threaded code interpreter. uxForth won't beat a DTC Forth, so the question is whether it can compete with an ITC Forth.

Execution Timings

ITC Forth:

NEXT      LDY #1
          LDA (IP),Y ;Fetch
          STA W+1    ;Indirect
          DEY        ;Addr
          LDA (IP),Y ;to
          STA W      ;W
          CLC        ;Inc
          LDA IP
          ADC #2
          STA IP     ;IP.lo
          BCC L54    ;If CY,
          INC IP+1   ;inc IP.hi
L54       JMP IndW
IndW:  .byte $6c ;JMP () opcode
W         .word 0

This is the implementation from the original 6502 FIG Forth. It uses zero-page for IP and W. The indirection is achieved by jumping to an indirect Jump.  It requires 41 cycles.

DTC Forth

NEXT      LDY #1
          LDA (IP),Y ;Fetch
          STA W+1    ;Indirect
          DEY        ;Addr
          LDA (IP),Y ;to
          STA W      ;W
          CLC        ;Inc
          LDA IP
          ADC #2
          STA IP     ;IP.lo
          BCC L54    ;If CY,
          INC IP+1   ;inc IP.hi
L54       JMP (W)
W         .word 0

This is a simple derivation from the original 6502 FIG Forth. As before it uses zero-page for IP and W. The indirection is achieved using a simple indirection.  It requires 36 cycles.


lda (gIp),Y ;byte code.
asla ;*2 for gVecs index
                        ;Also 'Enter' bit in Carry
iny ;inc gIp.lo
beq Next10 ;page inc?
;no page inc, fall-through.
bcc Enter ;Handle Enter.
sta Next7+4 ;modify following Jmp.
jmp (gVecs) ;exec byte code.
inc gIp+1 ;inc page
bcs Next7 ;now handle token/enter.

This is the proposed UxForth implementation. The UxForth version has to handle both multiplying the token by 2 to get the index for the jump table (gVecs) and testing to see if it's a call to another Forth routine (bcc Enter). It requires 22 cycles, so we can see that it's almost twice as fast as the ITC version. This is because it has one natural advantage and uses several techniques to improve the speed:

  • Y is used to hold the low byte of IP, thus when we execute lda (gIP),Y , only the upper byte of gIP is used, the lower byte is always 0.
  • Branches are arranged so that the common case is the fall-through case. Thus when IP increments over a page boundary two jumps are needed.
  • We normally only have to read one instruction byte instead of two. This is the one natural advantage TTC has over ITC or DTC.
  • The vector is stored directly in the code (the second byte of jmp (gVecs) ).
18b vs 26b for ITC Forth and 25b for DTC forth. It's possible to use most of these techniques to improve the speed of ITC and DTC 6502 Forth, but I'm not so concerned about that, because the easiest to access VIC-20 Forth is the Datatronic Forth (which is an 8Kb ROM cartridge) and Datatronic Forth uses exactly the same version of NEXT as FIG-Forth.


RAM is still very tight, but we can reduce its usage by implementing a byte-coded Forth and we should find it's perhaps up to twice as fast as a traditional FIG-FORTH implementation.

In the next post we'll look at how we might map our Forth implementation to the available RAM regions!

Thursday, 3 November 2016

uxForth: Unexpanded Forth For A Standard VIC-20

I'm the developer of the DIY 8-bit computer FIGnition, but it doesn't mean I'm not interested in other retro computers and as far as it goes, the Commodore VIC-20 is one of the cutest to come out of the 1980 stables.

The VIC-20 was cute, because it had a combination of fun and dumb features. Like: a full quality 65 key keyboard - and only two cursor keys!

Or the ability to support business applications with a floppy disk drive, but only having 23 column text. Or multi-colour graphics (and even a 2-bit per pixel mode that can co-exist with a 1-bit per pixel mode) with a near complete lack of support for bitmapped graphics. Or it's 16Kb of ROM and only 5Kb of RAM (with just 3582bytes free when Basic boots).

So, the fun challenge here is to see how much of a Forth I can squeeze into the Basic, unexpanded VIC-20 given the RAM limitations. I'm pretty confident I can do this, given that a super-tiny Forth subset has been crammed into just 1Kb of 8086 code (itsy Forth). I'm aiming for something that's kinda more usable.

Dude, Where's My RAM?

The first step (and this is the topic of this blog) is to find out how much RAM we can really use. A VIC-20 boots up and proudly displays: 

But it actually has 5Kb, so where has that other 1.5Kb gone? Armed with a detailed VIC-20 memory map we can see that areas of the first 1Kb have been nicked by Basic and the Kernal, which is a set of OS services abstracted from Basic and forms part of the ROM. For our purposes we don't want to use Basic, but we do want to use the Kernal, so we can read the keyboard, display to the screen and input/output between peripherals. For some of this 1Kb it's obvious which is used by Basic, but not all. So, here I decided to use the VIC-20 ROM disassembly. I first worked out that the Kernal starts at the address $E475, or thereabouts by observing that the rest of that code doesn't reference Basic. So, then I looked up all the system variables used by that section of code and found this set of addresses:

01,X 0100+1,X 0200,X 0259,Y 0263,Y 026D,Y 0277-1,X 0277
0277+1,X 0281 0282 0283 0284 0285 0286 0287
0288 0289 028A 028B 028C 028D 028E 028F
0290 0291 0292 0293 0293,Y 0294 0297 0298
0299 029A 029B 029C 029D 029E 029F 02A0
0300,X 0314 0314,Y 0315
(EABF) 0314 IRQ vector
(FED2) 0316 BRK vector
(FEAD) 0318 NMI vector
(F40A) 031A open a logical file
(F34A) 031C close a specified logical file
(F2C7) 031E open channel for input
(F309) 0320 open channel for output
(F3F3) 0322 close input and output channels
(F20E) 0324 input character from channel
(F27A) 0326 output character to channel
(F770) 0328 scan stop key
(F1F5) 032A get character from keyboard queue
(F3EF) 032C close all channels and files
(FED2) 032E user function
(F549) 0330 load
(F685) 0332 save


I also searched the Kernal code to find references to addresses within the Basic part of the ROM and found none, which meant that Basic sits properly on top of the Kernal. So, this tells us what areas of RAM we can use and it's as follows:

Address Range Size Owner Available for Forth?
$000 .. $08F $090 BASIC Yes, it's Page 0 how could we avoid it :-) ?
$090 .. $0FA KERNAL No.
$0FB .. $0FE $004 Nothing Yes, not sure yet what for.
$0100..$013E KERNAL Unlikely (tape error log/correction buffer)
$013F..$01FF $0C1 CPU Yes, for the return stack
$0200..$0259 $05A BASIC Yes, for Forth code
$025A..$029F KERNAL No.
$02A0..$02FF $060 None Free, more Forth code.
$0300..$0313 $013 BASIC Yes.
$0314..$03FF KERNAL No.
$033C..$03FB KERNAL Cassette buffer, maybe, but limited usage.
$03FC..$03FF $004 Free Some Vars?
$1000..$1DFF $E00 BASIC Forth code
$1E00..$1FF9 $1FA VIC (Screen)
$1FFA..$1FFF $006 Free 6b, more vars?

This gives us a total of $F6B (3947) bytes, or 4453 bytes if we can use the screen, or 4140 (4646) bytes if we include the CPU stack, which of course we will.

In the next part we'll make some basic decisions about the uxForth model, this will help us decide how to use all these areas.