Saturday, May 2, 2009

Moving to a better place

After trying WordPress (as a member of the Squeak Board, where we now run a blog), I found that WordPress is much more convenient and has better functionality, especially for editing and organizing stuff. Goodbye hotspot, welcome computeradventures.wordpress.com

Tuesday, September 2, 2008

New name for CorruptVM

We held a vote to choose a more appropriate name for CorruptVM, and the name Moebius won.
The project is registered at http://code.google.com/p/moebius-st/ and I have put some wiki pages there.
There is also a discussion group at http://groups.google.com.ua/group/moebius-project-discussion

Feel free to join.

Sunday, May 25, 2008

Got first bootstrapped object memory of CorruptVM

Today, for the first time, I was able to create an initial object memory of CorruptVM.

The bootstrap procedure includes creating instances of vtables and compiling methods in a simulated memory block.

The bootstrapped memory contains 75 compiled methods and 17 classes.
The memory footprint is about 6K bytes.

It would be much larger if the compiled methods contained code.
But instead of putting real machine code into a method instance, the simulator stores a 4-byte index into its method table, which holds a lambda for each compiled method.
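
To make the index trick concrete, here is a minimal sketch in Python (not the simulator's actual code). The names SimulatedMemory, Simulator, install_method and invoke are made up for illustration, and the little-endian layout of the 4-byte index is an assumption.

```python
import struct


class SimulatedMemory:
    """A flat byte block standing in for the simulated object memory."""

    def __init__(self):
        self.data = bytearray()

    def append_u32(self, value):
        """Append a 4-byte integer and return the offset where it was stored."""
        offset = len(self.data)
        self.data += struct.pack("<I", value)  # assumption: little-endian
        return offset

    def read_u32(self, offset):
        return struct.unpack_from("<I", self.data, offset)[0]


class Simulator:
    def __init__(self):
        self.memory = SimulatedMemory()
        self.method_table = []  # index -> Python callable (the "lambda")

    def install_method(self, fn):
        """Keep fn in the table; only its 4-byte index goes into memory."""
        index = len(self.method_table)
        self.method_table.append(fn)
        return self.memory.append_u32(index)

    def invoke(self, method_offset, *args):
        """Read the index back from memory and call the matching lambda."""
        index = self.memory.read_u32(method_offset)
        return self.method_table[index](*args)


sim = Simulator()
add = sim.install_method(lambda a, b: a + b)
neg = sim.install_method(lambda a: -a)
```

Each installed method costs exactly 4 bytes of simulated memory here, which is why a footprint measured this way says little about the size with real machine code.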

The native method feature set is nearly complete.
This means that a native method can be executed and should work as it was defined.

Compilation of ST methods still lacks support for block closures and global name lookup semantics. Currently I stub block literals with nils.

Monday, April 28, 2008

Revised view of native code generator

In my previous posts about the ASM-generator I made something which can be used in practice, but it still lacks a fundamental piece: VM support.
A new idea was born: since everything can be compiled down to native code, why do we need to use C at all? I started from scratch and came up with a new design for a system where everything is dynamically compiled and resides in object memory.

Here are some features of the new meta-circular VM:

- all the functionality that we know as the 'VM' will become part of object memory, with its own classes and methods.

- to start an image you will need a tiny bootstrapper written in C, which mainly serves as an image loader and provides a basic compatibility layer with the few OS-specific functions required to bootstrap the VM. For calling C functions from the VM there will be an FFI layer, which should be used by any part of the system that requires external libraries or OS-specific functions.

- there are no 'primitives' as in other ST implementations. All methods are compiled down to native code. For writing low-level code there are special methods in native format. As with the ASM-generator, in native methods you operate on machine registers/values, not on real objects, which allows you to implement almost anything from scratch without writing a single line of C code.

Watch for updates on the wiki page:
http://wiki.squeak.org/squeak/6041

Wednesday, February 20, 2008

Keeping this blog up to date

During the last few months I have been enjoying hacking on the Squeak VM.
A new project, codenamed Hydra VM, is a rewrite of the Smalltalk interpreter to support multiple interpreter instances running in parallel on separate native threads.
You can read more about it here: http://squeakvm.org/~sig/hydravm/devnotes.html


There are some bugs which are currently a bit annoying. But overall, HydraVM is surprisingly stable!

Sunday, August 12, 2007

Multi-threaded message passing scheme

Many Smalltalk implementations lack effective native threading support. In the rising era of multi-core CPUs, something must be done to fill this gap.

The following threading model is based on message passing. Any language that is based on message passing can use such a model.

What do we need to know about objects?
- an object is state stored at some memory location. To interact with objects we send them messages. The semantics of how a message is sent are under the total control of the VM. To make an object answer messages, the developer defines methods, which can access the receiver's state and use it to evaluate a return result. Methods can also send messages to other objects and use the results for their own needs.

To make safe use of native threads we need to define the following restriction:
- for any unique object in the system, only a single method can access its contents at any single point in time.

Let's examine the structure of a typical method evaluation scheme in a message-passing language implementation:
1. look up the method for the given message and receiver
2. enter method
3. evaluate result
4. exit method, return result

Result evaluation can be divided into steps that fall into the following categories:
- manipulate receiver's contents
- send messages to other objects

Our goal is to prevent two or more methods that access an object's contents from running at the same point in time.
To ensure that no other thread can manipulate the receiver's contents, we must mark the receiver object as busy and make it free again when we are done manipulating it.
For this, we must add a field named 'busy' to each object in the system. An object is considered busy when some method is accessing its state, and free if no method is currently accessing its state.
Note that when we send a message to another object from within the current method, we don't need to keep the receiver marked as busy, since we stop manipulating its contents and wait for the other method to return.

But the 'busy' property introduces too much overhead:
- to enter a 'manipulation step', a thread must check whether the receiver object is currently marked as busy and wait until it becomes free, then mark it as busy, perform the manipulation, and mark it as free at the end.
Since such steps are small (nearly atomic) and performed with high frequency in a running system, a locking/unlocking mechanism introduces too much overhead.
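
For comparison, the 'busy' approach can be sketched like this in Python (a VM would do it at a much lower level). BusyObject, acquire, release and increment are names invented for this sketch, and using a condition variable for the waiting is my assumption, not something the scheme prescribes.

```python
import threading


class BusyObject:
    """An object with a 'busy' field guarding its state."""

    def __init__(self, value=0):
        self.value = value
        self.busy = False
        self._cond = threading.Condition()

    def acquire(self):
        """Wait until the object is free, then mark it as busy."""
        with self._cond:
            while self.busy:
                self._cond.wait()
            self.busy = True

    def release(self):
        """Mark the object as free and wake one waiting thread."""
        with self._cond:
            self.busy = False
            self._cond.notify()


def increment(obj, times):
    # Every tiny manipulation step pays for a full acquire/release pair.
    for _ in range(times):
        obj.acquire()
        try:
            obj.value += 1
        finally:
            obj.release()


counter = BusyObject()
threads = [threading.Thread(target=increment, args=(counter, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The result is correct, but every single increment goes through a wait/mark/unmark cycle, which is exactly the overhead described above.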

To avoid this I propose a message queuing approach:
- messages, instead of being sent directly, are enqueued.
- a worker thread polls the queue and enters the enqueued method context to evaluate the result.
- every object in the system, instead of having a 'busy' property, holds an active context reference which, if not nil, indicates that the object is already taking part in another method evaluation.

A typical worker thread runs according to the following scheme:

1. fetch a message context from the queue
2. activate the context (start/continue evaluating a method)

- when sending a message from the active context, do the following:
  • look up the receiver's method
  • create a new method context and set its parent to the current context
  • check the receiver's active context
  • if the receiver's active context is nil, choose the thread which will be responsible for evaluating the message, set the new context's owner to the chosen thread, and set the object's active context to the new context (write barrier).
  • if the receiver's active context is not nil, set the new context's owner to the owner of the receiver's active context
  • enqueue the context in its owner's queue
  • deactivate the current context
- when returning from a method:
  • store the return result in the parent context
  • if the receiver's active context is the same as the current context, set it to nil (no write barrier)
  • destroy the current context
  • enqueue the parent context in its owner thread's queue
3. go to step 1.
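
The scheme above can be tried out with a small single-process simulation in Python. This is only a sketch under loud assumptions: workers are stepped round-robin instead of running on native threads, a method is a Python generator that yields (receiver, selector, args) to send a message, and owners are chosen round-robin. Obj, Context and Scheduler are names made up for this sketch.

```python
from collections import deque


class Obj:
    def __init__(self, methods):
        self.methods = methods        # selector -> generator function
        self.state = {}
        self.active_context = None    # replaces the 'busy' flag


class Context:
    def __init__(self, receiver, gen, parent, owner):
        self.receiver = receiver
        self.gen = gen                # suspended method body
        self.parent = parent
        self.owner = owner            # index of the owning worker
        self.inbox = None             # result stored by a returning callee


class Scheduler:
    def __init__(self, workers):
        self.queues = [deque() for _ in range(workers)]
        self.next_worker = 0
        self.result = None

    def send(self, parent, receiver, selector, args):
        method = receiver.methods[selector]            # look up the method
        if receiver.active_context is None:
            owner = self.next_worker                   # assumption: round-robin
            self.next_worker = (owner + 1) % len(self.queues)
        else:
            owner = receiver.active_context.owner      # follow the existing owner
        ctx = Context(receiver, method(receiver, *args), parent, owner)
        if receiver.active_context is None:
            receiver.active_context = ctx
        self.queues[owner].append(ctx)                 # enqueue to its owner

    def step(self, worker):
        ctx = self.queues[worker].popleft()            # 1. fetch context
        try:
            request = ctx.gen.send(ctx.inbox)          # 2. activate context
        except StopIteration as done:                  # the method returned
            if ctx.parent is None:
                self.result = done.value
            else:
                ctx.parent.inbox = done.value          # store result in parent
                self.queues[ctx.parent.owner].append(ctx.parent)
            if ctx.receiver.active_context is ctx:
                ctx.receiver.active_context = None
            return
        # The method sent a message; ctx stays suspended until the callee returns.
        self.send(ctx, *request)

    def run(self, receiver, selector, *args):
        self.send(None, receiver, selector, args)
        while any(self.queues):
            for worker, queue in enumerate(self.queues):
                if queue:
                    self.step(worker)                  # 3. go to step 1
        return self.result


def double(receiver, x):
    receiver.state["calls"] = receiver.state.get("calls", 0) + 1
    return 2 * x
    yield  # unreachable; only marks this function as a generator


def sum_doubled(receiver, a, b):
    first = yield (helper, "double", (a,))
    second = yield (helper, "double", (b,))
    return first + second


helper = Obj({"double": double})
main = Obj({"sum_doubled": sum_doubled})
total = Scheduler(workers=2).run(main, "sum_doubled", 3, 4)
```

Generators stand in for contexts here because they can be suspended at a send and resumed with the result, which is what "deactivate" and "enqueue parent context" amount to.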

When a thread's queue is empty, the thread goes to sleep and waits to be activated by a different thread, or it can be recycled.
To choose the owner thread for a new context, the thread that creates the context can check whether its own queue is non-empty (more messages are pending evaluation) and, if so, pass ownership of the new context to one of the sleeping threads, or allocate a new thread if its own queue has grown too big.
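
One possible reading of this owner selection policy, sketched in Python. The order of the checks, the max_queue_len threshold and the names Worker, choose_owner and spawn are all assumptions made for illustration.

```python
from collections import deque


class Worker:
    def __init__(self, name):
        self.name = name
        self.queue = deque()   # pending message contexts


def choose_owner(current, sleeping, spawn, max_queue_len=4):
    """Pick the worker that will own a newly created context."""
    if not current.queue:
        return current                 # nothing pending: keep it local
    if sleeping:
        return sleeping.pop()          # hand the work to a sleeping thread
    if len(current.queue) > max_queue_len:
        return spawn()                 # own queue has grown too big
    return current


current = Worker("w0")
idle = [Worker("w1")]
spawned = []


def spawn():
    worker = Worker("w%d" % (2 + len(spawned)))
    spawned.append(worker)
    return worker
```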

Memory management/Garbage collector.

How must the GC act in such an environment?
The general principles are clear enough: collect garbage quickly and efficiently, without system-wide locks.

Wednesday, July 18, 2007

Asm-generator continued...

I decided to integrate the assembler generator with the standard Smalltalk parser/compiler, so devs can use common tools to submit asm-code.

To indicate that a method's source contains assembler source, just put the <assembler> pragma in its body. This activates the asm-parser, and if the parser finds errors, it shows them in the same manner as the Smalltalk parser: by inserting an error message at the corresponding place in the edit buffer.

One good reason why I decided to integrate the parser with the system browser is automation:
- when you submit asm code, it is parsed, compiled and made ready for execution by registering it with Exupery. So, in most cases, you don't need to make any extra moves: just edit and submit.
- if a method has inline calls to other methods, they are automatically parsed (if they were not parsed before) so they can be included in the compiled code.
- when you submit new source for a method which was inlined in other methods, those methods are automatically recompiled using the new code.

So, after a successful submission, the new versions of native methods are automatically used (if they were already in use).

I also decided to slightly modify the syntax of asm-code.
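
The recompilation rule can be sketched as a small dependency tracker. This is Python for illustration only, not the actual browser integration; AsmRegistry, submit, inlined_by and compile_log are hypothetical names, and "compiling" here just records the method name.

```python
class AsmRegistry:
    """Tracks which methods inline which, and recompiles dependents."""

    def __init__(self):
        self.sources = {}
        self.inlined_by = {}     # callee name -> set of caller names
        self.compile_log = []    # stands in for real compilation

    def submit(self, name, source, inlines=()):
        self.sources[name] = source
        # Forget the inline edges of the old version of this method.
        for callers in self.inlined_by.values():
            callers.discard(name)
        for callee in inlines:
            self.inlined_by.setdefault(callee, set()).add(name)
        self._compile(name, set())

    def _compile(self, name, seen):
        if name in seen:         # guard against inline cycles
            return
        seen.add(name)
        self.compile_log.append(name)
        # Everything that inlined this method must pick up the new code.
        for caller in sorted(self.inlined_by.get(name, ())):
            self._compile(caller, seen)


registry = AsmRegistry()
registry.submit("helper", "...")
registry.submit("main", "...", inlines=["helper"])
registry.compile_log.clear()
registry.submit("helper", "... new version ...")
```

After the last submit, compile_log holds "helper" followed by "main": resubmitting the inlined method pulled its caller along.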

Changed flow control syntax.

Now, instead of Smalltalk-like flow control, which implies using blocks, you have labels and jumps.

So, instead of writing something like:

ifZero: [ ... ]

you place a label in the code:

self label: #somelabel.

then, at other points in the code, you place jumps to it:

self jump: #somelabel.

or

self jumpCarry: #somelabel.

Labels and jumps must be the outermost messages (they can't be passed as arguments to other messages).
Writing something like:

a := b + (self label: #somelabel).


is not an error in Smalltalk, but it is an error in the assembler.
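
How labels and jumps could be resolved is easy to show with a two-pass sketch. This is Python for illustration and not how the asm-parser is actually written; the tuple encoding of statements and the name resolve_labels are assumptions.

```python
def resolve_labels(program):
    """Replace label names in jumps with instruction indices.

    program is a list of (kind, arg) pairs, where kind is "label",
    "jump", "jumpCarry" or any other instruction name.
    """
    labels = {}
    code = []
    # Pass 1: record where each label points; labels emit no instruction.
    for kind, arg in program:
        if kind == "label":
            if arg in labels:
                raise ValueError("duplicate label: %s" % arg)
            labels[arg] = len(code)
        else:
            code.append((kind, arg))
    # Pass 2: patch jumps, which may refer to labels defined later.
    resolved = []
    for kind, arg in code:
        if kind in ("jump", "jumpCarry"):
            if arg not in labels:
                raise KeyError("undefined label: %s" % arg)
            resolved.append((kind, labels[arg]))
        else:
            resolved.append((kind, arg))
    return resolved


program = [
    ("jump", "somelabel"),     # forward reference
    ("op", "nop"),
    ("label", "somelabel"),
    ("op", "ret"),
]
resolved = resolve_labels(program)
```

Since labels are plain statements in this model, there is simply no place for one to appear as an argument of another expression.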

Later I'll add support for calls to a label
( self call: #somelabel )
so you can write methods like:

....
self call: #somelabel
....
....
self return: result.
self label: #somelabel.
result := 1.
self ret.

loading the address of a label into a register ( a := #somelabel),
and indirect jumps/calls: self call: a

global labels:

method1

self globalLabel: #globalOne.
self ret.

--
method2
^ self call: #globalOne

jump tables :

self jumpTable: #table1 with: #(
#label1
#label2
#label3
)

....

addr := (#table1 + index) loadDWord.
self jump: addr.

or maybe like this:
self jump: (#table1 at: index)

global data storage :

self dataDWord: #mydwords size: 10.

a:= (#mydwords + 5) loadDWord.


accessing VM values:

a:= self VMAddr: #someName

or just:

a := #someName

or

self call: #someVmFunction

I haven't decided yet whether to mix VM global names with my own global labels/data or not.
Maybe someone has better ideas?