Gedare-Csphd


Sunday, 12 December 2010

RTEMS: Adding a new scheduler

Posted on 17:13 by Unknown
In this post I describe how to add a new scheduler implementation to the RTEMS open-source real-time operating system. The scheduler I use as an example is an earliest deadline first (EDF) scheduler which applies the EDF algorithm to schedule periodic tasks and a FIFO policy for non-periodic tasks.  This post demonstrates how to use my modular scheduling framework to extend the scheduling capabilities of RTEMS.
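As a quick illustration of the policy itself (not RTEMS code), an EDF scheduler always runs the ready task with the earliest absolute deadline; the sketch below picks that task from an array, breaking ties by position, which gives FIFO order among equal deadlines. All names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative task record: only the absolute deadline matters for EDF. */
typedef struct {
  uint64_t deadline; /* absolute deadline in ticks */
} task_t;

/* Return the index of the ready task with the earliest absolute deadline.
 * Ties break toward the earlier index, i.e. FIFO among equal deadlines. */
static size_t edf_pick(const task_t *tasks, size_t n)
{
  size_t best = 0;
  for (size_t i = 1; i < n; ++i)
    if (tasks[i].deadline < tasks[best].deadline)
      best = i;
  return best;
}
```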



Adding a new scheduler implementation using my framework involves modifying and adding files.  Modified files add the configuration points for the new scheduler and define the internal scheduler data structures that are passed through the scheduling interface. New files implement the new scheduler algorithm. I have submitted these proposed changes as a feature request to the RTEMS project.  With luck, the bugs are ironed out, although I ran into a couple of regressions while re-basing to the RTEMS CVS.

Modified files

The modified RTEMS files include:
  • cpukit/score/include/rtems/score/scheduler.h
  • cpukit/score/include/rtems/score/thread.h
  • cpukit/sapi/include/confdefs.h
  • cpukit/Makefile.am
  • cpukit/rtems/src/ratemonperiod.c
scheduler.h modifications
Three changes are necessary in the scheduler.h file.

First, add #define _Scheduler_EDF (2) where _Scheduler_USER and _Scheduler_PRIORITY are defined.  Extend the numbering linearly; the ordering matters because the value is used as an index into the scheduler table.
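For concreteness, the policy identifiers might look as follows. Only _Scheduler_EDF (2) is given in the text; the values for _Scheduler_USER and _Scheduler_PRIORITY are assumed here from the ordering of the scheduler table.

```c
/* Scheduler policy identifiers; each value indexes _Scheduler_Table[],
 * so the numbering must match the table's entry order. */
#define _Scheduler_USER     (0)   /* assumed */
#define _Scheduler_PRIORITY (1)   /* assumed */
#define _Scheduler_EDF      (2)   /* the new EDF policy */
```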

Second, add a Scheduler_edf_Per_thread structure that contains the per-thread metadata required by the new scheduler algorithm.
/**
 * Per-thread data related to the _Scheduler_EDF scheduling policy.
 */
typedef struct {
  /** Points back to this thread. */
  Thread_Control *this_thread;

  /** This field contains the thread's deadline information. */
  RBTree_Node deadline;

  /**
   * This field points to the last node in the ready queue that has
   * the same absolute deadline as this thread.
   */
  Chain_Node *last_duplicate;
} Scheduler_edf_Per_thread;

Third, add to the Ready_queues union a pointer to the structure(s) used to manage ready tasks.
/**
 * This is the structure used to manage the scheduler.
 */
struct Scheduler_Control_struct {
  /**
   * This union contains the pointer to the data structure used to manage
   * the ready set of tasks. The pointer varies based upon the type of
   * ready queue required by the scheduler.
   */
  union {
    /**
     * This is the set of lists (an array of Chain_Control) for
     * priority scheduling.
     */
    Chain_Control *priority;

    /**
     * Structure containing the red-black tree, deadline-ordered, and
     * FIFO-ordered queues for EDF scheduling.
     */
    struct Scheduler_edf_Ready_queue_Control *edf;
  } Ready_queues;

  /** The jump table for scheduler-specific functions. */
  Scheduler_Operations Operations;
};
The ready queue is typically allocated dynamically during scheduler initialization.
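To make the pattern concrete, here is a self-contained sketch of that initialization step. The stub types stand in for the real RTEMS structures, and malloc stands in for the RTEMS workspace allocator; none of this is the actual kernel code.

```c
#include <stdlib.h>

/* Stub stand-ins for the real RTEMS types, for illustration only. */
typedef struct { int dummy; } Chain_Control;
typedef struct { int dummy; } Scheduler_edf_Ready_queue_Control;

typedef struct {
  /* Only the pointer matching the active scheduling policy is used. */
  union {
    Chain_Control *priority;
    Scheduler_edf_Ready_queue_Control *edf;
  } Ready_queues;
} Scheduler_Control;

/* Sketch of EDF scheduler initialization: allocate the ready queue and
 * store it in the union member for this policy. The real code would
 * allocate from the RTEMS workspace, not the C heap. */
static void edf_initialize(Scheduler_Control *s)
{
  s->Ready_queues.edf = malloc(sizeof(*s->Ready_queues.edf));
}
```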

thread.h modifications
The only change necessary in the thread.h file is to add a pointer for the per-thread scheduling metadata within the Thread_Control structure by extending the union of pointers at the scheduler field:
/** This union holds per-thread data for the scheduler and ready queue. */
union {
  Scheduler_priority_Per_thread *priority;
  Scheduler_edf_Per_thread *edf;
} scheduler;

confdefs.h modifications
The changes to confdefs.h are a little more involved. My last post about the scheduling framework goes into detail about how I use confdefs.h to enable user-configurable scheduling.  The following demonstrates how to add a scheduler to the existing confdefs.h scheduler configuration infrastructure.

Enable the scheduler to be built when a user selects CONFIGURE_SCHEDULER_ALL:
/* enable all RTEMS-provided schedulers */
#if defined(CONFIGURE_SCHEDULER_ALL)
  #define CONFIGURE_SCHEDULER_PRIORITY
  #define CONFIGURE_SCHEDULER_EDF
#endif

Where the priority scheduler is set to the default, add a check to see if the new scheduler is configured:
/* If no scheduler is specified, the priority scheduler is default. */
#if !defined(CONFIGURE_SCHEDULER_USER) && \
    !defined(CONFIGURE_SCHEDULER_PRIORITY) && \
    !defined(CONFIGURE_SCHEDULER_EDF)
  #define CONFIGURE_SCHEDULER_PRIORITY
  #define CONFIGURE_SCHEDULER_POLICY _Scheduler_PRIORITY
#endif

If the new scheduler is configured for use, set up its initialization routine for the scheduler entry table and estimate its memory usage:
/* 
 * For additional schedulers, add a check for the configured scheduler
 * here. Copy the above block for the priority scheduler, and then replace
 * the initialization and policy macros and update the memory usage.
 */

/* Check for EDF scheduler */
#if defined(CONFIGURE_SCHEDULER_EDF)
  #include <rtems/score/scheduleredf.h>
  #define CONFIGURE_SCHEDULER_ENTRY_EDF { _Scheduler_edf_Initialize }
  #if !defined(CONFIGURE_SCHEDULER_POLICY)
    #define CONFIGURE_SCHEDULER_POLICY _Scheduler_EDF
  #endif

  /**
   * Define the memory used by the EDF scheduler.
   */
  #define CONFIGURE_MEMORY_SCHEDULER_EDF ( \
    _Configure_From_workspace( \
      ((1) * sizeof(Scheduler_edf_Ready_queue_Control)) ) \
  )
  #define CONFIGURE_MEMORY_PER_TASK_SCHEDULER_EDF ( \
    _Configure_From_workspace(sizeof(Scheduler_edf_Per_thread)) )
#else
  #define CONFIGURE_MEMORY_SCHEDULER_EDF ( 0 )
  #define CONFIGURE_MEMORY_PER_TASK_SCHEDULER_EDF ( 0 )
#endif
The memory estimates should account for whatever is allocated during scheduler initialization (scheduleredf.c) plus the per-thread structure overhead (scheduleredfthreadschedulerallocate.c).

Add the new scheduler's entry routine (initialization) to the Scheduler Table at the position defined by the macro in scheduler.h (_Scheduler_EDF, in this case 2):
#ifdef CONFIGURE_INIT
  /* The table of available schedulers. */
  const Scheduler_Table_entry _Scheduler_Table[] = {
    #if defined(CONFIGURE_SCHEDULER_USER) && \
        defined(CONFIGURE_SCHEDULER_ENTRY_USER)
      CONFIGURE_SCHEDULER_ENTRY_USER,
    #else
      CONFIGURE_SCHEDULER_NULL,
    #endif
    #if defined(CONFIGURE_SCHEDULER_PRIORITY) && \
        defined(CONFIGURE_SCHEDULER_ENTRY_PRIORITY)
      CONFIGURE_SCHEDULER_ENTRY_PRIORITY,
    #else
      CONFIGURE_SCHEDULER_NULL,
    #endif
    #if defined(CONFIGURE_SCHEDULER_EDF) && \
        defined(CONFIGURE_SCHEDULER_ENTRY_EDF)
      CONFIGURE_SCHEDULER_ENTRY_EDF,
    #else
      CONFIGURE_SCHEDULER_NULL,
    #endif
  };
#endif

Lastly, include the new scheduler's estimates in the overall scheduler memory calculations.
/**
 * Define the memory overhead for the scheduler
 */
#define CONFIGURE_MEMORY_FOR_SCHEDULER ( \
    CONFIGURE_MEMORY_SCHEDULER_PRIORITY + \
    CONFIGURE_MEMORY_SCHEDULER_EDF \
  )

#define CONFIGURE_MEMORY_PER_TASK_FOR_SCHEDULER ( \
    CONFIGURE_MEMORY_PER_TASK_SCHEDULER_PRIORITY + \
    CONFIGURE_MEMORY_PER_TASK_SCHEDULER_EDF \
  )

Makefile.am modifications
The changes to the cpukit/score/Makefile.am file are for compiling new files. I think this is easiest to show as a diff:
diff -u -p -r1.89 Makefile.am
--- Makefile.am 24 Nov 2010 15:51:27 -0000 1.89
+++ Makefile.am 12 Dec 2010 18:05:23 -0000
@@ -26,7 +26,8 @@ include_rtems_score_HEADERS = include/rt
include/rtems/score/interr.h include/rtems/score/isr.h \
include/rtems/score/object.h include/rtems/score/percpu.h \
include/rtems/score/priority.h include/rtems/score/prioritybitmap.h \
- include/rtems/score/scheduler.h include/rtems/score/schedulerpriority.h \
+ include/rtems/score/scheduler.h include/rtems/score/scheduleredf.h \
+ include/rtems/score/schedulerpriority.h \
include/rtems/score/stack.h include/rtems/score/states.h \
include/rtems/score/sysstate.h include/rtems/score/thread.h \
include/rtems/score/threadq.h include/rtems/score/threadsync.h \
@@ -55,7 +56,8 @@ include_rtems_score_HEADERS += inline/rt
inline/rtems/score/coresem.inl inline/rtems/score/heap.inl \
inline/rtems/score/isr.inl inline/rtems/score/object.inl \
inline/rtems/score/priority.inl inline/rtems/score/prioritybitmap.inl \
- inline/rtems/score/scheduler.inl inline/rtems/score/schedulerpriority.inl \
+ inline/rtems/score/scheduler.inl inline/rtems/score/scheduleredf.inl \
+ inline/rtems/score/schedulerpriority.inl \
inline/rtems/score/stack.inl inline/rtems/score/states.inl \
inline/rtems/score/sysstate.inl inline/rtems/score/thread.inl \
inline/rtems/score/threadq.inl inline/rtems/score/tod.inl \
@@ -142,15 +144,25 @@ libscore_a_SOURCES += src/objectallocate
## SCHEDULER_C_FILES
libscore_a_SOURCES += src/scheduler.c

+## SCHEDULEREDF_C_FILES
+libscore_a_SOURCES += src/scheduleredf.c \
+ src/scheduleredfblock.c \
+ src/scheduleredfthreadschedulerallocate.c \
+ src/scheduleredfthreadschedulerfree.c \
+ src/scheduleredfthreadschedulerupdate.c \
+ src/scheduleredfschedule.c \
+ src/scheduleredfunblock.c \
+ src/scheduleredfyield.c

ratemonperiod.c modifications
Unfortunately, the EDF scheduling algorithm requires some work to be done whenever a job is released.  In particular, the absolute deadline of the released job needs to be updated.  Since the RTEMS kernel does not have a notion of periodic and non-periodic tasks, I extended the Rate Monotonic Manager to handle EDF tasks.  To support EDF tasks, all that is required is to call a small function that does some work when a job is released.  Currently, this call-out is made when initiating a periodic timer and when a periodic timer fires with or without period overrun.  The relevant code can be seen in my submission. I think this needs some additional cleaning up.
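The essential work of that call-out can be modeled in a few lines: on each job release, the task's absolute deadline advances to the release time plus its period. The names and types below are assumed for illustration, not the actual RTEMS identifiers.

```c
#include <stdint.h>

/* Illustrative model of the job-release hook used by EDF: the relative
 * deadline equals the period, so each release pushes the absolute
 * deadline forward by one period. */
typedef struct {
  uint64_t period;   /* period (== relative deadline), in ticks */
  uint64_t deadline; /* absolute deadline, updated on each release */
} edf_task_info;

/* Called when the rate monotonic period is initiated or fires. */
static void edf_release_job(edf_task_info *t, uint64_t release_time)
{
  t->deadline = release_time + t->period;
}
```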
 
New files
The scheduler implementation comprises several new files.  As shown in the Makefile.am modifications above, the new files are:
  • include/rtems/score/scheduleredf.h
  • inline/rtems/score/scheduleredf.inl
  • src/
    • scheduleredf.c
    • scheduleredfblock.c
    • scheduleredfthreadschedulerallocate.c
    • scheduleredfthreadschedulerfree.c
    • scheduleredfthreadschedulerupdate.c
    • scheduleredfschedule.c
    • scheduleredfunblock.c
    • scheduleredfyield.c
Below is a brief description of each of these files, to give a sense of how a scheduler implementation is organized.  The particular file split is arbitrary, as long as the scheduler implementation provides an initialization routine and a Scheduler_Operations table.  I tried to capture the requirements of the existing scheduler and the EDF scheduler in designing the modular scheduler, and hopefully it will support additional schedulers in the future without too much trouble. For details, consult the source.
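The Scheduler_Operations table is essentially a jump table of function pointers, one per scheduling hook. The stand-in below models that shape with toy state; the field names loosely follow the post but this is not the actual RTEMS definition.

```c
/* Minimal model of a scheduler operations jump table. A scheduler
 * implementation fills in one function per hook, and the kernel calls
 * through the table without knowing which policy is installed. */
typedef struct {
  void (*schedule)(void); /* choose the heir thread */
  void (*yield)(void);    /* executing thread yields the processor */
  void (*block)(void);    /* a thread suspends */
  void (*unblock)(void);  /* a thread is released or resumed */
} sched_ops_model;

static int ready;  /* toy state: 1 if a task is scheduled */

static void edf_schedule(void) { ready = 1; }
static void edf_yield(void)    { /* EDF: periodic tasks do not yield */ }
static void edf_block(void)    { ready = 0; }
static void edf_unblock(void)  { ready = 1; }

/* The EDF "table": installed once at scheduler initialization. */
static const sched_ops_model edf_ops = {
  edf_schedule, edf_yield, edf_block, edf_unblock
};
```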

scheduleredf.h
The header file for a scheduler implementation will contain the function declarations for at least the functions that are installed as pointers in the Scheduler_Operations table.  Additional internal structures and functions may be defined here.

scheduleredf.inl
The inline routines for a scheduler implementation will contain most of the ready queue manipulations and the function bodies for the functions that are reached through the Scheduler_Operations fields.  My schedulers rely on inlining to maintain a minimal call depth, since function calls are costly on some platforms.  Most of these functions are used only once or twice so that code size is not a concern (this argument may need to be revisited for the EDF code).

scheduleredf.c
Initializes the Scheduler_Operations table for EDF scheduling, and allocates and initializes the internal structures (the ready queue).

scheduleredfblock.c
Called when a thread suspends. Removes the thread from scheduling and updates queues.

scheduleredfthreadschedulerallocate.c
Called when a thread is created. Allocates the per-thread scheduling data for EDF scheduling.

scheduleredfthreadschedulerfree.c
Called when a thread is destroyed. Frees the per-thread scheduling data.

scheduleredfthreadschedulerupdate.c
Called when a thread's per-thread scheduling data should be updated. Currently this is only when a priority changes.

scheduleredfschedule.c
Called when a scheduling decision is needed; sets the heir thread according to the scheduling policy (either the earliest-deadline task or the oldest non-periodic task).

scheduleredfunblock.c
Called when a thread is released or resumed.  Adds the thread to scheduling queues.

scheduleredfyield.c
Called when a thread is willing to yield the processor. Under EDF scheduling, periodic tasks are not allowed to yield (i.e., this is a no-op).
Posted in hacking, RTEMS

Saturday, 4 December 2010

Architectural simulators

Posted on 20:37 by Unknown
As a researcher in the areas of operating systems, compilers, and computer architecture, I spend a lot of time dealing with simulators.  An inordinate amount of time.  Much of this time is spent figuring out how well a given simulator provides:
  1. Useful timing measurements (cycle-accurate)
  2. Support for fully functional OSs (full-system)
  3. Auxiliary features for specific experiments
  4. Support for useful hardware platforms and instruction sets
  5. Openness to modifying microarchitectural features
  6. Availability of technical support
In my current work, I'm interested in actively developed simulators that support cycle-accurate, full-system simulation with a detailed memory hierarchy, on out-of-order uniprocessor or multicore RISC architecture models with the SPARC, MIPS, or ARM instruction sets, and that allow extending the pipeline and instruction set.  The rest of this article discusses my efforts to find simulators that meet these needs.



A brief aside: I received a complaint that I write too much, so I have taken the effort to highlight in blue the important points I want to make; coloring is less work than thinking about how to be more concise.  If you are familiar with architectural simulators, this should speed up processing my drivel.  I go into detail about the six features listed above, then review some architectural simulators and processor emulators with respect to those criteria.

The first two features, cycle accuracy and full-system simulation, tend not to be available simultaneously. Cycle-accurate simulation improves on pure instruction set simulation by adding detailed timing (delay) and modelling of a processor's microarchitectural features.  Cycle accuracy has been available in open academic simulators for a number of years, with the dominant player being SimpleScalar.  However, SimpleScalar fell behind the fast-moving industry, so its usefulness in modern research is limited, although it is still an approachable system for students or for embedded systems. Full-system simulation models the low-level hardware necessary to support running an OS and applications without modification. The defining features of a full-system simulator include support for interrupts, memory management hardware (MMU/TLB), low-level buses/interconnects, peripherals (e.g. keyboard, mouse), I/O devices including disk and networking, and other functionally relevant system components.

The third feature that I investigate comes in the additional architectural modelling that is provided for supporting research.  Perhaps the two most important "auxiliary" features of a simulator are a detailed memory hierarchy and an accurate power model. A detailed memory hierarchy exposes the timing and functionality of elements along the path from the CPU to main memory.  Among other things, this includes the cache parameters (line size, set size, associativity, access latency), cache behavior or policy, memory access latencies on cache misses, and, more recently, details of the interconnect between caches and memories. A detailed memory hierarchy is necessary for cycle accurate and functional simulation, although the details required by both may vary.  Accurate power estimation involves modelling the power demand of an architecture by estimating the energy dissipated by accessing architectural features (dynamic power) and the power loss due to leaking energy regardless of changes in hardware state (static/leakage power). Approaches to power estimation are derived from efforts to measure the power of real platforms using repeated executions of instructions to construct estimates of the power of accessing architectural features.  Estimates are validated by running complex workloads and measuring the observed power of a real platform and comparing it with the predicted power of the simulator.  A common tool for the microarchitecture power research in academia is WATTCH, which has been integrated in a number of simulators.  CACTI is commonly used for estimating the power dissipated by caches.  It is also important to account for memory power, especially when proposing solutions that trade-off performance and power such as dynamic voltage and frequency scaling (DVFS); current tools fall short on accounting for off-chip memory costs.

The supported hardware platforms and instruction sets of a given simulator also interact with the ability to modify its low-level features.  Some modern simulators target the x86(_64) architecture, but since actual x86 implementations do not directly implement x86 instructions, it is difficult to model x86 accurately. (Although PTLsim does claim to capture the cycle overheads of all elements of actual x86 implementations.) Instead, the hardware translates the machine code at run-time into one or more RISC-like instructions called micro-ops (μops). This indirection makes it difficult to do low-level compiler work, and pretty much impossible to build an architecture that can trap x86 instructions at the assembly language level. It is also difficult to create realistic prototypes, since the actual implementation details are obscured and proprietary.  I mainly look for support of superscalar out-of-order (OoO) RISC architectures, which are useful for architectural and language research due to the straightforward mapping from instructions to architecture pipelines while still having complexity in the pipeline. One caveat to that last statement is that many academics, and even some in industry, have stated that the future of CMP is in simpler cores, so CMP researchers may seek less sophisticated processor pipelines. Also important when considering which simulator to use is the set of architectures modelled, for example OoO versus in-order, superscalar versus VLIW, CMP versus SMP versus SMT versus uniprocessor, and so on.

The last two items on my list tend to be opposed to each other: open-source simulators tend to be supported by small communities of academic (graduate students) users, whereas commercially supported simulators tend to not allow much tinkering with the microarchitecture (since the source code is not open, and the simulator implementation is proprietary).  It is critical that the microarchitecture simulated be open or extensible to enable research that modifies pipeline elements, adds new features, changes dataflow and control paths, etc.  All the simulators of which I'm aware that provide enough flexibility for architectural research do so by providing the source code for modification, thus precluding proprietary/commercial simulators. There have been efforts to provide a plug-in framework for microarchitecture research, although I'm not aware of any current commercial simulators that support such plug-ins.

Simulators
The following notes cover some simulators that I have used or looked into using. The UW architecture group has a page of links that covers many of the available architecture simulators and related tools.

SimpleScalar
An easy-to-modify, academically licensed source-available cycle-accurate simulator that lacks full-system capabilities and only supports uniprocessor architectures including the Alpha, MIPS, and ARM instruction sets.  This simulator is no longer actively maintained or supported.

SESC
An academic project out of UIUC, SESC is a cycle-accurate superscalar OoO simulator that supports CMP (multicore) platforms.  The only instruction set supported is MIPS, and there is no support for full-system simulation.

Simics
Originally an academic project, Simics was commercialized by Virtutech and sold to Wind River, an Intel subsidiary.  Virtutech, and now Wind River, provide academic licensing for Simics, which provides a limited set of the full Simics suite of processor models.  The most detailed models in the academic suite are the SPARC models, followed by the x86.  Simics provides full-system functional simulation with multicore platforms that executes an in-order model with 1 cycle per instruction.  The microarchitectural interface (MAI) provided a plug-in framework for researchers to observe instructions and hook timing functionality to simulate varying feature latency; however, MAI is no longer supported by versions of Simics past 4.0.  Simics also allows for user decoders to be defined that can re-define the functional behavior of instructions.  User decoders support ISA extensions and tweaks while still maintaining functional fidelity.

GEMS
Multifacet GEMS (more commonly just Simics GEMS) is a project from UW-Madison that provides modules for Simics without using the MAI. GEMS implements an OoO processor model for the SPARC-V9 instruction set (specifically, UltraSPARC III+) with a detailed CMP memory network.  GEMS is composed of two primary modules, Opal and Ruby.  Opal is the OoO cycle-accurate simulator that relies on Simics for some of the difficult full-system features; when Opal does something that is detected as functionally incorrect, it squashes its work and reads architected state from Simics.  Ruby is a complex memory subsystem intended to ease research on the protocols and interconnects of CMPs.  Opal is designed to call into Ruby for its detailed memory hierarchy, although Opal also has a simple two-level memory hierarchy that is usable in uniprocessor mode. Most work uses Opal+Ruby; some researchers use only Ruby (since they are interested only in the memory subsystem); and there are even efforts to port Ruby to other simulators to provide the detailed memory hierarchy.

PTLsim
PTLsim is an open-source cycle-accurate x86 simulator.  The base PTLsim does not provide full-system simulation nor does it model multicore, although full-system features are provided through Xen and more recently KVM/Qemu.  MPTLsim was also presented at DAC, but I have not seen any mention of its implementation.  It is an interesting project though, especially if x86 is a compelling architecture for a particular research problem.  I haven't personally tried to use PTLsim, although our group did task an undergrad to give it a whirl -- he had difficulty with building and running it, although this was for PTLsim/X.  The newer version relying on KVM might be better supported.

M5
I have previously talked about M5 on this blog. It is another academic simulator, originating at Michigan, that provides a combination of full-system (FS) or processor emulation (SE) modes, cycle accuracy, and architectural models, although at present only the Alpha instruction set is supported in FS mode with a cycle-accurate (OoO) model. I believe the origin of M5 was to study networking, so the memory hierarchy is not particularly robust. There is an effort called GEM5 (clever!) to port the Ruby memory hierarchy to M5.  Community support for M5 is decent, although best-effort.

Emulators
Loosely related to architectural simulators are processor emulators and hypervisors.  I looked at using two of these, but they do not model the architecture in enough detail to support easy microarchitectural research or cycle-accurate timing.

Bochs
A full-system emulator for some of the x86 and x86_64 ISAs that is based on binary translation.  When I looked into Bochs, it did not support multicore or SMP, although that may have changed as it is still an active project. I also read, but never verified, that emulation in Bochs is very slow.

Qemu
Another binary-translating processor emulator, QEMU supports a broader range of architectures and is fairly efficient.  I actually do use it for rapid prototyping in some work that I do, but only for application and kernel development, not for architectural research.  It is an active project.

Well, that wraps up my view on the current architectural simulators.  If you have any experiences, differing opinions, or other simulators that have compelling features please feel free to drop me a line.
Posted in computer architecture, work

Monday, 22 November 2010

Interrupt Handling with MMU Hardware on an MMU-less OS

Posted on 16:11 by Unknown
This post describes some work that I have just recently gotten to a point where it seems to be correct.  It was a long-running project that started around May 2010.


The way I provided sparc64 RTEMS with interrupt support is to register ISR handlers in the SPARC trap table. In sun4v/niagara, I use the open firmware (OpenSparc's OpenBoot) trap tables and can modify the entries directly during simulation. However, the sun4u/usiii firmware trap table appears to be "locked" -- when RTEMS attempts to write to the trap table, it gets an MMU data protection error. I do not think it is possible to change the protection bit, at least I haven't seen any way to do it, so the sun4u requires a different solution.  It would also be more appropriate to implement a general solution for the sun4v, since modifying the firmware trap tables is probably not a usable solution in many cases.

RTEMS is a real-time operating system that lacks support for virtual memory, which is sensible for embedded devices.  However, the SPARC V9 (sparc64) architecture assumes a virtual memory system and includes the appropriate MMU hardware.  To further complicate things, the more recent variants of sparc64 prevent system software from easily disabling the MMU, because the hardware is virtualized.  At any rate, dealing with the MMU can be troublesome, and it is made even more complicated when the OS does not expect or support the MMU hardware.

From a high level view, I had two general thoughts for supporting sparc64 interrupts:
  1. Disable the MMUs
  2. Install a new trap table.
The problem with (1) disabling the MMUs is that the RTEMS application is not resident in physical memory: the hardware relies on demand paging to load application data and instructions as they are used. Early attempts at turning off the instruction MMU caused the next instruction fetch to miss in physical memory and the CPU to crash.

The problem with (2) installing a new trap table is that when the processor switches to the other trap table, it is not present in the I-TLB. So the first trap to hit will cause an I-TLB miss, and the I-TLB miss handler, also being part of the new trap table, will also miss, which will lead the processor to crash. This is something of a chicken-and-egg problem.

After some failed attempts to resolve this problem, I decided to bite the bullet and integrate some MMU handling code with the sparc64 RTEMS.  Since I used some of the HelenOS architecture-specific code in the development of the original RTEMS niagara board specific package,  I went back to that code-base to see if I could get some useful MMU support code.

I managed to mangle the HelenOS bootloader and kernel startup routines so that only the trap table install and MMU take-over were used.  The next step was to get an appropriate trap table installed.  After some wasted effort trying to use the HelenOS trap table, I realized I could do something a little cheaper: copy the Open Firmware trap table, install a valid TLB mapping for it, and then switch trap tables.

Success!  Sort of.

The MMU trap handler copied from the firmware attempts a branch-always (PC relative) to a region of code that exists in the firmware but is not part of the trap table and therefore is not copied to the relative address of the branch target.  I patched around this problem by copying extra memory from the firmware; an experimentally derived value of doubling the copied space (from 32K to 64K) works on the Simics simulator. This solution is inelegant, hackish, and brittle, but it worked so I let it be for awhile.

I was recently able to revisit this work and implement a better fix. Now a trap table is generated that by default jumps to an address+offset representing the firmware trap table. Basically, each entry in the new trap table points to the same entry in the firmware trap table.  So when a trap is caught, it goes to the new trap table, which then jumps to the correct entry in the firmware's trap table.  RTEMS can override this default behavior by installing new trap handlers, thus preventing the jump to the original trap table.

RTEMS is able to install user ISRs to the new trap table as expected, and any non-overridden trap handler will vector to the firmware trap table.  This means that the firmware trap table is no longer copied but can still be used, while new traps can be installed by RTEMS in order to support user-defined interrupt handlers.
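The scheme can be modeled in C as a table whose entries default to the corresponding firmware entries and can be overridden one vector at a time. This is a behavioral sketch only: the real table holds SPARC trap-entry code and branch instructions, not C function pointers, and the entry count is assumed.

```c
#include <stddef.h>

#define TRAP_ENTRIES 512  /* assumed table size, for illustration */

typedef void (*trap_handler)(void);

/* firmware_table stands in for the (read-only) firmware trap table. */
static trap_handler firmware_table[TRAP_ENTRIES];
static trap_handler rtems_table[TRAP_ENTRIES];

/* Build the new table: every entry forwards to the firmware entry,
 * so unhandled traps keep their firmware behavior. */
static void trap_table_init(void)
{
  for (size_t i = 0; i < TRAP_ENTRIES; ++i)
    rtems_table[i] = firmware_table[i];
}

/* Override a single vector with an RTEMS-installed ISR; all other
 * vectors still vector to the firmware table. */
static void trap_install(size_t vector, trap_handler isr)
{
  rtems_table[vector] = isr;
}

/* Example user ISR (hypothetical). */
static void user_isr(void) { }
```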

My current solution still requires the MMU support for installing the new trap table's TLB mapping, but it seems to be reliable so far and has improved the robustness of sun4u test cases.  Installing new trap entries for all of the common-case traps would be a good improvement from here, since it would reduce the reliance of the port on the correctness of the firmware code. It also might be worthwhile to use the same mechanism on the niagara port. For now, though, I am satisfied with my solution.
Posted in hacking, RTEMS, work

Sunday, 14 November 2010

RTEMS Modular Task Scheduler

Posted on 15:19 by Unknown
As I mentioned in my last post, this past summer I participated in the Google Summer of Code by working on the RTEMS project. I have hopefully ironed out the last few small details that prevented the basic infrastructure that I developed from being accepted into the RTEMS code base. In this post I describe the design of the new scheduler, and also comment on some of the problems that I faced when merging the work. At the end of the post is a list of remaining work for this project.

Pre-existing RTEMS Scheduler

New Modular Scheduler
  Scheduler Control Structure
  Ready Queue Encapsulation
  Per-Thread Metadata
  Scheduler Operations Table
  Configuring the Scheduler

Merging the New Scheduler
  Short-circuit Evaluation
  PowerPC overheads

Open Issues
  Improper Encapsulation
  Interrupt Service Routine (ISR) Latency
  Documentation
  Configuration
  New Technologies



Pre-existing RTEMS Scheduler

A structure of chains called the ready chains is used by the scheduler to manage the set of ready tasks and assign the highest priority task to the CPU. The ready queue is implemented as a 256-element array of FIFO lists. Every element (list) of the array represents a priority level. The zero level designates the highest priority. Tasks with equivalent priorities are placed on the same FIFO list associated with their priority level. The highest priority task is the first one in the first non-empty priority-level FIFO list from the beginning of the ready queue.

Finding the highest priority task would be time-consuming with a linear search for the first non-empty FIFO list. Therefore, a 256-bit bitfield is maintained in which every non-empty FIFO list has a set bit corresponding to its priority level. The search is a two-level lookup that scans only 16 bits at a time. The first scan finds the "major" portion of the priority (e.g. the upper nibble), and the second scans the 16 bits associated with that range of priorities to determine the "minor" portion (e.g. the lower nibble). This bitfield scanning is handled by low-level CPU-dependent instructions, or by highly optimized portable C when special instructions are not available. Thus the first non-empty list can be found efficiently and in a fixed length of execution time. Note that the executing task remains in the ready queue; a task is removed from the ready queue only when it terminates or becomes blocked.
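The two-level lookup can be sketched in portable C. This is an illustrative model, not the RTEMS code; here the most significant bit of each bitmap corresponds to the lowest-numbered (highest) priority, so the first set bit found is the highest-priority non-empty level.

```c
#include <stdint.h>

/* Return the index (0 = MSB) of the first set bit of a 16-bit value,
 * or -1 if the value is zero. RTEMS uses CPU bit-scan instructions or
 * optimized table lookups for this step. */
static int first_set(uint16_t v)
{
  for (int i = 0; i < 16; ++i)
    if (v & (uint16_t)(0x8000u >> i))
      return i;
  return -1; /* empty */
}

/* Two-level lookup: the 16-bit major bitmap records which of the 16
 * minor bitmaps is non-empty; each minor bitmap records which of its
 * 16 priority levels has a ready task. Returns the highest-priority
 * non-empty level (0..255), or -1 if nothing is ready. */
static int highest_ready(uint16_t major, const uint16_t minor[16])
{
  int hi = first_set(major);
  return (hi < 0) ? -1 : hi * 16 + first_set(minor[hi]);
}
```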

Manipulation of the ready chains (the array of FIFOs) happens throughout the thread management code. Additionally, functions in the thread manager access the ready chains through the thread control block (TCB). The goal of my project was to encapsulate the ready queue structure within a scheduler subsystem, thus hiding the implementation from the rest of the RTEMS kernel, especially the thread management code. I was mostly successful.

New Modular Scheduler

To provide maximum flexibility while maintaining low overhead, the new scheduler is dynamically configurable yet avoids making runtime checks.

Scheduler Control Structure

The scheduler is implemented as a stand-alone data structure with associated functions, and also as some additional per-thread and system-wide bookkeeping. The main structure for the scheduler is the Scheduler_Control, defined as:
typedef struct Scheduler_Control_struct {
  /**
   * Pointer to the data structure used to manage the ready set of
   * tasks. Varies based on the ready queue required by the scheduler.
   */
  union {
    /** This is the set of lists (array) for priority scheduling. */
    Chain_Control *Priority;
  } ready_queues;

  /** Jump table for scheduler-specific functions */
  Scheduler_Operations operations;
} Scheduler_Control;
Scheduler_Operations is explained later.  Notice that the Scheduler_Control embeds the scheduler's ready queue via a pointer in a union.  This is for extensibility, so that the same basic Scheduler_Control structure can be used for another scheduling algorithm that may use a different ready queue structure by simply adding another pointer to the ready_queues union.  This allows any number of ready queues to be added to the code base, but at runtime only the ready queue that is actually used by the particular scheduler will be accessible.  The pointer is initialized during scheduler initialization.
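For illustration, here is a hypothetical sketch of how the union might be extended for a scheduler that keeps its ready tasks in a different structure, say a deadline-ordered tree for EDF. The RBTree_Control name and the EDF member are assumptions for this sketch (the real types are stubbed out here so the fragment stands alone):

```c
/* Stand-ins for the real RTEMS ready queue structures. */
typedef struct { int dummy; } Chain_Control;
typedef struct { int dummy; } RBTree_Control;

typedef struct {
  union {
    Chain_Control  *Priority;  /* priority scheduler's array of FIFOs   */
    RBTree_Control *EDF;       /* hypothetical deadline-ordered tree    */
  } ready_queues;
} Scheduler_Control_sketch;
```

Because the union holds only pointers, adding a member costs nothing at runtime: the structure stays one pointer wide, and only the configured scheduler's member is ever initialized and dereferenced.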

Ready Queue Encapsulation

Each scheduler implementation encapsulates its ready queue behind an opaque Scheduler Interface, consisting of the following abstract methods:

/**
 * Update the Thread_Heir with the highest priority ready thread.
 */
_Scheduler_Schedule( void )

/**
 * Currently executing thread voluntarily gives up the CPU.
 * Might cause a thread dispatch.
 */
_Scheduler_Yield( void )

/**
 * tcb is removed from the set of ready threads.
 * Might cause a thread dispatch.
 */
_Scheduler_Block( Thread_Control *tcb )

/**
 * tcb is added to the set of ready threads.
 * Might cause a thread dispatch.
 */
_Scheduler_Unblock( Thread_Control *tcb )


The scheduler does not care why a thread is blocked or unblocked. 
Thread state management is not the responsibility of the scheduler.


These four functions form an interface between the scheduler and the rest of the kernel for operations that touch the ready queue. I defined this interface by examining how the existing ready queues are used and isolating the common patterns. I then hid the ready queue behind this scheduler interface; in the original RTEMS scheduling mechanism, the ready queue was the Chain_Control _Thread_Ready_chain. Other variables that might be subsumed by the scheduler include _Thread_Executing and _Thread_Heir, but these are transitioning to the per-CPU code base, so I left them alone for the time being.
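A minimal sketch of the resulting encapsulation, with the types and interface functions stubbed out for illustration (the function names below other than the interface calls are hypothetical): thread management code invokes the interface and never sees how the ready set is stored.

```c
#include <stdbool.h>

/* Stand-in for the RTEMS thread control block. */
typedef struct { bool ready; } Thread_Control;

/* Stubbed interface functions; the real implementations manipulate
 * the scheduler's private ready queue structure. */
static void _Scheduler_Block( Thread_Control *tcb )   { tcb->ready = false; }
static void _Scheduler_Unblock( Thread_Control *tcb ) { tcb->ready = true;  }

/* Thread management code no longer touches the ready queue directly;
 * it reports the event and lets the scheduler react. */
void thread_blocking_event( Thread_Control *tcb )   { _Scheduler_Block( tcb ); }
void thread_unblocking_event( Thread_Control *tcb ) { _Scheduler_Unblock( tcb ); }
```

This is why the scheduler can ignore *why* a thread blocked: the caller handles thread state, and the scheduler only maintains the ready set.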

Per-Thread Metadata

A lot of the complexity for the new scheduler is involved in flexible and efficient per-thread management of scheduler-related data. The current solution embeds a pointer in the tcb to a per-thread scheduling structure that is dynamically allocated at task creation time. In actuality, a union of such pointers is placed in the tcb. More on that later.

To support the pre-existing priority-based scheduler, the following per-thread scheduling data is instantiated for each tcb:

/**
 * Per-thread data related to the _Scheduler_PRIORITY scheduling policy.
 */
typedef struct {
  /** This field points to the Ready FIFO for this thread's priority. */
  Chain_Control *ready_chain;

  /** This field contains precalculated priority map indices. */
  Priority_bit_map_Information Priority_map;
} Scheduler_priority_Per_thread;
This structure is used mainly to assist in finding the ready queue for the enclosing tcb.  Management of the per-thread metadata is done through three more abstract Scheduler methods:
/**
 * This routine allocates @a the_thread->scheduler.
 */
void * _Scheduler_Thread_scheduler_allocate(
  Scheduler_Control *the_scheduler,
  Thread_Control    *the_thread
);

/**
 * This routine frees @a the_thread->scheduler.
 */
void _Scheduler_Thread_scheduler_free(
  Scheduler_Control *the_scheduler,
  Thread_Control    *the_thread
);

/**
 * This routine updates @a the_thread->scheduler based on
 * @a the_scheduler and the thread state.
 */
void _Scheduler_Thread_scheduler_update(
  Scheduler_Control *the_scheduler,
  Thread_Control    *the_thread
);
The first two functions are invoked only during task creation and deletion.  The last function, _Scheduler_Thread_scheduler_update, is currently invoked when a thread changes its priority, but is intended for any situation in which the per-thread scheduler metadata needs to be updated.  This may eventually require additional flags to indicate which event is causing the update, but for now the update occurs only on a priority change.

Scheduler Operations Table

A function jump table is embedded in the Scheduler_Control structure to provide for flexibility via dynamic binding at runtime for scheduling operations.  The Scheduler_Operations structure looks like:
/**
 * Function jump table that holds pointers to the functions that
 * implement specific schedulers.
 */
typedef struct {
  /** Implements the scheduling decision logic. */
  void ( *schedule ) ( Scheduler_Control * );

  /** Voluntarily yields the processor. */
  void ( *yield ) ( Scheduler_Control * );

  /** Removes the given thread from scheduling decisions. */
  void ( *block ) ( Scheduler_Control *, Thread_Control * );

  /** Adds the given thread to scheduling decisions. */
  void ( *unblock ) ( Scheduler_Control *, Thread_Control * );

  /** Allocates the scheduler field of the given thread. */
  void * ( *scheduler_allocate ) ( Scheduler_Control *, Thread_Control * );

  /** Frees the scheduler field of the given thread. */
  void ( *scheduler_free ) ( Scheduler_Control *, Thread_Control * );

  /**
   * Updates the scheduler field of the given thread;
   * primarily used when changing the thread's priority.
   */
  void ( *scheduler_update ) ( Scheduler_Control *, Thread_Control * );
} Scheduler_Operations;
These function pointers correspond with the functions described above.  Indeed, the function pointers are wrapped by inline functions.  A scheduler implementation defines functions that match the function pointer signatures, and then installs a Scheduler_Operations table.  The real magic of the modular scheduler is in how a scheduler is configured by an application.
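A scheduler implementation might install its jump table along these lines. This is a hedged sketch with simplified stand-in types and hypothetical function names (my_*), not the actual RTEMS code; the real table also carries the allocate/free/update entries:

```c
/* Simplified stand-ins for the RTEMS types discussed above. */
typedef struct Scheduler_Control Scheduler_Control;
typedef struct { int id; } Thread_Control;

typedef struct {
  void (*schedule)(Scheduler_Control *);
  void (*yield)(Scheduler_Control *);
  void (*block)(Scheduler_Control *, Thread_Control *);
  void (*unblock)(Scheduler_Control *, Thread_Control *);
} Scheduler_Operations;

struct Scheduler_Control {
  Scheduler_Operations operations;
};

/* A trivial scheduler's entry points (bodies elided for the sketch). */
static void my_schedule(Scheduler_Control *s)                    { (void)s; }
static void my_yield(Scheduler_Control *s)                       { (void)s; }
static void my_block(Scheduler_Control *s, Thread_Control *t)    { (void)s; (void)t; }
static void my_unblock(Scheduler_Control *s, Thread_Control *t)  { (void)s; (void)t; }

/* Initialization routine: install the jump table so later scheduling
 * events dispatch to this implementation. */
void my_scheduler_initialize(Scheduler_Control *the_scheduler)
{
  the_scheduler->operations.schedule = my_schedule;
  the_scheduler->operations.yield    = my_yield;
  the_scheduler->operations.block    = my_block;
  the_scheduler->operations.unblock  = my_unblock;
}
```

After initialization, the inline wrappers simply call through the table, e.g. the_scheduler->operations.block( the_scheduler, tcb ), so the kernel never names a specific scheduler implementation.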

Configuring the Scheduler

There are two main ways to provide configuration in RTEMS:
  • static configuration using configure (autoconf) options
  • dynamic configuration using confdefs.h options
These two approaches have different trade-offs.

configure is most useful for coarse-grained configuration options that can provide conditional compilation. Its main advantage is that it can drastically reduce the size of the compiled binary image by removing large subsystems that a user does not want included, for example support for POSIX. Further, static configuration can reduce runtime overheads because configuration options are not checked during the runtime. However, configure options are less maintainable, lead to pollution of source code with RTEMS-specific #ifdef blocks, and cannot be changed without rebuilding the application to produce a new binary image.

confdefs.h is the main way in which users configure certain aspects of RTEMS at runtime, by changing values in configuration tables. The confdefs.h approach has the main advantage of flexibility, because a user can change the configuration tables and reset the application (board) to change a system's configuration without having to upload a new binary image. The main disadvantages of confdefs.h are that it cannot provide conditional compilation and that there is additional runtime overhead to check the configuration options, although this overhead can mostly be confined to the application's initialization phase.

I provide a confdefs.h based configuration for the new scheduler. The scheduler configuration allows an application to select the scheduling policy to use. The supported configurations currently are:
  • CONFIGURE_SCHEDULER_USER
  • CONFIGURE_SCHEDULER_PRIORITY
If no configuration is specified by the application, then CONFIGURE_SCHEDULER_PRIORITY is the default. An application can define its own scheduling policy by defining CONFIGURE_SCHEDULER_USER and CONFIGURE_SCHEDULER_ENTRY_USER to point to an initialization routine. However, the user scheduler has not actually been tested. I also have two additional scheduling algorithms waiting in the wings.

The confdefs.h logic is, I hope, easy to follow. The first thing is to ensure the scheduler structures are loaded, and then check for the USER scheduler:
#include <rtems/score/scheduler.h>
#if defined(CONFIGURE_SCHEDULER_USER) && \
!defined(CONFIGURE_SCHEDULER_ENTRY_USER)
#error "CONFIGURE_ERROR: CONFIGURE_SCHEDULER_USER \
without CONFIGURE_SCHEDULER_ENTRY_USER"
#endif
There is a user configuration option to include all of the schedulers available in the RTEMS sources in the built application, although like the USER scheduler this is an untested feature:
#if defined(CONFIGURE_SCHEDULER_ALL)
#define CONFIGURE_SCHEDULER_PRIORITY
#endif
Next, the user configuration options are processed. If no scheduler is specified, then the priority-based scheduler is configured:
/* If no scheduler is specified, the priority scheduler is the default. */
#if !defined(CONFIGURE_SCHEDULER_USER) && \
    !defined(CONFIGURE_SCHEDULER_PRIORITY)
  #define CONFIGURE_SCHEDULER_PRIORITY
  #define CONFIGURE_SCHEDULER_POLICY _Scheduler_PRIORITY
#endif
As another special case, a USER-provided scheduler must be checked for in order to define the scheduler policy.
/*
* If a user scheduler is specified and no policy is set,
* the user scheduler is the default policy.
*/
#if defined(CONFIGURE_SCHEDULER_USER) && \
!defined(CONFIGURE_SCHEDULER_POLICY)
#define CONFIGURE_SCHEDULER_POLICY _Scheduler_USER
#endif
Next, each of the configured schedulers is processed; for now this is just the priority scheduler. Here is where the real magic happens: the priority-based scheduling structures and code are included only if the priority scheduler is configured for the user's application. At the end of this block, the memory requirements for including the priority scheduler are computed:
/*
 * Check for the priority scheduler next, as it is the default policy
 * if there is no CONFIGURE_SCHEDULER_POLICY set and no USER scheduler
 * provided.
 */
#if defined(CONFIGURE_SCHEDULER_PRIORITY)
  #include <rtems/score/schedulerpriority.h>
  #define CONFIGURE_SCHEDULER_ENTRY_PRIORITY \
    { _Scheduler_priority_Initialize }
  #if !defined(CONFIGURE_SCHEDULER_POLICY)
    #define CONFIGURE_SCHEDULER_POLICY _Scheduler_PRIORITY
  #endif

  /**
   * Define the memory used by the priority scheduler.
   */
  #define CONFIGURE_MEMORY_SCHEDULER_PRIORITY ( \
    _Configure_From_workspace( \
      ((CONFIGURE_MAXIMUM_PRIORITY+1) * sizeof(Chain_Control)) ) )

  #define CONFIGURE_MEMORY_PER_TASK_SCHEDULER_PRIORITY ( \
    _Configure_From_workspace( sizeof(Scheduler_priority_Per_thread) ) )
#endif
After adding all of the configured schedulers, the scheduler table is configured.  During scheduler initialization, the scheduler table is indexed to select the active scheduler. This configuration mechanism is based on the one used for filesystem configuration in the RTEMS confdefs.h:
/*
 * Set up the scheduler table. The scheduling code indexes this table
 * to invoke the correct scheduling implementation. The scheduler to
 * use is determined by the Configuration.scheduler_policy field,
 * which is set by CONFIGURE_SCHEDULER_POLICY. If a particular
 * scheduler is not enabled, an empty entry fills its slot in the
 * scheduler table.
 */

/**
* An empty scheduler entry
*/
#define CONFIGURE_SCHEDULER_NULL { NULL }

#ifdef CONFIGURE_INIT
/* the table of available schedulers. */
const Scheduler_Table_t _Scheduler_Table[] = {
#if defined(CONFIGURE_SCHEDULER_USER) && \
defined(CONFIGURE_SCHEDULER_ENTRY_USER)
CONFIGURE_SCHEDULER_ENTRY_USER,
#else
CONFIGURE_SCHEDULER_NULL,
#endif
#if defined(CONFIGURE_SCHEDULER_PRIORITY) && \
defined(CONFIGURE_SCHEDULER_ENTRY_PRIORITY)
CONFIGURE_SCHEDULER_ENTRY_PRIORITY,
#else
CONFIGURE_SCHEDULER_NULL,
#endif
};
#endif
Finally, the memory overheads for each of the configured schedulers should be added together to compute the scheduling subsystem's memory requirements. Note that this does not currently support estimating any USER scheduler, only the RTEMS-provided schedulers:
/**
* Define the memory overhead for the scheduler
*/
#define CONFIGURE_MEMORY_FOR_SCHEDULER ( \
CONFIGURE_MEMORY_SCHEDULER_PRIORITY \
)

#define CONFIGURE_MEMORY_PER_TASK_FOR_SCHEDULER ( \
CONFIGURE_MEMORY_PER_TASK_SCHEDULER_PRIORITY \
)
The value of CONFIGURE_SCHEDULER_POLICY is propagated into the newly added scheduler_policy field of the RTEMS configuration table:

/** This field specifies the scheduling policy to use. */
uint32_t scheduler_policy;

At runtime, during scheduler initialization, the scheduler table is indexed by the value of scheduler_policy in order to call the configured scheduler's initialization routine, which is responsible for initializing its operations table and ready queue structures:
void _Scheduler_Handler_initialization( void )
{
  Scheduler_Control *the_scheduler = &_Scheduler;

  (*(_Scheduler_Table[ Configuration.scheduler_policy ].scheduler_init))(
    the_scheduler
  );
}

Merging the New Scheduler

RTEMS uses Bugzilla to vet code changes. I submitted the scheduler code as PR 1647.  Aside from a few stylistic issues, there were three issues that I needed to resolve.  These blocker issues concerned introduced latency overheads and the fact that the scheduler's memory overhead was not being calculated. The memory overhead issue was corrected in confdefs.h and is documented above. The other two issues are described in the following.

Short-circuit Evaluation

There were some fairly substantial latency increases when running the RTEMS tmtests on the SPARC sis simulator. The extra overhead came primarily from performing bit-scan operations in cases where a short-circuit is possible: when the heir (RTEMS terminology for the highest priority ready thread) changes explicitly (e.g. by unblocking a thread of higher priority than the currently executing one), or when no other thread is ready to take over the CPU from a yielding thread. These overheads were eliminated by adding checks for these special cases.
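The flavor of such a short-circuit can be sketched as follows. This is a simplification with hypothetical names, not the actual RTEMS fix; lower numbers mean higher priority, as in RTEMS:

```c
#include <stddef.h>

typedef struct { int priority; } Thread_Control;  /* lower = higher priority */

/* Stand-in for RTEMS' _Thread_Heir. */
Thread_Control *heir;

/* On unblock, a full bit-scan of the ready queue is unnecessary:
 * either the unblocked thread outranks the heir and becomes the heir
 * directly, or the heir is unchanged. Either way, no scan. */
void on_unblock(Thread_Control *t)
{
  if (t->priority < heir->priority)
    heir = t;
}
```

The yield path gets a similar check: if the yielding thread is the only one ready at its priority and no one else is ready, the heir is unchanged and the bit-scan is skipped.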

Latency on task_create and task_delete is still increased, because of the additional allocation/freeing of per-thread scheduling metadata.

PowerPC overheads

Testing the scheduler on the PowerPC architecture (psim simulator) exposed a few more issues. Since these issues were not raised during the SPARC tests, I suspected that architectural differences were to blame. The PowerPC test exposed the problem with calculating the memory overhead for the scheduler: the PowerPC test suite contains floating-point tasks, which changes the memory requirements and puts more pressure on the kernel's Workspace (heap). The other problems with the PowerPC test were latency increases that were not observed on the SPARC.

Because of the SPARC's register-window feature, function calls are extremely cheap up to a certain threshold (about 6 calls deep), after which calls become more expensive due to pressure on the register windows. In contrast, the PowerPC has a more traditional "flat" register file, so caller- and callee-saved registers may need to be saved on any given call. This gives the PowerPC a higher function call overhead at small call depths, where the SPARC's overhead is almost zero.

Originally, I had designed the modular scheduler with a modular ready queue as well, which potentially allows schedulers to share some ready queue operations and structures. However, the extra flexibility was implemented with another function call table. With the use of function pointers, the ready queue (and scheduler) functions were no longer inlined. This was adding at least two function calls on every scheduler invocation.

After making the ready queue implicitly part of the scheduler, I was able to make all of the ready queue functions inline within the scheduler implementation. Inlining the ready queue saved much of the overhead that remained, although there is still some overhead from adding one level of function calls to scheduling events.

Open Issues

There are still a few lingering items to be done.

Improper Encapsulation

There remain two places in the thread management code that are not properly encapsulated: threadsettransient.c and threadchangepriority.c both make direct modifications of the priority scheduler ready queue. Down the line it may become necessary to extend the scheduler interface to the ready queue for one or both of these purposes.

Interrupt Service Routine (ISR) Latency

Most of the scheduler functions assume that they are called with protection enabled (i.e. with interrupts disabled). However, they do not take any arguments that would help them manipulate protection, for example to do an ISR_Flash(level). So although the functionality is identical, the refactoring introduces some change to the interrupt latency of the scheduling code.

For uni-processor scheduling we can simply pass the nested ISR 'level' as an argument to the scheduler; however, for SMP this will not work. Joel said that his "design thought was to have processor interrupts off locally while you owned an 'interrupt disable lock'." So it may work just fine if the SMP interrupt disable lock is known.

Documentation

This blog post is actually a step in this direction. After I add some more scheduling algorithms, I will need to explain when a user should select specific algorithms. Additionally, other implementers will need to be able to navigate the new scheduling infrastructure, especially when debugging or adding schedulers. I am also thinking about writing up a technical report to cover some of this material, or to demonstrate some useful features of the new scheduler. Unfortunately, this is off the critical path.

Configuration

Joel suggests that I consider a way to "[m]ove scheduler policy out of the Configuration Table and make it so the user is setting a variable that we can directly use in the Score. This ties in to simplifying the initialization of the scheduler below where you set some fields."

There are some other issues with configuration. For one thing, the USER configuration option should be either tested or eliminated. Also, use cases for multiple scheduler algorithms concurrently being available (although NOT concurrently used!) should be identified.

New Technologies

Finally, using the new modular scheduler to explore new ways of scheduling is the motivation for this project. I have already lined up two new scheduling algorithms, the Earliest Deadline First algorithm and a simple First-In First-Out scheduler. Determining how well this new infrastructure will support SMP scheduling remains, in my opinion, the primary direction coming out of this project.
Posted in hacking, RTEMS | No comments

Thursday, 29 July 2010

Retroactive Introduction to my GSOC project

Posted on 14:51 by Unknown
I've been busy this summer, between my GSOC project, my academic progress, and life in general. I thought I should catalog some of the work that I've done for the GSOC 2010. I have previously talked about my work to port RTEMS to SPARC V9. Well, I had a GSOC project accepted for the RTEMS project which I have been working on this summer. The remainder of this blog post is excerpted from my project proposal and describes what I have been doing lately. The full project proposal can be read online at: http://code.google.com/p/rtems-sched/wiki/ModularSuperCoreSchedulerProject.

My project is to refactor the RTEMS task scheduler to make it more modular. The current RTEMS scheduler is tightly coupled with the thread management code. This tight coupling prevents easy implementation and testing of alternate scheduling mechanisms. Refactoring the scheduler into a new SuperCore Scheduler Manager enables RTEMS developers (namely, me) to explore new scheduling mechanisms.

After finishing my project, the RTEMS scheduler internal data structures will be encapsulated in a module that provides a protected interface such that none of the thread management code directly touches scheduling data.

Perhaps the most important outcome of my project will be enabling exploration of scheduling for symmetric multiprocessing (SMP) and multicore platforms. An SMP system is a closely-coupled system with multiple CPU chips interconnected with main memory; a multicore system is even more tightly coupled, with multiple CPUs interconnected on a single chip. Although RTEMS currently supports multiprocessing systems, the support is designed for coarse-grained systems. Each processing element (core) runs a separate RTEMS application and executive, and communication is routed through a portion of shared memory. Scheduling decisions are made locally at each core, and tasks are not able to migrate between cores.

Although my project will not provide SMP support, the goal of refactoring the scheduler is to facilitate implementing an SMP scheduler, so care will be taken to make sure the interface does not enforce any policies that might deter an SMP scheduler. By supporting global scheduling decisions and task migration, RTEMS will be able to balance a processing load across multiple CPUs. Further, unifying the executive across all the cores in the system allows RTEMS to make smarter decisions and to reduce the effort of end users in deploying SMP and multicore configurations.
Posted in RTEMS | No comments

Monday, 26 July 2010

Slightly OT: Automatic Pet Feeders

Posted on 13:38 by Unknown
I have two cats, and I think that buying them automatic watering and feeding equipment was a really great idea. It is nice not to worry about providing fresh water and checking the food dish every day. But the real advantage is in being able to go away for a weekend without needing to find someone to check in on the cats.

I got the Petmate Le Bistro Portion Control Automatic Pet Feeder and the Petmate Deluxe Fresh Flow watering dish sometime last year, and after well over half a year I can gladly say they were worth it.



The feeder is a little finicky with the portion sizes; I read that it works better for larger pet food, for example dog food. My cats do OK at portioning themselves, so I mainly use the feeder as a gravity feeder, although I will occasionally use the portion control if I think the cats are looking a bit chubby! The 5-pound capacity of the feeder lasts a good amount of time, and it runs on D batteries, which I haven't had to replace yet. Hopefully it will be easy to tell when it is time to replace them.



The watering dish needs to be cleaned regularly, much like any water dish, but the cats love how the water moves and I think that helps them to drink more regularly. I just recently figured out how to fully clean the water pump, which is important to do because it eventually clogs and the water flow is greatly restricted. I haven't bothered to buy more water filter refills, since I clean and replace the water about once a week. I did notice when I had a filter that the water and dish seemed to stay cleaner, but the filter got really scummy quickly.
Posted in | No comments

Friday, 9 July 2010

Understanding Energy and Power

Posted on 10:24 by Unknown
Lately I've been looking at power as an evaluation metric for my research. Power consumption has always been an important design concern in embedded and resource constrained devices. Recently, power has also become important in desktop and server computing environments at the chip-level and at the rack-level respectively.

Energy and power are related, and the two are so often used synonymously that confusing them is quite easy.

Power, typically measured in (kilo)watts, is a rate of production or consumption of energy. The watt is an expression of joules per second, where the joule is a unit of energy. So power is the rate of change of energy over a period of time. Energy is typically measured in (kilo)watt-hours, which is what shows up on your "power" bill. So you pay for the total amount of energy consumed during the billing period, which is power integrated over that interval.

A common analogy to help understand the relationship between power and energy is that power is to a flow of water as energy is to a pool of water. The flow can be very slow, dripping even, but can fill a pool given enough time; or the flow can be very fast and fill a pool even quicker. The amount of water in the pool is analogous to energy, and the rate of the flow is analogous to power.

I find it is also helpful to consider the physical equations.

To understand power and energy, think back to middle or high school and recall Ohm's law, I = V / R (equivalently V = I * R), where I is current, V is voltage, and R is resistance. Current is the movement of electric particles, measured in amperes. Voltage is the force required to drive a current between two points, measured in volts. Resistance is opposition to current, measured in ohms. Note that current and voltage are only non-zero if there is a mismatch in the electrical potential between two connected points, and that bad things happen as resistance approaches zero.

Ohm's law is about as far as most people recall (if even that far!), and we haven't yet reached energy or power. We can get some interesting equations by substituting Ohm's law into Joule's first law after some massaging: P = I^2 * R = V * I = V^2 / R, where P is the power dissipated by a resistor. Power dissipation is a more accurate term for the notion of power consumption, although the two can be used interchangeably to mean the conversion of electrical energy into some other form, for example heat or sound. The power dissipated by the resistive elements of a circuit is equivalent to the instantaneous power applied to the circuit to generate current through it. For a constant current applied from time 0 to t to a circuit with resistance R, the electric energy dissipated by the resistive elements of the circuit during that interval is E = P * t.
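As a concrete worked example of these formulas, consider a 60-ohm resistor connected across a 12 V supply for one hour:

```latex
I = \frac{V}{R} = \frac{12\,\mathrm{V}}{60\,\Omega} = 0.2\,\mathrm{A},
\qquad
P = V \cdot I = 12\,\mathrm{V} \times 0.2\,\mathrm{A} = 2.4\,\mathrm{W},
\qquad
E = P \cdot t = 2.4\,\mathrm{W} \times 3600\,\mathrm{s} = 8640\,\mathrm{J} = 0.0024\,\mathrm{kWh}
```

At typical residential electricity rates, that hour of dissipation costs a small fraction of a cent, which is why energy is billed in kilowatt-hours rather than joules.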

The moral of the story is that power is the rate at which energy is transferred, and energy is power accumulated over time.
Posted in | No comments

Friday, 18 June 2010

Building and Booting RTEMS sparc64

Posted on 10:36 by Unknown
I've put together a package at http://code.google.com/p/rtemssparc64/downloads/list called buildsparc64.tgz that contains a directory structure with scripts to help build the niagara and usiii BSPs.

There is a README with some simple instructions. After extracting the archive, you'll need to create a link in the build-sparc64 directory to the RTEMS sources on your machine (that contain the sparc64 CPU model and BSPs). Assuming you have the sparc64-rtems4.11 tools installed, you will then be able to build the niagara or usiii BSPs. Then there are scripts in the boot subdirectory to help build a bootable ISO image for either BSP. The scripts are set up to only build and package some of the RTEMS sample applications. Note that only Niagara has been run on open source simulators so far. I have run both BSPs on the Simics simulator, and you can contact me privately for more information.

Once you successfully build the image.iso file, you can follow the instructions at my previous post about the M5 simulator to run the BSP. Simply make a link from the built image.iso file to the /dist/m5/system/disks/image.iso file, and make the changes I mention to the M5 configuration files.
Posted in RTEMS | No comments

Thursday, 17 June 2010

SPARC V9 (sparc64) RTEMS

Posted on 11:33 by Unknown
I'm pleased to announce that the work I've done, together with my colleague Eugen, to get RTEMS working on the sparc64 processor architecture, has been accepted for inclusion to the RTEMS project. We've been working on porting RTEMS to sparc64 for about 9 months, and have had a functional port for about 4 months.

We started with a basic template from the 32-bit SPARC V7 port of RTEMS, and modified or replaced its code with the appropriate support for the 64-bit SPARC V9. There were quite a few changes, since we started with SPARC V7 and skipped ahead to V9; SPARC V8 lies somewhere between the two. From a user-space compatibility standpoint, the processor is not substantially different; however, the privileged processor state is very different in SPARC V9, and even more so in the sun4v variant in which we were most interested. I've shared a Google Doc that contains a lot of our notes from the porting process. We started by focusing on the SPARC Niagara CPU, but have since added support for the UltraSPARC 3 CPU model and should be able to run most UltraSPARC family processors.

A description of the Board Specific Package (BSP) for the Niagara CPU can be found on the RTEMS Wiki at the http://www.rtems.com/wiki/index.php/Niagara page. I'm still working to get instructions on how to run the BSP in the M5 open source simulator, which I hinted at in my previous post about M5.

It was a fair amount of work to clean up the code to submit it to RTEMS, but it will be nice to have the code in the repository. I guess we'll see if anyone else is actually interested in using it! One nice aspect, and I think one of the reasons the RTEMS people accepted the work, is that there are no other 64-bit architectures currently supported by RTEMS. So adding the SPARC V9 processor family provides a good way to test RTEMS' 64-bit capabilities. Eugen and I did not have too much difficulty in this regard, at least from a functionality point-of-view.

Hopefully now that the port is "upstream", I can again focus on my research efforts actually using RTEMS on the sparc64, and not just getting things to work!
Posted in RTEMS | No comments

Monday, 10 May 2010

Rant of the Week (ROTW): Facebook

Posted on 09:57 by Unknown
Facebook has morphed yet again. But this time, instead of just changing the interface, it actually changes your profile.  My profile went from being something relatively custom-fit to me, to being a set of links to pages that are tangentially related to me.

Of course this makes sense from a business perspective: it makes it a lot easier to assemble sets of interests to crunch numbers for advertising purposes.  However, I feel the social value of my profile goes down, because anything that is not expressible as a "page" is no longer allowed outside of the Bio / Quotations field.

My knee-jerk reaction: delete everything that links my profile to other pages.  If Facebook wants to make advertising $$ off of me, they'll have to infer things in a less obvious and heavy-handed way.  Honestly, I'd rather have a stripped-down profile than one with links to pages that I don't really feel strongly about.  It would not be so bad if it were an opt-in process, for example if I could mix linked content with unlinked content.  However, being forced to use one mode of expression, i.e. linked pages, is unacceptable to me.



I won't go so far as to quit Facebook; it still has some value, for example sharing thoughts with friends through status updates, sharing photos and photo albums, and connecting and staying in touch with old friends. But for these things I do not need to be part of an inclusive web of linked pages in order to share information about myself.

It's bad enough when I don't know what Facebook is doing with my information. (I assume selling as much of it as possible without getting too much bad publicity.) However, dictating what I can and cannot say about myself, especially for mundane non-inflammatory things such as what movies I like, is draconian Stalinism that has no place in what should be an open environment for the dissemination of opinions and ideas.
Posted in life, rant

Saturday, 8 May 2010

A week in M5

Posted on 19:05 by Unknown
I spent the last week or so playing around with a relatively new simulator called M5.  It is a very nice simulator, especially for architectural research.  My interest is in using it for both operating system and architecture research.  This type of research is difficult to do with most simulators, because functional simulators rarely provide timing accurate enough to evaluate overheads, while timing simulators rarely provide sufficient functionality to run an operating system.  What is needed is a full system simulator that provides fidelity in both functionality and timing.

In the past our research group has used SimpleScalar for architectural research.  However, that project is no longer maintained, and the simulator is slowly becoming obsolete with respect to real-world systems.  It also does not provide full system simulation, although a set of patches by Jack Whitham merges RTEMS and SimpleScalar to provide full system simulation for RTEMS.  I tried to use Jack's patches, but I could not quite get everything working.  When I e-mailed him to ask about it, he suggested that I look at M5 as a more realistic simulator instead.  So I did.

M5 provides two types of simulators: Full System (FS) and Syscall Emulation (SE).


FS simulation is what I need, although it might be possible to layer experiments on an SE simulator using pthreads or something similar.  I have had trouble getting the UltraSPARC T1 SE (SPARC_SE) simulator to work with my RTEMS Sparc64 port, although it might work with a little more massaging. Even so, I'm not sure SE is sufficient for the type of research I want to do, particularly with RTEMS.  The main problem I see is that interrupts won't be delivered to the target (RTEMS), because the interrupt sources are not actually simulated.  This means there will be no timers and no pre-emptive multitasking, i.e. a very limited feature set.

According to the M5 wiki, the FS simulator currently supports only two ISAs: DEC Alpha with up to 4 cores (64 cores on an Alpha-derived CPU that has no real-world analog) and UltraSPARC T1 with 1 core.  These are referred to as ALPHA_FS and SPARC_FS respectively.

M5 also provides a variety of CPU models, with varying capabilities.
  • AtomicSimpleCPU provides basic functionality only.
  • TimingSimpleCPU provides CPI=1 with stalls on loads.
  • O3CPU provides a time-accurate out-of-order pipeline model.
  • InOrderCPU provides a time-accurate in-order pipeline model.
  • Checker is used to verify functional correctness.
As far as I can tell, SPARC_FS so far officially works only with AtomicSimpleCPU. It might work with TimingSimpleCPU, but that would only match what I can already do with Simics Niagara and the Ruby module from Wisconsin GEMS.  Unfortunately, I haven't been able to get TimingSimpleCPU to work with SPARC_FS, which means SPARC_FS is not very useful for doing architecture research.  However, as an open source simulator, it is a compelling way to ensure that the Sparc64 RTEMS port can be tested.

ALPHA_FS appears to be capable of running on any of the CPU models except InOrderCPU; according to the mailing list, full system mode is not implemented for any of the InOrder platforms.  However, RTEMS does not support the DEC Alpha ISA at all, and I'm not aware of any interest in such a port.

Booting OpenSolaris on M5 SPARC_FS
Follow the instructions on the M5 wiki for getting started.  The examples given are for the Alpha targets.  You will also want to compile and install the m5term application.

To build the sparc full system targets, use:
$ scons build/SPARC_FS/m5.debug 
$ scons build/SPARC_FS/m5.opt
$ scons build/SPARC_FS/m5.prof
$ scons build/SPARC_FS/m5.fast
Download and extract the OpenSPARC T1 architecture and performance modeling tools. Copy *.bin and nvram1 from OpenSPARCT1_Arch.1.5/S10image/ to the /dist/m5/system/binaries/ directory, and copy disk.s10hw2 from the S10image/ directory to the /dist/m5/system/disks/ directory.  Rename reset.bin, q.bin, and openboot.bin to reset_new.bin, q_new.bin, and openboot_new.bin, which are the names expected by the M5 SPARC_FS scripts.
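The staging steps above can be collected into a small shell function. This is just a sketch: the function name and argument layout are mine, and /dist/m5/system is the default location M5 searches, so adjust if your installation differs.

```shell
# stage_firmware: copy and rename the OpenSPARC firmware for M5 SPARC_FS.
#   $1 = extracted S10image directory (e.g. OpenSPARCT1_Arch.1.5/S10image)
#   $2 = M5 system directory (normally /dist/m5/system)
stage_firmware() {
  local src=$1 dest=$2
  mkdir -p "$dest/binaries" "$dest/disks"
  # Firmware binaries and nvram go in binaries/, the disk image in disks/.
  cp "$src"/*.bin "$src"/nvram1 "$dest/binaries/"
  cp "$src"/disk.s10hw2 "$dest/disks/"
  # The M5 SPARC_FS scripts expect the *_new.bin names.
  mv "$dest/binaries/reset.bin"    "$dest/binaries/reset_new.bin"
  mv "$dest/binaries/q.bin"        "$dest/binaries/q_new.bin"
  mv "$dest/binaries/openboot.bin" "$dest/binaries/openboot_new.bin"
}

# typical use:
# stage_firmware OpenSPARCT1_Arch.1.5/S10image /dist/m5/system
```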

In a terminal window, start the simulator with:
$ build/SPARC_FS/m5.debug -d /tmp/output configs/example/fs.py
In another terminal, connect to the simulator with:
$ m5term localhost 3457
You should eventually see this in your m5term window:
==== m5 slave terminal: Terminal 0 ====
Sun Fire T2000, No Keyboard
Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.20.0, 256 MB memory available, Serial #1122867.
[mo23723 obp4.20.0 #0]
Ethernet address 0:80:3:de:ad:3, Host ID: 80112233.



ok
The "ok" prompt is the OpenBoot prompt.  Just type boot and press enter, and the OpenSparc Solaris image will start to boot.

Booting RTEMS on M5 SPARC_FS
The sparc64 sun4v BSP will boot on the SPARC_FS full system simulator of M5.  The first step is to bundle RTEMS executables and SILO onto a bootable ISO9660 filesystem; we have some scripts to help create the bootable disk, and we use the same approach for booting RTEMS on Simics Niagara.  Next, edit the ${m5}/configs/common/FSConfig.py file and replace disk('disk.s10hw2') with disk('image.iso'), where image.iso is the ISO9660 filesystem image built for booting RTEMS.  M5 will look in /dist/m5/system/disks for the image.iso file, so link (or copy) image.iso to /dist/m5/system/disks/image.iso.
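The one-line FSConfig.py change can be scripted with sed so it is repeatable; this is a sketch, and the function name is my own. A .bak backup of the original file is kept.

```shell
# patch_fsconfig: swap the Solaris disk image for the RTEMS boot ISO in
# M5's full-system config.  $1 = path to FSConfig.py.
patch_fsconfig() {
  # Parentheses and dots are literal in a basic regex, so no extra
  # escaping is needed for this pattern.
  sed -i.bak "s/disk('disk.s10hw2')/disk('image.iso')/" "$1"
}

# typical use, from the top of the m5 source tree:
# patch_fsconfig configs/common/FSConfig.py
```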

I have so far verified that Hello World (hello) and the ticker sample will boot and execute to completion.  However, there is no facility built into M5 for setting OpenBoot parameters; in particular, I cannot set auto-boot = true, so I would need to create an nvram1 with this setting.  I could probably write a script to drive the interactive portions of the m5term dialog, maybe even with a configuration checkpoint at the SILO boot prompt.

There are a few other tasks that would be nice to resolve.  First, the simulator does not exit when the RTEMS application finishes; this could be handled by a script that detects the end of execution, which is what we do now with Simics, or by adding some instructions that cause the application to terminate the simulation.  Second, it might be nice to get RTEMS to run on SPARC_SE, although the functionality that platform would provide is questionable.  Third, getting things to work on the InOrderCPU model would be ideal, although that is not currently supported in M5 for SPARC_FS.  Fourth, I was not able to hook up the remote debugger (gdb) without strange errors, probably related to the flavor of gdb and the ABI assumed by M5's UART; I ran into this while trying to get SPARC_SE to work.
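As a sketch of the first item, a watcher could poll the console log for the "*** END OF TEST" marker that the RTEMS sample applications print and then kill the simulator. Nothing like this exists in M5 itself; the function name, polling interval, and log path in the usage line are assumptions.

```shell
# wait_for_end_of_test: poll a console log until the RTEMS end-of-test
# marker appears, then terminate the simulator process.
#   $1 = console log file, $2 = simulator PID
wait_for_end_of_test() {
  local logfile=$1 pid=$2
  until grep -q '\*\*\* END OF TEST' "$logfile"; do
    # Bail out if the simulator died before printing the marker.
    kill -0 "$pid" 2>/dev/null || return 1
    sleep 1
  done
  kill "$pid"
}

# typical use (log path is illustrative):
# build/SPARC_FS/m5.opt configs/example/fs.py > /tmp/output/console.log &
# wait_for_end_of_test /tmp/output/console.log $!
```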

Overall, I found it easy to compile and build the M5 simulator.  I also got a chance to poke into the source, but I did not go very far, so I will withhold judgement there.  I am excited that the basic AtomicSimpleCPU SPARC_FS is able to boot and run RTEMS, which will help get the Sparc64 RTEMS port accepted, included, and maintained in the RTEMS code base.
Posted in hacking, RTEMS, work