HowTo

Contents of this Section:

1. Initial design decisions

a. Using a designated base address.

b. Sizing and growing the region

2. How to create and attach to shared regions.

3. Creating or reusing an object in the shared region

4. Using ACE_STL_Allocator

5. How to use virtual methods with shared memory

Initial design decisions:

After working with the ACE library’s Memory Pool concept for some time, I’m convinced that a one-size-fits-all approach is a bad idea. The different approaches trade off performance against compatibility with standard STL implementations, and some of them impose unnecessary limits.

There are a series of specific strategies which can be employed to help make STL containers in shared memory work with virtually any STL implementation. The design choices for your implementation include the following options:

1. If the OS allows it, or if your processes are all forked from a common parent, then you may be able to designate the address that the shared region is mapped at, which optimizes performance and eases the design impact on classes that will reside in shared memory. If you can’t ensure that the region will be mapped at the same address in all processes, then you have to use relative pointers in the shared memory region. This means ensuring that your STL container honors the allocator::pointer typedef and the rebind interface of the allocator passed to it, and uses those types internally. Many STL implementations do not.

2. If the OS supports the SEGV handling strategy defined below, and also allows a degree of position mapping (picking the virtual address that the region is mapped at) to be done, then region growth can be performed safely, and the growth of a region can be detected via the standard SEGV triggers of the CPU. This avoids constantly bounds checking every pointer traversal and maintains performance, while still allowing the region to grow. It also works regardless of whether normal C++ pointers or the special relative pointer are used in the region.

3. If the OS doesn’t support the SEGV handling strategy, and/or doesn’t support position mapping, then your options are further reduced: you have to use a segment-based pointer object in shared memory, and perform region growth by adding and mapping additional individual segments as needed. This approach is the most versatile, but comes with the cost of a slightly more involved pointer translation (segment base address lookup + offset), and requires constantly checking whether the segment_id corresponds to an as-yet unmapped segment (i.e. detecting growth). This approach does not, however, rely on positional mapping of regions, nor on a working SEGV handler. It is also likely to be the least performant.
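The segment-based pointer described in option 3 can be sketched as follows. This is only an illustration of the idea (segment base lookup plus offset, with a check for unmapped segments); the names SegmentTable and SegPtr are hypothetical and are not ACE classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-process table: segment id -> local base address.
// A NULL entry means this process has not mapped that segment yet.
class SegmentTable {
public:
    // Records where a segment is mapped in this process.
    void map_segment(std::size_t id, char *base) {
        if (id >= bases_.size()) bases_.resize(id + 1, static_cast<char *>(NULL));
        bases_[id] = base;
    }
    // Returns the local base address, or NULL if the segment is unmapped here.
    char *base_of(std::size_t id) const {
        return id < bases_.size() ? bases_[id] : NULL;
    }
private:
    std::vector<char *> bases_;
};

// Position-independent pointer: stores (segment id, offset), never an address.
template <class T>
struct SegPtr {
    std::size_t segment_id;
    std::size_t offset;

    // Translates to a local address. A NULL result signals an unmapped
    // segment, which is where real code would map the newly added segment.
    T *resolve(const SegmentTable &table) const {
        char *base = table.base_of(segment_id);
        return base ? reinterpret_cast<T *>(base + offset) : NULL;
    }
};
```

Every dereference pays for the table lookup and the NULL check, which is the per-access cost that the SEGV-based strategy avoids.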

Using a designated base address:

The biggest design point is whether or not you can use a strategy to ensure that all processes using a region map it at a common virtual address.

This isn’t as hard as it might sound. Here are some techniques that I’ve played with in the past:

1. If you can arrange to have all the processes which will use the shared region get forked from a common parent which does the initial mapping, then they will all be using the region from a common address.

2. The ACE shared region classes support the option to designate the desired address that the region should be mapped in at. Whenever you look at the man pages for mmap(), you’ll note that the API allows you to designate the address of the map. But all man pages include the same warning about that feature being non-standard. It actually seems to be available on a surprising number of platforms. What does vary is what a particular platform might consider a valid address.
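Both techniques can be sketched with plain POSIX calls rather than the ACE classes. The example below assumes a platform that provides mmap() with MAP_ANONYMOUS and fork(); the Node struct and the function name are made up for illustration. The parent maps a shared region (optionally passing an address hint, which the OS may or may not honor), stores an ordinary C++ pointer inside it, and forks. Because the child inherits the mapping at the same address, the raw pointer is valid in both processes.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical record living in the shared region. It holds a plain C++
// pointer to another location inside the same region.
struct Node {
    int  value;
    int *value_ptr;  // only meaningful if every process maps the region at the same address
};

// Maps a shared region, forks, lets the child write through the raw pointer,
// and returns the value the parent reads back afterwards (-1 on failure).
int shared_raw_pointer_demo(void *hint = NULL) {
    const std::size_t len = 4096;
    void *base = mmap(hint, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;

    Node *node = static_cast<Node *>(base);
    node->value = 1;
    node->value_ptr = &node->value;  // absolute address stored in shared memory

    pid_t pid = fork();
    if (pid < 0) { munmap(base, len); return -1; }
    if (pid == 0) {
        *node->value_ptr = 42;       // child follows the raw pointer
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) { munmap(base, len); return -1; }

    int result = node->value;        // parent sees the child's write
    munmap(base, len);
    return result;
}
```

With unrelated processes the same effect depends on each of them successfully mapping the region at an agreed address, which is where the platform differences mentioned above come in.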

If you can adopt an approach where all processes using the region can map it at the same virtual address, then it eliminates the need for you to use a specialized pointer type in the STL containers, which in turn makes it easier to use STL implementations which ‘assume’ that pointers in the container implementation are OK. It also generally improves the performance of the shared memory STL container, making it equivalent to the normal heap-based variety.
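For comparison, here is a rough sketch of the kind of specialized pointer that a common base address lets you avoid. It is a self-relative pointer: it stores the distance from its own address to the target, so the stored bytes stay valid wherever the region is mapped. RelPtr, Record and the demo function are hypothetical names, not the relative pointer class shipped with ACE, and the copy into a second buffer merely simulates the same region appearing at a different address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdint.h>

// Self-relative pointer: holds an offset from its own location to the target.
template <class T>
class RelPtr {
public:
    RelPtr() : offset_(0) {}
    // Copying to a new location must recompute the offset for that location.
    RelPtr(const RelPtr &other) : offset_(0) { set(other.get()); }
    RelPtr &operator=(const RelPtr &other) { set(other.get()); return *this; }

    void set(T *target) {
        offset_ = target
            ? reinterpret_cast<intptr_t>(target) - reinterpret_cast<intptr_t>(this)
            : 0;
    }
    T *get() const {
        if (offset_ == 0) return NULL;  // offset 0 is reserved for "null"
        return reinterpret_cast<T *>(reinterpret_cast<intptr_t>(this) + offset_);
    }
    T &operator*() const { return *get(); }
    T *operator->() const { return get(); }
private:
    intptr_t offset_;
};

// Hypothetical record whose link points at its own value field.
struct Record {
    RelPtr<int> link;
    int         value;
};

// Copies the raw bytes of a record to a second buffer and checks that the
// link in the copy resolves inside the copy, not inside the original.
bool rel_ptr_survives_relocation() {
    void *a = std::malloc(sizeof(Record));
    void *b = std::malloc(sizeof(Record));
    if (!a || !b) { std::free(a); std::free(b); return false; }

    Record *ra = new (a) Record();
    ra->value = 5;
    ra->link.set(&ra->value);

    std::memcpy(b, a, sizeof(Record));  // same bytes, different address
    Record *rb = static_cast<Record *>(b);
    rb->value = 9;

    bool ok = (*ra->link == 5) && (*rb->link == 9) && (rb->link.get() == &rb->value);
    std::free(a);
    std::free(b);
    return ok;
}
```

The extra add on every dereference is small, but the container has to use this type everywhere it would otherwise use T*, which is exactly what many STL implementations fail to do.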

Sizing and growing the region:

The memory pool classes allow you to specify an initial size. The memory acquired in this way is turned over to a malloc-like algorithm that manages the free space within the region. When this malloc-like algorithm runs out of memory, it will ask the shared region to increase its size. This is similar to how malloc() works with the OS to increase the amount of heap memory available to an ordinary process.

The ACE memory pool implementations use a technique to permit the growth of regions and allow individual processes to detect when a region has increased size. It is intended to work even when the shared region holds standard C++ pointers which might point beyond the portion of a region that the process currently has mapped.

It employs a SEGV handler to catch SEGVs that occur when an address is dereferenced which lies beyond the bounds of the shared region. This occurs when the region’s size has been increased by another process, but the current process hasn’t noticed yet. The SEGV handler checks whether there is a region which has actually grown in size and which, if completely mapped in, would include the address that faulted. If so, it remaps that region into memory so that the fault address becomes valid. The SEGV handler then returns and permits the original pointer dereference to complete successfully. This approach has the advantage of letting the CPU perform all bounds checking, and keeps the relative pointer implementation very fast.

It also has a cost: whenever a region grows and a new portion is added, the new portion must be appended to the existing virtual address space of the region, so that the base address of the whole region does not change. This is because the address being accessed when the SEGV occurs can’t be effectively changed at that point.

This characteristic in turn assumes that the positional mapping capability works. Or more precisely, that the second and subsequent mapping can be performed positionally using the address that the OS picked the first time around. So the whole solution depends on this ability to some degree.
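The fault-and-retry mechanism can be demonstrated in miniature. The sketch below is not the ACE implementation: it assumes Linux-style behavior (an access to a protected page raises SIGSEGV with the fault address in si_addr), and it simulates "another process grew the region" by reserving an address range up front and making more of it accessible with mprotect() inside the handler, where real code would map in the grown part of the backing store. Calling mprotect() from a signal handler is common practice but is not on the POSIX list of async-signal-safe functions. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace {
char       *g_base     = NULL;  // fixed base address of the region
std::size_t g_reserved = 0;     // size the region may grow to
std::size_t g_mapped   = 0;     // bytes this process can currently access
std::size_t g_page     = 0;
volatile sig_atomic_t g_faults = 0;

void on_segv(int, siginfo_t *info, void *) {
    char *addr = static_cast<char *>(info->si_addr);
    bool in_grown_part = addr >= g_base + g_mapped && addr < g_base + g_reserved;
    if (!in_grown_part) {
        signal(SIGSEGV, SIG_DFL);  // a genuine crash: let the retry kill us
        return;
    }
    // Extend the accessible part in place so the base address never changes.
    std::size_t needed = (static_cast<std::size_t>(addr - g_base) / g_page + 1) * g_page;
    if (mprotect(g_base, needed, PROT_READ | PROT_WRITE) != 0) _exit(1);
    g_mapped = needed;
    g_faults = g_faults + 1;
    // Returning restarts the faulting instruction, which now succeeds.
}
}  // namespace

// Returns 1 if a write beyond the mapped part succeeded after exactly one fault.
int segv_growth_demo() {
    g_page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    g_reserved = 4 * g_page;
    void *p = mmap(NULL, g_reserved, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    g_base = static_cast<char *>(p);
    if (mprotect(g_base, g_page, PROT_READ | PROT_WRITE) != 0) return -1;
    g_mapped = g_page;  // only the first page is "mapped" so far

    struct sigaction sa, old;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    volatile char *beyond = g_base + 2 * g_page + 5;
    *beyond = 7;  // faults once, the handler grows the mapping, the write is retried

    int result = (*beyond == 7 && g_faults == 1) ? 1 : 0;
    sigaction(SIGSEGV, &old, NULL);
    munmap(g_base, g_reserved);
    return result;
}
```

Note that the code doing the write contains no bounds check at all; the only cost is paid once, at the moment of the fault.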

How to create and attach to shared regions

Creating or reusing an object in the shared region

Using ACE_STL_Allocator

How-To: Virtual Methods on Shared Objects

A problem that inevitably comes up is wanting to store an object with virtual methods in shared memory. Such an object holds a pointer to a vtable, which is expected to be in the address space of the running process. If different processes don't have the same vtable address for the class in question, they end up treating each other’s objects as invalid, usually to the tune of a SEGV when a virtual function is called.

Aside from proposing changes to the way in which compilers implement the virtual function table, I know of a few ways to overcome this issue:

1. Use an alternative implementation of a ‘virtual function’ which works properly by finding the address of the function in a way that works for multiple processes. One problem with this approach is that it is intrusive to the design of the class in question, and also that in removing ‘real’ virtual functions from the class, it has other undesirable effects like disabling dynamic_cast<> for that type hierarchy.

2. Use serialization. i.e. Don’t store the object in the shared region, rather store a serialized form of it. When fetching the object out of the shared region, use a factory pattern (i.e. virtual constructor) to reinstantiate the correct object type from its serialized form. You can call virtual functions on the object. This approach has one advantage: it can be implemented in a non-intrusive way for existing classes. However, it can be slow compared to the other techniques, and assumes that the STL containers in shared memory are being used in a ‘database-like’ manner.

3. Use the handle/body approach as an alternative to serialization. The handle has all the virtual methods on it, but it holds no state, only a pointer to the body. The body has a type indicator, indicating the type of its handle, but no virtual methods, and is essentially just data. The body is stored in shared memory, so virtual methods never exist in shared memory. A factory can still be used to reconstitute objects out of shared memory, but without the overhead of deserialization.

The best solution (as always) may depend somewhat on your circumstances. I describe #3 below, and provide an example of it. The technique can be changed to employ Boost serialization instead of a handle/body split. In either solution, a Factory can be used to support containers which are heterogeneous (a common case when you are concerned about using virtual methods on objects in shared memory).

The handle/body approach is not as elegant as I would like, but it is effective. A standard class is split into a handle and a body.

1. The handle is only allowed to have one data member – a pointer to the body. It may have any number of virtual methods, since it is never stored in shared memory.

2. The body class has all the data members, and typically no methods. i.e. All methods are encapsulated in the handle class, which is the public interface.

The trick is to then keep the object that is in the STL container (in shared memory) to just be the 'body' object, a class object that has no virtual methods. The handle class never exists in shared memory, only the local heap memory of a process, and thus it can make full use of virtual methods. The handle class can have a constructor which is passed the pointer to the body. Thus when a body object is fetched from shared memory you can quickly construct a handle from that body and proceed to use the class.

i.e.

    MyMapType::iterator it = map->find( key );
    if ( it != map->end() ) {
        Handle h( it->second );   // wrap the body found in shared memory
        h.doSomething();
    }
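A minimal, compilable version of the split might look like the following. The Account names are invented for illustration, and an ordinary local object stands in for a body that would really live in the shared region.

```cpp
#include <cassert>
#include <cstddef>

// Body: plain data, no virtual methods, safe to place in shared memory.
struct Account_impl {
    int  type_id;        // identifies which handle class owns this body
    long balance_cents;
};

// Handle: exactly one data member (the body pointer), any number of
// virtual methods. Handles live only in process-local memory.
class Account {
public:
    explicit Account(Account_impl *body) : body_(body) {}
    virtual ~Account() {}
    virtual long fee_cents() const { return 100; }
    void deposit(long cents) { body_->balance_cents += cents; }
    long balance_cents() const { return body_->balance_cents; }
protected:
    Account_impl *body_;
};

class PremiumAccount : public Account {
public:
    explicit PremiumAccount(Account_impl *body) : Account(body) {}
    virtual long fee_cents() const { return 0; }
};
```

Because a handle is just a pointer plus a vtable pointer, constructing one for each fetched body is cheap, and several handles (even in different processes) can refer to the same body.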

STL Containers of Heterogeneous types:

A powerful programming pattern is to use containers which point to objects in a class hierarchy. Algorithms can be written which work with the data in the container using run-time polymorphism; i.e. by working with all objects per the base class and calling virtual methods.

Consider the following heap-based convention:

    class Customer { ... };
    class ResidentialCustomer : public Customer { ... };
    class BusinessCustomer : public Customer { ... };
    class WholesaleBusinessCustomer : public Customer { ... };

    map<id_type, Customer*, ACE_STL_Allocator> myMap;

myMap can hold pointers to any Customer type, such that when you query an item in the map, you hold only a Customer* and can then make virtual method calls on the customer, fully exploiting runtime polymorphism. i.e. The algorithm using the map would not have to know the specific subclass that had just been found in the map for a given key (id_type). The algorithm could determine the precise subclass at run-time using dynamic_cast<> operations if it really wanted to know what it had gotten.

To do the equivalent with shared memory, you can continue with the handle/body method forming a pair of hierarchies. One that is all state, and one that is all virtual methods:

    // Each of these has one data member: _impl, a pointer to its
    // corresponding _impl struct.
    class Customer { ... };
    class ResidentialCustomer : public Customer { ... };
    class BusinessCustomer : public Customer { ... };
    class WholesaleBusinessCustomer : public Customer { ... };

    // Each of these has just data members, but no methods. NOTHING virtual.
    // Not even the destructor.
    struct Customer_impl { … data … };
    struct ResidentialCustomer_impl : public Customer_impl { ... };
    struct BusinessCustomer_impl : public Customer_impl { ... };
    struct WholesaleBusinessCustomer_impl : public Customer_impl { ... };

    map<id_type, Based_Ptr<Customer_impl>, ACE_STL_Allocator> myMap;

Now I have a map in shared memory which does not hold any objects which contain virtual methods. I can insert any type into the map by passing it the pointer to the Customer_impl subtype that has been allocated in shared memory. (Note: I can’t, unfortunately, make the map a map of <id_type, Customer_impl, ACE_STL_Allocator>. I have to make it a map of pointers and do the creation of the appropriate Customer_impl subtype on my own. This is because the collection of objects is heterogeneous.)

The next problem is to figure out how, when I fetch an object from the map, I can identify its type. I can’t use dynamic_cast<> to determine the type of the object found in the map. That method relies on the vtable, and isn’t safe for objects in shared memory. If I add a member variable ‘type_id’ to the Customer_impl base implementation class, I can have every handle class set it appropriately. Now whenever I find() a Customer_impl in the map I can tell what its actual type was at the time of insertion. This allows me to downcast to the appropriate impl subtype (using reinterpret_cast<>), and from that, I can finally use a factory method to help construct the appropriate handle type.

Note: My use of reinterpret_cast<> here implies the assumption that for a “class A : public B”, casting the address of A* to B* does not change the physical address. In cases where multiple inheritance is used in the class hierarchy, this may not be true.

The methods corresponding to insert() and fetch() would then be as follows:

Note: This code is illustrative only. It was written directly in this doc and I haven’t tried compiling it, so it may not compile as written.

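    template <class MapT, class V>
    std::pair<typename MapT::iterator, bool>
    myInsert( MapT &myMap, typename MapT::key_type id, V &customer ) {
        // V::impl_type is a subclass of the type that MapT::mapped_type points to
        typedef typename V::impl_type impl_type;
        ACE_STL_Allocator<impl_type> alloc( myMap.get_allocator() );
        impl_type *inShm = alloc.allocate( 1 );       // raw storage in the shared region
        new ( inShm ) impl_type( *customer._impl );   // copy-construct the body in place
        return myMap.insert( typename MapT::value_type( id, inShm ) );
    }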

    template <class MapT>
    Customer* myFetch( MapT &myMap, typename MapT::key_type key ) {
        typename MapT::iterator i = myMap.find( key );
        if ( i == myMap.end() ) return NULL;
        switch( i->second->type_id )
        {
        case Customer::TypeId:
            return new Customer( i->second );
        case ResidentialCustomer::TypeId:
            return new ResidentialCustomer(
                reinterpret_cast<ResidentialCustomer_impl*>( i->second ) );
        case BusinessCustomer::TypeId:
            return new BusinessCustomer(
                reinterpret_cast<BusinessCustomer_impl*>( i->second ) );
        default:
            cerr << "bug in this code: " << __LINE__ << endl;
            abort();
        }
    }

Note that the code above makes the following assumptions:

1. For the handle class B which is a subclass of A, the associated body classes likewise are part of a hierarchy. B_impl is a subclass of A_impl.
2. The base implementation class (A_impl) has a type member ‘type_id’. The other body classes override the value of this during their construction.
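To check that these conventions hang together, here is a small version of the insert/fetch pair that does compile. It makes several simplifying assumptions that differ from the text above: an ordinary heap-based std::map with raw pointers stands in for the shared memory map and Based_Ptr, new/delete stand in for the shared memory allocator, only two classes are shown, and static_cast<> is used for the downcast in place of reinterpret_cast<>. The CustomerTypeId enum, kind() and myDestroy() are additions made for the sake of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <utility>

enum CustomerTypeId { kCustomer = 0, kResidential = 1 };

// Body hierarchy: data only, nothing virtual, each constructor stamps type_id.
struct Customer_impl {
    int type_id;
    explicit Customer_impl(int tid = kCustomer) : type_id(tid) {}
};
struct ResidentialCustomer_impl : public Customer_impl {
    int household_size;
    ResidentialCustomer_impl() : Customer_impl(kResidential), household_size(0) {}
};

// Handle hierarchy: one data member (_impl), virtual methods allowed.
class Customer {
public:
    enum { TypeId = kCustomer };
    typedef Customer_impl impl_type;
    explicit Customer(Customer_impl *impl) : _impl(impl) {}
    virtual ~Customer() {}
    virtual const char *kind() const { return "customer"; }
    Customer_impl *_impl;
};
class ResidentialCustomer : public Customer {
public:
    enum { TypeId = kResidential };
    typedef ResidentialCustomer_impl impl_type;
    explicit ResidentialCustomer(ResidentialCustomer_impl *impl) : Customer(impl) {}
    virtual const char *kind() const { return "residential"; }
};

typedef std::map<int, Customer_impl *> CustomerMap;

// Generic: copies the handle's body and stores the copy under the key.
template <class V>
bool myInsert(CustomerMap &m, int key, const V &customer) {
    typedef typename V::impl_type impl_type;
    impl_type *copy = new impl_type(*static_cast<impl_type *>(customer._impl));
    bool inserted = m.insert(std::make_pair(key, static_cast<Customer_impl *>(copy))).second;
    if (!inserted) delete copy;  // key already present
    return inserted;
}

// Hierarchy-aware: picks the handle class from the body's type_id.
Customer *myFetch(CustomerMap &m, int key) {
    CustomerMap::iterator i = m.find(key);
    if (i == m.end()) return NULL;
    switch (i->second->type_id) {
    case Customer::TypeId:
        return new Customer(i->second);
    case ResidentialCustomer::TypeId:
        return new ResidentialCustomer(static_cast<ResidentialCustomer_impl *>(i->second));
    default:
        return NULL;  // unknown type_id
    }
}

// Bodies have no virtual destructor, so destruction also goes through type_id.
void myDestroy(Customer_impl *body) {
    if (body->type_id == kResidential) delete static_cast<ResidentialCustomer_impl *>(body);
    else delete body;
}
```

The caller owns the handle returned by myFetch() and deletes it when done; the body stays in the map until it is explicitly destroyed.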

Notice that the myInsert() method has the qualities of a generic algorithm, i.e. it should work for any class hierarchy meeting the above conventions.

The myFetch() method, on the other hand, does not. It reflects our class hierarchy. It works, but what if I don’t want a chunk of code that is aware of all subtypes of Customer? What if I would like the map and the above algorithm to work for any subclass of Customer that happens to get added to the application, just as a non-shared memory version would?

Enter the Factory pattern…

To allow the creation of a segment of code which fetches unknown types from the shared map, and works with those objects polymorphically, we make use of a Factory (aka virtual constructor) to help construct the appropriate handle class from the body pointer fetched from the map. The Factory does the job of constructing the right type based on the implementation fetched.

The Loki library has a good Generic Factory implementation in it, which we can use for this. Basically, the various subtypes register themselves with the factory, giving the factory the ability to create instances of their type from a body pointer found in the map. To help with this, we give our Handle classes a constructor which takes one argument: a pointer to a body. It will then verify that the body is of its type, and adopt it as its own. Thereafter, the code can just make use of the handle.

The resulting code for the fetch then looks as follows:

    template <class MapT, class BaseType>
    BaseType* myFetch( MapT &myMap, typename MapT::key_type key ) {
        typename MapT::iterator iter = myMap.find( key );
        if ( iter == myMap.end() ) return NULL;
        typename MapT::mapped_type found = iter->second;
        // FactoryType is assumed to be a Loki factory that produces BaseType*
        return SingletonHolder<FactoryType>::Instance().
            CreateObject( found->type_id, found );
    }

The CreateObject() call transforms the pointer to the body back into an appropriate Handle type, pointing to that fetched body.

This gives us what we needed - the means to write code that works with anything in the class hierarchy, as known to the factory.

The above example doesn’t show how the factory is informed of the various subclasses in the hierarchy. That is not too hard to do. A convention I’ve used is to have each class contain a static member which informs the Factory singleton about that class and its particular creation function. This means that when the class hierarchy is changed, and new classes are added, no existing code which is using run-time polymorphism needs to change.
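The self-registration convention can be sketched without Loki. The small hand-rolled CustomerFactory below is a stand-in for Loki's factory, not its API, and the kind() method, the type id values and the registered_ member name are hypothetical. Each handle class defines a static member whose initializer registers a creation function under that class's type id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <utility>

struct Customer_impl { int type_id; };

class Customer;

// Stand-in factory: maps a body's type_id to a function that builds the handle.
class CustomerFactory {
public:
    typedef Customer *(*Creator)(Customer_impl *);
    static CustomerFactory &instance() {
        static CustomerFactory factory;  // constructed on first use
        return factory;
    }
    bool registerCreator(int type_id, Creator creator) {
        return creators_.insert(std::make_pair(type_id, creator)).second;
    }
    // Returns a new handle for the body, or NULL if its type_id is unknown.
    Customer *create(Customer_impl *body) const {
        std::map<int, Creator>::const_iterator it = creators_.find(body->type_id);
        return it == creators_.end() ? NULL : it->second(body);
    }
private:
    std::map<int, Creator> creators_;
};

class Customer {
public:
    enum { TypeId = 0 };
    explicit Customer(Customer_impl *body) : _impl(body) {}
    virtual ~Customer() {}
    virtual const char *kind() const { return "customer"; }
    static Customer *create(Customer_impl *body) { return new Customer(body); }
protected:
    Customer_impl *_impl;
private:
    static const bool registered_;
};
// Runs during static initialization, before main().
const bool Customer::registered_ =
    CustomerFactory::instance().registerCreator(Customer::TypeId, &Customer::create);

class BusinessCustomer : public Customer {
public:
    enum { TypeId = 2 };
    explicit BusinessCustomer(Customer_impl *body) : Customer(body) {}
    virtual const char *kind() const { return "business"; }
    static Customer *create(Customer_impl *body) { return new BusinessCustomer(body); }
private:
    static const bool registered_;
};
const bool BusinessCustomer::registered_ =
    CustomerFactory::instance().registerCreator(BusinessCustomer::TypeId,
                                                &BusinessCustomer::create);
```

Adding a new subclass then means adding one class and one registration line; the code that fetches bodies and calls the factory does not change.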

An example of using this approach is given in STLshmTest3.cc