CodeGuru : Thinking in C++

Upcasting

When you embed subobjects of a class inside a new class, whether you do it by creating member objects or through inheritance, each subobject is placed within the new object by the compiler. Of course, each subobject has its own this pointer, and as long as you’re dealing with member objects, everything is quite straightforward. But as soon as multiple inheritance is introduced, a funny thing occurs: An object can have more than one this pointer because the object represents more than one type during upcasting. The following example demonstrates this point:

//: C22:Mithis.cpp
// MI and the "this" pointer
#include <fstream>
using namespace std;
ofstream out("mithis.out");

class Base1 {
  char c[0x10];
public:
  void printthis1() {
    out << "Base1 this = " << this << endl;
  }
};

class Base2 {
  char c[0x10];
public:
  void printthis2() {
    out << "Base2 this = " << this << endl;
  }
};

class Member1 {
  char c[0x10];
public:
  void printthism1() {
    out << "Member1 this = " << this << endl;
  }
};

class Member2 {
  char c[0x10];
public:
  void printthism2() {
    out << "Member2 this = " << this << endl;
  }
};

class MI : public Base1, public Base2 {
  Member1 m1;
  Member2 m2;
public:
  void printthis() {
    out << "MI this = " << this << endl;
    printthis1();
    printthis2();
    m1.printthism1();
    m2.printthism2();
  }
};

int main() {
  MI mi;
  out << "sizeof(mi) = "
    << hex << sizeof(mi) << " hex" << endl;
  mi.printthis();
  // A second demonstration:
  Base1* b1 = &mi; // Upcast
  Base2* b2 = &mi; // Upcast
  out << "Base 1 pointer = " << b1 << endl;
  out << "Base 2 pointer = " << b2 << endl;

} ///:~

The arrays of bytes inside each class are created with hexadecimal sizes, so the output addresses (which are printed in hex) are easy to read. Each class has a function that prints its this pointer, and these classes are assembled with both multiple inheritance and composition into the class MI, which prints its own address and the addresses of all the other subobjects. This function is called in main( ). You can clearly see that you get two different this pointers for the same object. The address of the MI object is taken and upcast to the two different types. Here’s the output: [65]

sizeof(mi) = 40 hex
MI this = 0x223e
Base1 this = 0x223e
Base2 this = 0x224e
Member1 this = 0x225e
Member2 this = 0x226e
Base 1 pointer = 0x223e
Base 2 pointer = 0x224e

Although object layouts vary from compiler to compiler and are not specified in Standard C++, this one is fairly typical. The starting address of the object corresponds to the address of the first class in the base-class list. Then the second inherited class is placed, followed by the member objects in order of declaration.

When the upcasts to the Base1 and Base2 pointers occur, you can see that, even though they’re ostensibly pointing to the same object, they must actually hold different addresses (different this pointers), so the proper starting address can be passed to the member functions of each subobject. The only way things can work correctly is if this implicit upcasting takes place when you call a member function for a multiply inherited subobject.

Persistence

Normally this isn’t a problem, because you want to call member functions that are concerned with that subobject of the multiply inherited object. However, if your member function needs to know the true starting address of the object, multiple inheritance causes problems. Ironically, this happens in one of the situations where multiple inheritance seems to be useful: persistence.

The lifetime of a local object is the scope in which it is defined. The lifetime of a global object is the lifetime of the program. A persistent object lives between invocations of a program: You can normally think of it as existing on disk instead of in memory. One definition of an object-oriented database is “a collection of persistent objects.”

To implement persistence, you must move a persistent object from disk into memory in order to call functions for it, and later store it to disk before the program expires. Four issues arise when storing an object on disk:

The object must be converted from its representation in memory to a series of bytes on disk.
Because the values of any pointers in memory won’t have meaning the next time the program is invoked, these pointers must be converted to something meaningful.
What the pointers point to must also be stored and retrieved.
When restoring an object from disk, the virtual pointers in the object must be respected.

Because the object must be converted back and forth between a layout in memory and a serial representation on disk, the process is called serialization (to write an object to disk) and deserialization (to restore an object from disk). Although it would be very convenient, these processes require too much overhead to support directly in the language. Class libraries will often build in support for serialization and deserialization by adding special member functions and placing requirements on new classes. (Usually some sort of serialize( ) function must be written for each new class.) Also, persistence is generally not automatic; you must usually explicitly write and read the objects.

MI-based persistence

Consider sidestepping the pointer issues for now and creating a class that installs persistence into simple objects using multiple inheritance. By inheriting the persistence class along with your new class, you automatically create classes that can be read from and written to disk. Although this sounds great, the use of multiple inheritance introduces a pitfall, as seen in the following example.

//: C22:Persist1.cpp
// Simple persistence with MI
#include "../require.h"
#include <iostream>
#include <fstream>
using namespace std;

class Persistent {
  int objSize; // Size of stored object
public:
  Persistent(int sz) : objSize(sz) {}
  void write(ostream& out) const {
    out.write((const char*)this, objSize); // Raw bytes of the object
  }
  void read(istream& in) {
    in.read((char*)this, objSize);
  }
};

class Data {
  float f[3];
public:
  Data(float f0 = 0.0, float f1 = 0.0,
    float f2 = 0.0) {
    f[0] = f0;
    f[1] = f1;
    f[2] = f2;
  }
  void print(const char* msg = "") const {
    if(*msg) cout << msg << "   ";
    for(int i = 0; i < 3; i++)
      cout << "f[" << i << "] = "
           << f[i] << endl;
  }
};

class WData1 : public Persistent, public Data {
public:
  WData1(float f0 = 0.0, float f1 = 0.0,
    float f2 = 0.0) : Persistent(sizeof(WData1)),
    Data(f0, f1, f2) {} // Bases listed in declaration order
};

class WData2 : public Data, public Persistent {
public:
  WData2(float f0 = 0.0, float f1 = 0.0,
    float f2 = 0.0) : Data(f0, f1, f2),
    Persistent(sizeof(WData2)) {}
};

int main() {
  {
    ofstream f1("f1.dat"), f2("f2.dat");
    assure(f1, "f1.dat"); assure(f2, "f2.dat");
    WData1 d1(1.1, 2.2, 3.3);
    WData2 d2(4.4, 5.5, 6.6);
    d1.print("d1 before storage");
    d2.print("d2 before storage");
    d1.write(f1);
    d2.write(f2);
  } // Closes files
  ifstream f1("f1.dat"), f2("f2.dat");
  assure(f1, "f1.dat"); assure(f2, "f2.dat");
  WData1 d1;
  WData2 d2;
  d1.read(f1);
  d2.read(f2);
  d1.print("d1 after storage");
  d2.print("d2 after storage");

} ///:~

In this very simple version, the Persistent::read( ) and Persistent::write( ) functions take the this pointer and call iostream read( ) and write( ) functions. (Note that any type of iostream can be used.) A more sophisticated Persistent class would call a virtual write( ) function for each subobject.

With the language features covered so far in the book, the number of bytes in the object cannot be known by the Persistent class so it is inserted as a constructor argument. (In Chapter XX, run-time type identification shows how you can find the exact type of an object given only a base pointer; once you have the exact type you can find out the correct size with the sizeof operator.)

The Data class contains no pointers or VPTR, so there is no danger in simply writing it to disk and reading it back again. And it works fine in class WData1 when, in main( ), it’s written to file f1.dat and later read back again. However, when Persistent is second in the inheritance list of WData2, the this pointer for Persistent is offset past the Data subobject, toward the end of the object, so transferring sizeof(WData2) bytes from that address reads and writes past the end of the object. This not only produces garbage when reading the object from the file, it’s dangerous because it walks over any storage that occurs after the object.

This problem occurs in multiple inheritance any time a class must produce the this pointer for the actual object from a subobject’s this pointer. Of course, if you know your compiler always lays out objects in order of declaration in the inheritance list, you can ensure that you always put the critical class at the beginning of the list (assuming there’s only one critical class). However, such a class may exist in the inheritance hierarchy of another class and you may unwittingly put it in the wrong place during multiple inheritance. Fortunately, using run-time type identification (the subject of Chapter XX) will produce the proper pointer to the actual object, even if multiple inheritance is used.

Improved persistence

A more practical approach to persistence, and one you will see employed more often, is to create virtual functions in the base class for reading and writing and then require the creator of any new class that must be streamed to redefine these functions. The argument to the function is the stream object to write to or read from. [66] Then the creator of the class, who knows best how the new parts should be read or written, is responsible for making the correct function calls. This doesn’t have the “magical” quality of the previous example, and it requires more coding and knowledge on the part of the user, but it works and doesn’t break when pointers are present:

//: C22:Persist2.cpp
// Improved MI persistence
#include "../require.h"
#include <iostream>
#include <fstream>
#include <cstring>
using namespace std;

class Persistent {
public:
  virtual void write(ostream& out) const = 0;
  virtual void read(istream& in) = 0;
  virtual ~Persistent() {}
};

class Data {
protected:
  float f[3];
public:
  Data(float f0 = 0.0, float f1 = 0.0,
    float f2 = 0.0) {
    f[0] = f0;
    f[1] = f1;
    f[2] = f2;
  }
  void print(const char* msg = "") const {
    if(*msg) cout << msg << endl;
    for(int i = 0; i < 3; i++)
      cout << "f[" << i << "] = "
           << f[i] << endl;
  }
};

class WData1 : public Persistent, public Data {
public:
  WData1(float f0 = 0.0, float f1 = 0.0,
    float f2 = 0.0) : Data(f0, f1, f2) {}
  void write(ostream& out) const {
    out << f[0] << " " 
      << f[1] << " " << f[2] << " ";
  }
  void read(istream& in) {
    in >> f[0] >> f[1] >> f[2];
  }
};

class WData2 : public Data, public Persistent {
public:
  WData2(float f0 = 0.0, float f1 = 0.0,
    float f2 = 0.0) : Data(f0, f1, f2) {}
  void write(ostream& out) const {
    out << f[0] << " " 
      << f[1] << " " << f[2] << " ";
  }
  void read(istream& in) {
    in >> f[0] >> f[1] >> f[2];
  }
};

class Conglomerate : public Data,
  public Persistent {
  char* name; // Contains a pointer
  WData1 d1;
  WData2 d2;
public:
  Conglomerate(const char* nm = "",
    float f0 = 0.0, float f1 = 0.0,
    float f2 = 0.0, float f3 = 0.0,
    float f4 = 0.0, float f5 = 0.0,
    float f6 = 0.0, float f7 = 0.0,
    float f8= 0.0) : Data(f0, f1, f2),
    d1(f3, f4, f5), d2(f6, f7, f8) {
    name = new char[strlen(nm) + 1];
    strcpy(name, nm);
  }
  // Release the string storage (copying is not supported here):
  ~Conglomerate() { delete []name; }
  void write(ostream& out) const {
    int i = strlen(name) + 1;
    out << i << " "; // Store size of string
    out << name << endl;
    d1.write(out);
    d2.write(out);
    out << f[0] << " " << f[1] << " " << f[2];
  }
  // Must read in same order as write:
  void read(istream& in) {
    delete []name; // Remove old storage
    int i;
    in >> i >> ws; // Get int, strip whitespace
    name = new char[i];
    in.getline(name, i);
    d1.read(in);
    d2.read(in);
    in >> f[0] >> f[1] >> f[2];
  }
  void print() const {
    Data::print(name);
    d1.print();
    d2.print();
  }
};

int main() {
  {
    ofstream data("data.dat");
    assure(data, "data.dat");
    Conglomerate C("This is Conglomerate C",
      1.1, 2.2, 3.3, 4.4, 5.5,
      6.6, 7.7, 8.8, 9.9);
    cout << "C before storage" << endl;
    C.print();
    C.write(data);
  } // Closes file
  ifstream data("data.dat");
  assure(data, "data.dat");
  Conglomerate C;
  C.read(data);
  cout << "after storage: " << endl;
  C.print();

} ///:~

The pure virtual functions in Persistent must be redefined in the derived classes to perform the proper reading and writing. If you already knew that Data would be persistent, you could inherit directly from Persistent and redefine the functions there, thus eliminating the need for multiple inheritance. This example is based on the idea that you don’t own the code for Data, that it was created elsewhere and may be part of another class hierarchy so you don’t have control over its inheritance. However, for this scheme to work correctly you must have access to the underlying implementation so it can be stored; thus the use of protected.

The classes WData1 and WData2 use familiar iostream inserters and extractors to store and retrieve the protected data in Data to and from the iostream object. In write( ), you can see that spaces are added after each floating-point number is written; these are necessary to allow parsing of the data on input.

The class Conglomerate not only inherits from Data, it also has member objects of type WData1 and WData2, as well as a pointer to a character string. In addition, all the classes that inherit from Persistent also contain a VPTR, so this example shows the kind of problem you’ll actually encounter when using persistence.

When you create write( ) and read( ) function pairs, the read( ) must exactly mirror what happens during the write( ), so read( ) pulls the bits off the disk the same way they were placed there by write( ). Here, the first problem that’s tackled is the char*, which points to a string of any length. The size of the string is calculated and stored on disk as an int (followed by a space to enable parsing) to allow the read( ) function to allocate the correct amount of storage.

When you have subobjects that have read( ) and write( ) member functions, all you need to do is call those functions in the new read( ) and write( ) functions. This is followed by direct storage of the members in the base class.

People have gone to great lengths to automate persistence, for example, by creating modified preprocessors to support a “persistent” keyword to be applied when defining a class. One can imagine a more elegant approach than the one shown here for implementing persistence, but this approach has the advantage that it works under all implementations of C++, doesn’t require special language extensions, and is relatively bulletproof.

[65] For easy readability the code was generated for a small-model Intel processor.

[66] Sometimes there’s only a single function for streaming, and the argument contains information about whether you’re reading or writing.
