CodeGuru : Thinking in C++

Associative containers

The set, map, multiset and multimap are called associative containers because they associate keys with values. Well, at least maps and multimaps associate keys to values, but you can look at a set as a map that has no values, only keys (and they can in fact be implemented this way), and the same for the relationship between multiset and multimap. So, because of the structural similarity sets and multisets are lumped in with associative containers.

The most important basic operations with associative containers are putting things in, and in the case of a set, seeing if something is in the set. In the case of a map, you want to first see if a key is in the map, and if it exists you want the associated value for that key to be returned. Of course, there are many variations on this theme but that’s the fundamental concept. The following example shows these basics:

//: C20:AssociativeBasics.cpp
// Basic operations with sets and maps
#include "Noisy.h"
#include <iostream>
#include <set>
#include <map>
using namespace std;

int main() {
  Noisy na[] = { Noisy(), Noisy(), Noisy(), 
    Noisy(), Noisy(), Noisy(), Noisy() };
  // Add elements via constructor:
  set<Noisy> ns(na, na+ sizeof na/sizeof(Noisy));
  // Ordinary insertion:
  Noisy n;
  ns.insert(n);
  cout << endl;
  // Check for set membership:
  cout << "ns.count(n)= " << ns.count(n) << endl;
  if(ns.find(n) != ns.end())
    cout << "n(" << n << ") found in ns" << endl;
  // Print elements:
  copy(ns.begin(), ns.end(), 
    ostream_iterator<Noisy>(cout, " "));
  cout << endl;
  cout << "\n-----------\n";
  map<int, Noisy> nm;
  for(int i = 0; i < 10; i++)
    nm[i]; // Automatically makes pairs
  cout << "\n-----------\n";
  for(int j = 0; j < nm.size(); j++)
    cout << "nm[" << j <<"] = " << nm[j] << endl;
  cout << "\n-----------\n";
  nm[10] = n;
  cout << "\n-----------\n";
  nm.insert(make_pair(47, n));
  cout << "\n-----------\n";
  cout << "\n nm.count(10)= " 
    << nm.count(10) << endl;
  cout << "nm.count(11)= " 
    << nm.count(11) << endl;
  map<int, Noisy>::iterator it = nm.find(6);
  if(it != nm.end())
    cout << "value:" << (*it).second
      << " found in nm at location 6" << endl;
  for(it = nm.begin(); it != nm.end(); it++)
    cout << (*it).first << ":" 
      << (*it).second << ", ";
  cout << "\n-----------\n";

} ///:~

The set<Noisy> object ns is created using two iterators into an array of Noisy objects, but there is also a default constructor and a copy-constructor, and you can pass in an object that provides an alternate scheme for doing comparisons. Both sets and maps have an insert( ) member function to put things in, and there are a couple of different ways to check to see if an object is already in an associative container: count( ), when given a key, will tell you how many times that key occurs (this can only be zero or one in a set or map, but it can be more than one with a multiset or multimap). The find( ) member function will produce an iterator indicating the first occurrence (with set and map, the only occurrence) of the key that you give it, or the past-the-end iterator if it can’t find the key. The count( ) and find( ) member functions exist for all the associative containers, which makes sense. The associative containers also have member functions lower_bound( ), upper_bound( ) and equal_range( ), which actually only make sense for multiset and multimap, as you shall see (but don’t try to figure out how they would be useful for set and map, since they are designed for dealing with a range of duplicate keys, which those containers don’t allow).

Designing an operator[ ] always produces a little bit of a dilemma because it’s intended to be treated as an array-indexing operation, so people don’t tend to think about performing a test before they use it. But what happens if you decide to index out of the bounds of the array? One option, of course, is to throw an exception, but with a map “indexing out of the array” could mean that you want an entry there, and that’s the way the STL map treats it. The first for loop after the creation of the map<int, Noisy> nm just “looks up” objects using the operator[ ] , but this is actually creating new Noisy objects! The map creates a new key-value pair (using the default constructor for the value) if you look up a value with operator[ ] and it isn’t there. This means that if you really just want to look something up and not create a new entry, you must use count( ) (to see if it’s there) or find( ) (to get an iterator to it).

The for loop that prints out the values of the container using operator[ ] has a number of problems. First, it requires integral keys (which we happen to have in this case). Next and worse, if all the keys are not sequential, you’ll end up counting from 0 to the size of the container, and if there are some spots which don’t have key-value pairs you’ll automatically create them, and miss some of the higher values of the keys. Finally, if you look at the output from the for loop you’ll see that things are very busy, and it’s quite puzzling at first why there are so many constructions and destructions for what appears to be a simple lookup. The answer only becomes clear when you look at the code in the map template for operator[ ] , which will be something like this:

mapped_type& operator[] (const key_type& k) {
  value_type tmp(k,T()); 
  return (*((insert(tmp)).first)).second;

}

Following the trail, you’ll find that map::value_type is:

typedef pair<const Key, T> value_type;

Now you need to know what a pair is, which can be found in <utility>:

template <class T1, class T2>
struct pair { 
  typedef T1 first_type; 
  typedef T2 second_type; 
  T1 first; 
  T2 second; 
  pair(); 
  pair(const T1& x, const T2& y) 
    : first(x), second(y) {}
  // Templatized copy-constructor:
  template<class U, class V> 
   pair(const pair<U, V> &p);

};

It turns out this is a very important (albeit simple) struct which is used quite a bit in the STL. All it really does it package together two objects, but it’s very useful, especially when you want to return two objects from a function (since a return statement only takes one object). There’s even a shorthand for creating a pair called make_pair( ), which is used in AssociativeBasics.cpp.

So to retrace the steps, map::value_type is a pair of the key and the value of the map – actually, it’s a single entry for the map. But notice that pair packages its objects by value, which means that copy-constructions are necessary to get the objects into the pair. Thus, the creation of tmp in map::operator[ ] will involve at least a copy-constructor call and destructor call for each object in the pair. Here, we’re getting off easy because the key is an int. But if you want to really see what kind of activity can result from map::operator[ ] , try running this:

//: C20:NoisyMap.cpp
// Mapping Noisy to Noisy
#include "Noisy.h"
#include <map>
using namespace std;

int main() {
  map<Noisy, Noisy> mnn;
  Noisy n1, n2;
  cout << "\n--------\n";
  mnn[n1] = n2;
  cout << "\n--------\n";
  cout << mnn[n1] << endl;
  cout << "\n--------\n";

} ///:~

You’ll see that both the insertion and lookup generate a lot of extra objects, and that’s because of the creation of the tmp object. If you look back up at map::operator[ ] you’ll see that the second line calls insert( ) passing it tmp – that is, operator[ ] does an insertion every time. The return value of insert( ) is a different kind of pair, where first is an iterator pointing to the key-value pair that was just inserted, and second is a bool indicating whether the insertion took place. You can see that operator[ ] grabs first (the iterator), dereferences it to produce the pair, and then returns the second which is the value at that location.

So on the upside, map has this fancy “make a new entry if one isn’t there” behavior, but the downside is that you always get a lot of extra object creations and destructions when you use map::operator[ ] . Fortunately, AssociativeBasics.cpp also demonstrates how to reduce the overhead of insertions and deletions, by not using operator[ ] if you don’t have to. The insert( ) member function is slightly more efficient than operator[ ] . With a set you only hold one object, but with a map you hold key-value pairs, so insert( ) requires a pair as its argument. Here’s where make_pair( ) comes in handy, as you can see.

For looking objects up in a map, you can use count( ) to see whether a key is in the map, or you can use find( ) to produce an iterator pointing directly at the key-value pair. Again, since the map contains pairs that’s what the iterator produces when you dereference it, so you have to select first and second. When you run AssociativeBasics.cpp you’ll notice that the iterator approach involves no extra object creations or destructions at all. It’s not as easy to write or read, though.

If you use a map with large, complex objects and discover there’s too much overhead when doing lookups and insertions (don’t assume this from the beginning – take the easy approach first and use a profiler to discover bottlenecks), then you can use the counted-handle approach shown in Chapter XX so that you are only passing around small, lightweight objects.

Of course, you can also iterate through a set or map and operate on each of its objects. This will be demonstrated in later examples.

Generators and fillers
for associative containers

You’ve seen how useful the fill( ), fill_n( ), generate( ) and generate_n( ) function templates in <algorithm> have been for filling the sequential containers ( vector, list and deque) with data. However, these are implemented by using operator= to assign values into the sequential containers, and the way that you add objects to associative containers is with their respective insert( ) member functions. Thus the default “assignment” behavior causes a problem when trying to use the “fill” and “generate” functions with associative containers.

One solution is to duplicate the “fill” and “generate” functions, creating new ones that can be used with associative containers. It turns out that only the fill_n( ) and generate_n( ) functions can be duplicated ( fill( ) and generate( ) copy in between two iterators, which doesn’t make sense with associative containers), but the job is fairly easy, since you have the <algorithm> header file to work from (and since it contains templates, all the source code is there):

//: C20:assocGen.h
// The fill_n() and generate_n() equivalents 
// for associative containers.
#ifndef ASSOCGEN_H
#define ASSOCGEN_H

template<class Assoc, class Count, class T>
void 
assocFill_n(Assoc& a, Count n, const T& val) {
  for (; 0 < n; --n)
    a.insert(val);
}

template<class Assoc, class Count, class Gen>
void assocGen_n(Assoc& a, Count n, Gen g) {
  for (; 0 < n; --n)
    a.insert(g());
}

#endif // ASSOCGEN_H ///:~

You can see that instead of using iterators, the container class itself is passed (by reference, of course, since you wouldn’t want to make a local copy, fill it, and then have it discarded at the end of the scope).

This code demonstrates two valuable lessons. The first lesson is that if the algorithms don’t do what you want, copy the nearest thing and modify it. You have the example at hand in the STL header, so most of the work has already been done.

The second lesson is more pointed: if you look long enough, there’s probably a way to do it in the STL without inventing anything new. The present problem can instead be solved by using an insert_iterator (produced by a call to inserter( )), which calls insert( ) to place items in the container instead of operator=. This is not simply a variation of front_insert_iterator (produced by a call to front_inserter( )) or back_insert_iterator (produced by a call to back_inserter( )), since those iterators use push_front( ) and push_back( ), respectively. Each of the insert iterators is different by virtue of the member function it uses for insertion, and insert( ) is the one we need. Here’s a demonstration that shows filling and generating both a map and a set (of course, it can also be used with multimap and multiset). First, some templatized, simple generators are created (this may seem like overkill, but you never know when you’ll need them; for that reason they’re placed in a header file):

//: C20:SimpleGenerators.h
// Generic generators, including
// one that creates pairs
#include <iostream>
#include <utility>

// A generator that increments its value:
template<typename T>
class IncrGen {
  T i;
public:
  IncrGen(T ii) : i (ii) {}
  T operator()() { return i++; }
};

// A generator that produces an STL pair<>:
template<typename T1, typename T2>
class PairGen {
  T1 i;
  T2 j;
public:
  PairGen(T1 ii, T2 jj) : i(ii), j(jj) {}
  std::pair<T1,T2> operator()() { 
    return std::pair<T1,T2>(i++, j++); 
  }
};

// A generic global operator<< 
// for printing any STL pair<>:
template<typename Pair> std::ostream&
operator<<(std::ostream& os, const Pair& p) {
  return os << p.first << "\t" 
    << p.second << std::endl;

} ///:~

Both generators expect that T can be incremented, and they simply use operator++ to generate new values from whatever you used for initialization. PairGen creates an STL pair object as its return value, and that’s what can be placed into a map or multimap using insert( ).

The last function is a generalization of operator<< for ostreams, so that any pair can be printed, assuming each element of the pair supports a stream operator<<. As you can see below, this allows the use of copy( ) to output the map:

//: C20:AssocInserter.cpp
// Using an insert_iterator so fill_n() and
// generate_n() can be used with associative 
// containers
#include "SimpleGenerators.h"
#include <iterator>
#include <iostream>
#include <algorithm>
#include <set>
#include <map>
using namespace std;

int main() {
  set<int> s;
  fill_n(inserter(s, s.begin()), 10, 47);
  generate_n(inserter(s, s.begin()), 10, 
    IncrGen<int>(12));
  copy(s.begin(), s.end(), 
    ostream_iterator<int>(cout, "\n"));
  
  map<int, int> m;
  fill_n(inserter(m, m.begin()), 10, 
    make_pair(90,120));
  generate_n(inserter(m, m.begin()), 10, 
    PairGen<int, int>(3, 9));
  copy(m.begin(), m.end(), 
    ostream_iterator<pair<int,int> >(cout,"\n"));

} ///:~

The second argument to inserter is an interator, which actually isn’t used in the case of associative containers since they maintain their order internally, rather than allowing you to tell them where the element should be inserted. However, an insert_iterator can be used with many different types of containers so you must provide the iterator.

Note how the ostream_iterator is created to output a pair; this wouldn’t have worked if the operator<< hadn’t been created, and since it’s a template it is automatically instantiated for pair<int, int> .

The magic of maps

An ordinary array uses an integral value to index into a sequential set of elements of some type. A map is an associative array , which means you associate one object with another in an array-like fashion, but instead of selecting an array element with a number as you do with an ordinary array, you look it up with an object! The example which follows counts the words in a text file, so the index is the string object representing the word, and the value being looked up is the object that keeps count of the strings.

In a single-item container like a vector or list, there’s only one thing being held. But in a map, you’ve got two things: the key (what you look up by, as in mapname[key]) and the value that results from the lookup with the key. If you simply want to move through the entire map and list each key-value pair, you use an iterator, which when dereferenced produces a pair object containing both the key and the value. You access the members of a pair by selecting first or second.

This same philosophy of packaging two items together is also used to insert elements into the map, but the pair is created as part of the instantiated map and is called value_type, containing the key and the value. So one option for inserting a new element is to create a value_type object, loading it with the appropriate objects and then calling the insert( ) member function for the map. Instead, the following example makes use of the aforementioned special feature of map: if you’re trying to find an object by passing in a key to operator[ ] and that object doesn’t exist, operator[ ] will automatically insert a new key-value pair for you, using the default constructor for the value object. With that in mind, consider an implementation of a word counting program:

//: C20:WordCount.cpp
//{L} StreamTokenizer
// Count occurrences of words using a map
#include "StreamTokenizer.h"
#include "../require.h"
#include <string>
#include <map>
#include <iostream>
#include <fstream>
using namespace std;

class Count {
  int i;
public:
  Count() : i(0) {}
  void operator++(int) { i++; } // Post-increment  
  int& val() { return i; }
};

typedef map<string, Count> WordMap;
typedef WordMap::iterator WMIter;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  StreamTokenizer words(in);
  WordMap wordmap;
  string word;
  while((word = words.next()).size() != 0)
    wordmap[word]++;
  for(WMIter w = wordmap.begin(); 
      w != wordmap.end(); w++)
    cout << (*w).first << ": "
      << (*w).second.val() << endl;

} ///:~

The need for the Count class is to contain an int that’s automatically initialized to zero. This is necessary because of the crucial line:

wordmap[string(word)]++;

This finds the word that has been produced by StreamTokenizer and increments the Count object associated with that word, which is fine as long as there is a key-value pair for that string. If there isn’t, the map automatically inserts a key for the word you’re looking up, and a Count object, which is initialized to zero by the default constructor. Thus, when it’s incremented the Count becomes 1.

Printing the entire list requires traversing it with an iterator (there’s no copy( ) shortcut for a map unless you want to write an operator<< for the pair in the map). As previously mentioned, dereferencing this iterator produces a pair object, with the first member the key and the second member the value. In this case second is a Count object, so its val( ) member must be called to produce the actual word count.

If you want to find the count for a particular word, you can use the array index operator, like this:

cout << "the: " << wordmap["the"].val() << endl;

You can see that one of the great advantages of the map is the clarity of the syntax; an associative array makes intuitive sense to the reader (note, however, that if “the” isn’t already in the wordmap a new entry will be created!).

A command-line argument tool

A problem that often comes up in programming is the management of program arguments that you can specify on the command line. Usually you’d like to have a set of defaults that can be changed via the command line. The following tool expects the command line arguments to be in the form flag1=value1 with no spaces around the ‘ =‘ (so it will be treated as a single argument). The ProgVal class simply inherits from map<string, string> :

//: C20:ProgVals.h
// Program values can be changed by command line
#ifndef PROGVALS_H
#define PROGVALS_H
#include <map>
#include <iostream>
#include <string>

class ProgVals 
  : public std::map<std::string, std::string> {
public:
  ProgVals(std::string defaults[][2], int sz);
  void parse(int argc, char* argv[],
    std::string usage, int offset = 1);
  void print(std::ostream& out = std::cout);
};

#endif // PROGVALS_H ///:~

The constructor expects an array of string pairs (as you’ll see, this allows you to initialize it with an array of char*) and the size of that array. The parse( ) member function is handed the command-line arguments along with a “usage” string to print if the command line is given incorrectly, and the “offset” which tells it which command-line argument to start with (so you can have non-flag arguments at the beginning of the command line). Finally, print( ) displays the values. Here is the implementation:

//: C20:ProgVals.cpp {O}
#include "ProgVals.h"
using namespace std;

ProgVals::ProgVals(
  std::string defaults[][2], int sz) {
  for(int i = 0; i < sz; i++)
    insert(make_pair(
      defaults[i][0], defaults[i][1]));
}

void ProgVals::parse(int argc, char* argv[],
  string usage, int offset) {
  // Parse and apply additional
  // command-line arguments:
  for(int i = offset; i < argc; i++) {
    string flag(argv[i]);
    int equal = flag.find('=');
    if(equal == string::npos) {
      cerr << "Command line error: " <<
        argv[i] << endl << usage << endl;
      continue; // Next argument
    }
    string name = flag.substr(0, equal);
    string value = flag.substr(equal + 1);
    if(find(name) == end()) {
      cerr << name << endl << usage << endl;
      continue; // Next argument
    }
    operator[](name) = value;
  }
}

void ProgVals::print(ostream& out) {
  out << "Program values:" << endl;
  for(iterator it = begin(); it != end(); it++)
    out << (*it).first << " = "
        << (*it).second << endl;

} ///:~

The constructor uses the STL make_pair( ) helper function to convert each pair of char* into a pair object that can be inserted into the map. In parse( ), each command-line argument is checked for the existence of the telltale ‘ =‘ sign (reporting an error if it isn’t there), and then is broken into two strings, the name which appears before the ‘ =‘, and the value which appears after. The operator[ ] is then used to change the existing value to the new one.

Here’s an example to test the tool:

//: C20:ProgValTest.cpp
//{L} ProgVals
#include "ProgVals.h"
using namespace std;

string defaults[][2] = {
  { "color", "red" },
  { "size", "medium" },
  { "shape", "rectangular" },
  { "action", "hopping"},
};

const char* usage = "usage:\n"
"ProgValTest [flag1=val1 flag2=val2 ...]\n"
"(Note no space around '=')\n"
"Where the flags can be any of: \n"
"color, size, shape, action \n";

// So it can be used globally:
ProgVals pvals(defaults, 
  sizeof defaults / sizeof *defaults);

class Animal {
  string color, size, shape, action;
public:
  Animal(string col, string sz, 
    string shp, string act) 
    :color(col),size(sz),shape(shp),action(act){}
  // Default constructor uses program default
  // values, possibly change on command line:
  Animal() : color(pvals["color"]), 
    size(pvals["size"]), shape(pvals["shape"]),
    action(pvals["action"]) {}
  void print() {
    cout << "color = " << color << endl
      << "size = " << size << endl
      << "shape = " << shape << endl
      << "action = " << action << endl;
  }
  // And of course pvals can be used anywhere
  // else you'd like.
};

int main(int argc, char* argv[]) {
  // Initialize and parse command line values
  // before any code that uses pvals is called:
  pvals.parse(argc, argv, usage);
  pvals.print();
  Animal a;
  cout << "Animal a values:" << endl;
  a.print();

} ///:~

This program can create Animal objects with different characteristics, and those characteristics can be established with the command line. The default characteristics are given in the two-dimensional array of char* called defaults and, after the usage string you can see a global instance of ProgVals called pvals is created; this is important because it allows the rest of the code in the program to access the values.

Note that Animal’s default constructor uses the values in pvals inside its constructor initializer list. When you run the program you can try creating different animal characteristics.

Many command-line programs also use a style of beginning a flag with a hyphen, and sometimes they use single-character flags.

The STL map is used in numerous places throughout the rest of this book.

Multimaps and duplicate keys

A multimap is a map that can contain duplicate keys. At first this may seem like a strange idea, but it can occur surprisingly often. A phone book, for example, can have many entries with the same name.

Suppose you are monitoring wildlife, and you want to keep track of where and when each type of animal is spotted. Thus, you may see many animals of the same kind, all in different locations and at different times. So if the type of animal is the key, you’ll need a multimap. Here’s what it looks like:

//: C20:WildLifeMonitor.cpp
#include <vector>
#include <map>
#include <string>
#include <algorithm>
#include <iostream>
#include <sstream>
#include <ctime>
using namespace std;

class DataPoint {
  int x, y; // Location coordinates
  time_t time; // Time of Sighting
public:
  DataPoint() : x(0), y(0), time(0) {}
  DataPoint(int xx, int yy, time_t tm) :
    x(xx), y(yy), time(tm) {}
  // Synthesized operator=, copy-constructor OK
  int getX() { return x; }
  int getY() { return y; }
  time_t* getTime() { return &time; }
};

string animal[] = { 
  "chipmunk", "beaver", "marmot", "weasel",
  "squirrel", "ptarmigan", "bear", "eagle",
  "hawk", "vole", "deer", "otter", "hummingbird",
};
const int asz = sizeof animal/sizeof *animal;
vector<string> animals(animal, animal + asz);

// All the information is contained in a 
// "Sighting," which can be sent to an ostream:
typedef pair<string, DataPoint> Sighting;

ostream& 
operator<<(ostream& os, const Sighting& s) {
  return os << s.first << " sighted at x= " << 
    s.second.getX() << ", y= " << s.second.getY()
    << ", time = " << ctime(s.second.getTime());
}

// A generator for Sightings:
class SightingGen {
  vector<string>& animals;
  static const int d = 100;
public:
  SightingGen(vector<string>& an) :
    animals(an) { srand(time(0)); }
  Sighting operator()() {
    Sighting result;
    int select = rand() % animals.size();
    result.first = animals[select];
    result.second = DataPoint(
      rand() % d, rand() % d, time(0));
    return result;
  }
};

typedef multimap<string, DataPoint> DataMap;
typedef DataMap::iterator DMIter;

int main() {
  DataMap sightings;
  generate_n(
    inserter(sightings, sightings.begin()),
    50, SightingGen(animals));
  // Print everything:
  copy(sightings.begin(), sightings.end(),
    ostream_iterator<Sighting>(cout, ""));
  // Print sightings for selected animal:
  while(true) {
    cout << "select an animal or 'q' to quit: ";
    for(int i = 0; i < animals.size(); i++)
      cout <<'['<< i <<']'<< animals[i] << ' ';
    cout << endl;
    string reply;
    cin >> reply;
    if(reply.at(0) == 'q') return 0;
    istringstream r(reply);
    int i;
    r >> i; // Converts to int
    i %= animals.size();
    // Iterators in "range" denote begin, one 
    // past end of matching range:
    pair<DMIter, DMIter> range = 
      sightings.equal_range(animals[i]);
    copy(range.first, range.second,
      ostream_iterator<Sighting>(cout, ""));
  }

} ///:~

All the data about a sighting is encapsulated into the class DataPoint, which is simple enough that it can rely on the synthesized assignment and copy-constructor. It uses the Standard C library time functions to record the time of the sighting.

In the array of string animal, notice that the char* constructor is automatically used during initialization, which makes initializing an array of string quite convenient. Since it’s easier to use the animal names in a vector, the length of the array is calculated and a vector<string> is initialized using the vector(iterator, iterator) constructor.

The key-value pairs that make up a Sighting are the string which names the type of animal, and the DataPoint that says where and when it was sighted. The standard pair template combines these two types and is typedefed to produce the Sighting type. Then an ostream operator<< is created for Sighting; this will allow you to iterate through a map or multimap of Sightings and print it out.

SightingGen generates random sightings at random data points to use for testing. It has the usual operator( ) necessary for a function object, but it also has a constructor to capture and store a reference to a vector<string>, which is where the aforementioned animal names are stored.

A DataMap is a multimap of string-DataPoint pairs, which means it stores Sightings. It is filled with 50 Sightings using generate_n( ), and printed out (notice that because there is an operator<< that takes a Sighting, an ostream_iterator can be created). At this point the user is asked to select the animal that they want to see all the sightings for. If you press ‘q’ the program will quit, but if you select an animal number, then the equal_range( ) member function is invoked. This returns an iterator ( DMIter) to the beginning of the set of matching pairs, and one indicating past-the-end of the set. Since only one object can be returned from a function, equal_range( ) makes use of pair. Since the range pair has the beginning and ending iterators of the matching set, those iterators can be used in copy( ) to print out all the sightings for a particular type of animal.

Multisets

You’ve seen the set, which only allows one object of each value to be inserted. The multiset is odd by comparison since it allows more than one object of each value to be inserted. This seems to go against the whole idea of “setness,” where you can ask “is ‘it’ in this set?” If there can be more than one of ‘it’, then what does that question mean?

With some thought, you can see that it makes no sense to have more than one object of the same value in a set if those duplicate objects are exactly the same (with the possible exception of counting occurrences of objects, but as seen earlier in this chapter that can be handled in an alternative, more elegant fashion). Thus each duplicate object will have something that makes it unique from the other duplicates – most likely different state information that is not used in the calculation of the value during the comparison. That is, to the comparison operation, the objects look the same but they actually contain some differing internal state.

Like any STL container that must order its elements, the multiset template uses the less template by default to determine element ordering. This uses the contained classes’ operator<, but you may of course substitute your own comparison function.

Consider a simple class that contains one element that is used in the comparison, and another that is not:

//: C20:MultiSet1.cpp
// Demonstration of multiset behavior
#include <iostream>
#include <set>
#include <algorithm>
#include <ctime>
using namespace std;

class X {
  char c; // Used in comparison
  int i; // Not used in comparison
  // Don't need default constructor and operator=
  X();
  X& operator=(const X&);
  // Usually need a copy-constructor (but the
  // synthesized version works here)
public:
  X(char cc, int ii) : c(cc), i(ii) {}
  // Notice no operator== is required
  friend bool operator<(const X& x, const X& y) {
    return x.c < y.c;
  }
  friend ostream& operator<<(ostream& os, X x) {
    return os << x.c << ":" << x.i;
  }
};

class Xgen {
  static int i;
  // Number of characters to select from:
  static const int span = 6;
public:
  Xgen() { srand(time(0)); }
  X operator()() {
    char c = 'A' + rand() % span;    
    return X(c, i++);
  }
};

int Xgen::i = 0;

typedef multiset<X> Xmset;
typedef Xmset::const_iterator Xmit;

int main() {
  Xmset mset;
  // Fill it with X's:
  generate_n(inserter(mset, mset.begin()), 
    25, Xgen());
  // Initialize a regular set from mset:
  set<X> unique(mset.begin(), mset.end());
  copy(unique.begin(), unique.end(), 
    ostream_iterator<X>(cout, " "));
  cout << "\n----\n";
  // Iterate over the unique values:
  for(set<X>::iterator i = unique.begin();
      i != unique.end(); i++) {
    pair<Xmit, Xmit> p = mset.equal_range(*i);
    copy(p.first, p.second, 
      ostream_iterator<X>(cout, " "));
    cout << endl;
  }

} ///:~

In X, all the comparisons are made with the char c . The comparison is performed with operator<, which is all that is necessary for the multiset, since in this example the default less comparison object is used. The class Xgen is used to randomly generate X objects, but the comparison value is restricted to the span from ‘A’ to ‘ E’. In main( ), a multiset<X> is created and filled with 25 X objects using Xgen, guaranteeing that there will be duplicate keys. So that we know what the unique values are, a regular set<X> is created from the multiset (using the iterator, iterator constructor). These values are displayed, then each one is used to produce the equal_range( ) in the multiset ( equal_range( ) has the same meaning here as it does with multimap: all the elements with matching keys). Each set of matching keys is then printed.

As a second example, a (possibly) more elegant version of WordCount.cpp can be created using multiset:

//: C20:MultiSetWordCount.cpp
//{L} StreamTokenizer
// Count occurrences of words using a multiset
#include "StreamTokenizer.h"
#include "../require.h"
#include <string>
#include <set>
#include <fstream>
#include <iterator>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  StreamTokenizer words(in);
  multiset<string> wordmset;
  string word;
  while((word = words.next()).size() != 0)
    wordmset.insert(word);
  typedef multiset<string>::iterator MSit;
  MSit it = wordmset.begin();
  while(it != wordmset.end()) {
    pair<MSit, MSit> p=wordmset.equal_range(*it);
    int count = distance(p.first, p.second);
    cout << *it << ": " << count << endl;
    it = p.second; // Move to the next word
  }

} ///:~

The setup in main( ) is identical to WordCount.cpp, but then each word is simply inserted into the multiset<string>. An iterator is created and initialized to the beginning of the multiset; dereferencing this iterator produces the current word. equal_range( ) produces the starting and ending iterators of the word that’s currently selected, and the STL algorithm distance( ) (which is in <iterator>) is used to count the number of elements in that range. Then the iterator it is moved forward to the end of the range, which puts it at the next word. Although if you’re unfamiliar with the multiset this code can seem more complex, the density of it and the lack of need for supporting classes like Count has a lot of appeal.

In the end, is this really a “set,” or should it be called something else? An alternative is the generic “bag” that has been defined in some container libraries, since a bag holds anything at all without discrimination – including duplicate objects. This is close, but it doesn’t quite fit since a bag has no specification about how elements should be ordered, while a multiset (which requires that all duplicate elements be adjacent to each other) is even more restrictive than the concept of a set, which could use a hashing function to order its elements, in which case they would not be in sorted order. Besides, if you wanted to store a bunch of objects without any special criterions, you’d probably just use a vector, deque or list.

Contents | Prev | Next

Contact: webmaster@codeguru.com
CodeGuru - the website for developers. [an error occurred while processing this directive]

Associative containers

Generators and fillers for associative containers

The magic of maps

A command-line argument tool

Multimaps and duplicate keys

Multisets

Generators and fillers
for associative containers