CodeGuru : Thinking in C++

A catalog of STL algorithms

This section provides a quick reference for when you’re searching for the appropriate algorithm. I leave the full exploration of all the STL algorithms to other references (see the end of this chapter, and Appendix XX), along with the more intimate details of complexity, performance, etc. My goal here is for you to become rapidly comfortable and facile with the algorithms, and I will assume you will look into the more specialized references if you need more depth of detail.

Although you will often see the algorithms described using their full template declaration syntax, I am not doing that here because you already know they are templates, and it’s quite easy to see what the template arguments are from the function declarations. The type names for the arguments provide descriptions for the types of iterators required. I think you’ll find this form is easier to read, while you can quickly find the full declaration in the template header file if for some reason you feel the need.

The names of the iterator classes describe the iterator type they must conform to. The iterator types were described in the previous chapter, but here is a summary:

InputIterator. You (or rather, the STL algorithm and any algorithms you write that use InputIterators) can increment this with operator++ and dereference it with operator* to read the value (and only read the value), but you can only read each value once. InputIterators can be tested with operator== and operator!=. That’s all. Because an InputIterator is so limited, it can be used with istreams (via istream_iterator).

OutputIterator. This can be incremented with operator++, and dereferenced with operator* to write the value (and only write the value), but you can only dereference/write each value once. OutputIterators cannot be tested with operator== and operator!=, however, because you assume that you can just keep sending elements to the destination and that you don’t have to see if the destination’s end marker has been reached. That is, the container that an OutputIterator references can take an infinite number of objects, so no end-checking is necessary. This requirement is important so that an OutputIterator can be used with ostreams (via ostream_iterator), but you’ll also commonly use the “insert” iterators insert_iterator, front_insert_iterator and back_insert_iterator (generated by the helper templates inserter( ), front_inserter( ) and back_inserter( )).

With both InputIterator and OutputIterator, you cannot have multiple iterators pointing to different parts of the same range. Just think in terms of iterators to support istreams and ostreams, and InputIterator and OutputIterator will make perfect sense. Also note that InputIterator and OutputIterator put the weakest restrictions on the types of iterators they will accept, which means that you can use any “more sophisticated” type of iterator when you see InputIterator or OutputIterator used as STL algorithm template arguments.

ForwardIterator. InputIterator and OutputIterator are the most restricted, which means they’ll work with the largest number of actual iterators. However, there are some operations for which they are too restricted; you can only read from an InputIterator and write to an OutputIterator, so you can’t use them to read and modify a range, for example, and you can’t have more than one active iterator on a particular range, or dereference such an iterator more than once. With a ForwardIterator these restrictions are relaxed; you can still only move forward using operator++, but you can both write and read and you can write/read multiple times in each location. A ForwardIterator is much more like a regular pointer, whereas InputIterator and OutputIterator are a bit strange by comparison.

BidirectionalIterator. Effectively, this is a ForwardIterator that can also go backward. That is, a BidirectionalIterator supports all the operations that a ForwardIterator does, but in addition it has an operator--.

RandomAccessIterator. An iterator that is random access supports all the same operations that a regular pointer does: you can add and subtract integral values to move it forward and backward by jumps (rather than just one element at a time), you can subscript it with operator[ ], you can subtract one iterator from another, and iterators can be compared to see which is greater using operator<, operator>, etc. If you’re implementing a sorting routine or something similar, random access iterators are necessary to be able to create an efficient algorithm.

The names used for the template parameter types consist of the above iterator types (sometimes with a ‘1’ or ‘2’ appended to distinguish different template arguments), and may also include other arguments, often function objects.

When describing the group of elements that an operation is performed on, mathematical “range” notation will often be used. In this, the square bracket means “includes the end point” while the parenthesis means “does not include the end point.” When using iterators, a range is determined by the iterator pointing to the initial element, and the “past-the-end” iterator, pointing past the last element. Since the past-the-end element is never used, the range determined by a pair of iterators can thus be expressed as [first, last) , where first is the iterator pointing to the initial element and last is the past-the-end iterator.

Most books and discussions of the STL algorithms arrange them according to side effects: nonmutating algorithms don’t change the elements in the range, mutating algorithms do change the elements, etc. These descriptions are based more on the underlying behavior or implementation of the algorithm – that is, the designer’s perspective. In practice, I don’t find this a very useful categorization so I shall instead organize them according to the problem you want to solve: are you searching for an element or set of elements, performing an operation on each element, counting elements, replacing elements, etc. This should help you find the one you want more easily.

Note that all the algorithms are in the namespace std . If you do not see a different header such as <utility> or <numerics> above the function declarations, that means it appears in <algorithm>.

Support tools for example creation

It’s useful to create some basic tools with which to test the algorithms.

Displaying a range is something that will be done constantly, so here is a templatized function that allows you to print any sequence, regardless of the type that’s in that sequence:

//: C21:PrintSequence.h
// Prints the contents of any sequence
#ifndef PRINTSEQUENCE_H
#define PRINTSEQUENCE_H
#include <iostream>

template<typename InputIter>
void print(InputIter first, InputIter last,
  char* nm = "", char* sep = "\n", 
  std::ostream& os = std::cout) { 
  if(*nm != '\0') // Only if you provide a string
    os << nm << ": " << sep; // is this printed
  while(first != last)
    os << *first++ << sep;
  os << std::endl;
}

// Use template-templates to allow type deduction
// of the typename T:
template<typename T, template<typename> class C>
void print(C<T>& c, char* nm = "", 
  char* sep = "\n", 
  std::ostream& os = std::cout) {
  if(*nm != '\0') // Only if you provide a string
    os << nm << ": " << sep; // is this printed
  std::copy(c.begin(), c.end(), 
    std::ostream_iterator<T>(os, " "));
  cout << endl;
}

#endif // PRINTSEQUENCE_H ///:~

There are two forms here, one that requires you to give an explicit range (this allows you to print an array or a sub-sequence) and one that prints any of the STL containers, which provides notational convenience when printing the entire contents of that container. The second form performs template type deduction to determine the type of T so it can be used in the copy( ) algorithm. That trick wouldn’t work with the first form, so the copy( ) algorithm is avoided and the copying is just done by hand (this could have been done with the second form as well, but it’s instructive to see a template-template in use). Because of this, you never need to specify the type that you’re printing when you call either template function.

The default is to print to cout with newlines as separators, but you can change that. You may also provide a message to print at the head of the output.

Next, it’s useful to have some generators (classes with an operator( ) that returns values of the appropriate type) which allow a sequence to be rapidly filled with different values.

//: C21:Generators.h
// Different ways to fill sequences
#ifndef GENERATORS_H
#define GENERATORS_H
#include <set>
#include <cstdlib>
#include <cstring>
#include <ctime>

// A generator that can skip over numbers:
class SkipGen {
  int i;
  int skp;
public:
  SkipGen(int start = 0, int skip = 1)
    : i(start), skp(skip) {}
  int operator()() {
    int r = i;
    i += skp;
    return r;
  }
};

// Generate unique random numbers from 0 to mod:
class URandGen {
  std::set<int> used;
  int modulus;
public:
  URandGen(int mod) : modulus(mod) { 
    std::srand(std::time(0)); 
  }
  int operator()() {
    while(true) {
      int i = (int)std::rand() % modulus;
      if(used.find(i) == used.end()) {
        used.insert(i);
        return i;
      }
    }
  }
};

// Produces random characters:
class CharGen {
  static const char* source;
  static const int len;
public:
  CharGen() { std::srand(std::time(0)); }
  char operator()() { 
    return source[std::rand() % len];
  }
};

// Statics created here for convenience, but
// will cause problems if multiply included:
const char* CharGen::source = "ABCDEFGHIJK"
  "LMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const int CharGen::len = std::strlen(source);

#endif // GENERATORS_H ///:~

To create some interesting values, the SkipGen generator skips by the value skp each time its operator( ) is called. You can initialize both the start value and the skip value in the constructor.

URandGen (‘U’ for “unique”) is a generator for random ints between 0 and mod, with the additional constraint that each value can only be produced once (thus you must be careful not to use up all the values). This is easily accomplished with a set.

CharGen generates chars and can be used to fill up a string (when treating a string as a sequence container). You’ll note that the one member function that any generator implements is operator( ) (with no arguments). This is what is called by the “generate” functions.

The use of the generators and the print( ) functions is shown in the following section.

Finally, a number of the STL algorithms that move elements of a sequence around distinguish between “stable” and “unstable” reordering of a sequence. This refers to preserving the original order of the elements for those elements that are equivalent but not identical. For example, consider a sequence { c(1), b(1), c(2), a(1), b(2), a(2) } . These elements are tested for equivalence based on their letters, but their numbers indicate how they first appeared in the sequence. If you sort (for example) this sequence using an unstable sort, there’s no guarantee of any particular order among equivalent letters, so you could end up with { a(2), a(1), b(1), b(2), c(2), c(1) } . However, if you used a stable sort, it guarantees you will get { a(1), a(2), b(1), b(2), c(1), c(2) } .

To demonstrate the stability versus instability of algorithms that reorder a sequence, we need some way to keep track of how the elements originally appeared. The following is a kind of string object that keeps track of the order in which that particular object originally appeared, using a static map that maps NStrings to Counters. Each NString then contains an occurrence field that indicates the order in which this NString was discovered:

//: C21:NString.h
// A "numbered string" that indicates which
// occurrence this is of a particular word
#ifndef NSTRING_H
#define NSTRING_H
#include <string>
#include <map>
#include <iostream>

class NString {
  std::string s;
  int occurrence;
  struct Counter {
    int i;
    Counter() : i(0) {}
    Counter& operator++(int) { 
      i++;
      return *this;
    } // Post-incr
    operator int() { return i; }
  };
  // Keep track of the number of occurrences:
  typedef std::map<std::string, Counter> csmap;
  static csmap occurMap;
public:
  NString() : occurrence(0) {}
  NString(const std::string& x) 
    : s(x), occurrence(occurMap[s]++) {}
  NString(const char* x) 
    : s(x), occurrence(occurMap[s]++) {}
  // The synthesized operator= and 
  // copy-constructor are OK here
  friend std::ostream& operator<<(
    std::ostream& os, const NString& ns) {
    return os << ns.s << " [" 
      << ns.occurrence << "]";
  }
  // Need this for sorting. Notice it only 
  // compares strings, not occurrences:
  friend bool 
  operator<(const NString& l, const NString& r) {
    return l.s < r.s;
  }
  // For sorting with greater<NString>:
  friend bool 
  operator>(const NString& l, const NString& r) {
    return l.s > r.s;
  }
  // To get at the string directly:
  operator const std::string&() const {return s;}
};

// Allocate static member object. Done here for
// brevity, but should actually be done in a 
// separate cpp file:
NString::csmap NString::occurMap;

#endif // NSTRING_H ///:~

In the constructors (one that takes a string, one that takes a char*), the simple-looking initialization occurrence(occurMap[s]++) performs all the work of maintaining and assigning the occurrence counts (see the demonstration of the map class in the previous chapter for more details).

To do an ordinary ascending sort, the only operator that’s necessary is NString::operator<( ), however to sort in reverse order the operator>( ) is also provided so that the greater template can be used.

As this is just a demonstration class I am getting away with the convenience of putting the definition of the static member occurMap in the header file, but this will break down if the header file is included in more than one place, so you should normally relegate all static definitions to cpp files.

Filling & generating

These algorithms allow you to automatically fill a range with a particular value, or to generate a set of values for a particular range (these were introduced in the previous chapter). The “fill” functions insert a single value multiple times into the container, while the “generate” functions use an object called a generator (described earlier) to create the values to insert into the container.

void fill(ForwardIterator first, ForwardIterator last, const T& value);

void fill_n(OutputIterator first, Size n, const T& value);

fill( ) assigns value to every element in the range [first, last) . fill_n( ) assigns value to n elements starting at first.

void generate(ForwardIterator first, ForwardIterator last, Generator gen);

void generate_n(OutputIterator first, Size n, Generator gen);

generate( ) makes a call to gen( ) for each element in the range [first, last) , presumably to produce a different value for each element. generate_n( ) calls gen( ) n times and assigns each result to n elements starting at first.

Example

The following example fills and generates into vectors. It also shows the use of print( ):

//: C21:FillGenerateTest.cpp
// Demonstrates "fill" and "generate"
#include "Generators.h"
#include "PrintSequence.h"
#include <vector>
#include <algorithm>
#include <string>
using namespace std;

int main() {
  vector<string> v1(5);
  fill(v1.begin(), v1.end(), "howdy");
  print(v1, "v1", " ");
  vector<string> v2;
  fill_n(back_inserter(v2), 7, "bye");
  print(v2.begin(), v2.end(), "v2");
  vector<int> v3(10);
  generate(v3.begin(), v3.end(), SkipGen(4,5));
  print(v3, "v3", " ");
  vector<int> v4;
  generate_n(back_inserter(v4),15, URandGen(30));
  print(v4, "v4", " ");

} ///:~

A vector<string> is created with a pre-defined size. Since storage has already been created for all the string objects in the vector, fill( ) can use its assignment operator to assign a copy of “howdy” to each space in the vector. To print the result, the second form of print( ) is used which simply needs a container (you don’t have to give the first and last iterators). Also, the default newline separator is replaced with a space.

The second vector<string> v2 is not given an initial size so back_inserter must be used to force new elements in instead of trying to assign to existing locations. Just as an example, the other print( ) is used which requires a range.

The generate( ) and generate_n( ) functions have the same form as the “fill” functions except that they use a generator instead of a constant value; here, both generators are demonstrated.

Counting

All containers have a method size( ) that will tell you how many elements they hold. The following two algorithms count objects only if they satisfy certain criteria.

IntegralValue count(InputIterator first, InputIterator last,

const EqualityComparable& value);

Produces the number of elements in [first, last) that are equivalent to value (when tested using operator==).

IntegralValue count_if(InputIterator first, InputIterator last, Predicate pred);

Produces the number of elements in [first, last) which each cause pred to return true.

Example

Here, a vector<char> v is filled with random characters (including some duplicates). A set<char> is initialized from v, so it holds only one of each letter represented in v. This set is used to count all the instances of all the different characters, which are then displayed:

//: C21:Counting.cpp
// The counting algorithms
#include "PrintSequence.h"
#include "Generators.h"
#include <vector>
#include <algorithm>
using namespace std;

int main() {
  vector<char> v;
  generate_n(back_inserter(v), 50, CharGen());
  print(v, "v", "");
  // Create a set of the characters in v:
  set<char> cs(v.begin(), v.end());
  set<char>::iterator it = cs.begin();
  while(it != cs.end()) {
    int n = count(v.begin(), v.end(), *it);
    cout << *it << ": " << n << ", ";
    it++;
  }
  int lc = count_if(v.begin(), v.end(), 
    bind2nd(greater<char>(), 'a'));
  cout << "\nLowercase letters: " << lc << endl;
  sort(v.begin(), v.end());
  print(v, "sorted", "");

} ///:~

The count_if( ) algorithm is demonstrated by counting all the lowercase letters; the predicate is created using the bind2nd( ) and greater function object templates.

Manipulating sequences

These algorithms allow you to move sequences around.

OutputIterator copy(InputIterator, first InputIterator last, OutputIterator destination);

Using assignment, copies from [first, last) to destination, incrementing destination after each assignment. Works with almost any type of source range and almost any kind of destination. Because assignment is used, you cannot directly insert elements into an empty container or at the end of a container, but instead you must wrap the destination iterator in an insert_iterator (typically by using back_inserter( ), or inserter( ) in the case of an associative container).

The copy algorithm is used in many examples in this book.

BidirectionalIterator2 copy_backward(BidirectionalIterator1 first,

BidirectionalIterator1 last, BidirectionalIterator2 destinationEnd);

Like copy( ), but performs the actual copying of the elements in reverse order. That is, the resulting sequence is the same, it’s just that the copy happens in a different way. The source range [first, last) is copied to the destination, but the first destination element is destinationEnd - 1 . This iterator is then decremented after each assignment. The space in the destination range must already exist (to allow assignment), and the destination range cannot be within the source range.

void reverse(BidirectionalIterator first, BidirectionalIterator last);

OutputIterator reverse_copy(BidirectionalIterator first, BidirectionalIterator last,

OutputIterator destination);

Both forms of this function reverse the range [first, last) . reverse( ) reverses the range in place, while reverse_copy( ) leaves the original range alone and copies the reversed elements into destination, returning the past-the-end iterator of the resulting range.

ForwardIterator2 swap_ranges(ForwardIterator1 first1, ForwardIterator1 last1,

ForwardIterator2 first2);

Exchanges the contents of two ranges of equal size, by moving from the beginning to the end of each range and swapping each set of elements.

void rotate(ForwardIterator first, ForwardIterator middle, ForwardIterator last);

OutputIterator rotate_copy(ForwardIterator first, ForwardIterator middle,

ForwardIterator last, OutputIterator destination);

Swaps the two ranges [first, middle) and [middle, last) . With rotate( ), the swap is performed in place, and with rotate_copy( ) the original range is untouched and the rotated version is copied into destination, returning the past-the-end iterator of the resulting range. Note that while swap_ranges( ) requires that the two ranges be exactly the same size, the “rotate” functions do not.

bool next_permutation(BidirectionalIterator first, BidirectionalIterator last);

bool next_permutation(BidirectionalIterator first, BidirectionalIterator last,

StrictWeakOrdering binary_pred);

bool prev_permutation(BidirectionalIterator first, BidirectionalIterator last);

bool prev_permutation(BidirectionalIterator first, BidirectionalIterator last,

StrictWeakOrdering binary_pred);

A permutation is one unique ordering of a set of elements. If you have n unique elements, then there are n! ( n factorial) distinct possible combinations of those elements. All these combinations can be conceptually sorted into a sequence using a lexicographical ordering, and thus produce a concept of a “next” and “previous” permutation. Therefore, whatever the current ordering of elements in the range, there is a distinct “next” and “previous” permutation in the sequence of permutations.

The next_permutation( ) and prev_permutation( ) functions re-arrange the elements into their next or previous permutation, and if successful return true. If there are no more “next” permutations, it means that the elements are in sorted order so next_permutation( ) returns false. If there are no more “previous” permutations, it means that the elements are in descending sorted order so previous_permutation( ) returns false.

The versions of the functions which have a StrictWeakOrdering argument perform the comparisons using binary_pred instead of operator<.

void random_shuffle(RandomAccessIterator first, RandomAccessIterator last);

void random_shuffle(RandomAccessIterator first, RandomAccessIterator last

RandomNumberGenerator& rand);

This function randomly rearranges the elements in the range. It yields uniformly distributed results. The first form uses an internal random number generator and the second uses a user-supplied random-number generator.

BidirectionalIterator partition(BidirectionalIterator first, BidirectionalIterator last,

Predicate pred);

BidirectionalIterator stable_partition(BidirectionalIterator first,

BidirectionalIterator last, Predicate pred);

The “partition” functions use pred to organize the elements in the range [first, last) so they are before or after the partition (a point in the range). The partition point is given by the returned iterator. If pred(*i) is true (where i is the iterator pointing to a particular element), then that element will be placed before the partition point, otherwise it will be placed after the partition point.

With partition( ), the order of the elements is after the function call is not specified, but with stable_parition( ) the relative order of the elements before and after the partition point will be the same as before the partitioning process.

Example

This gives a basic demonstration of sequence manipulation:

//: C21:Manipulations.cpp
// Shows basic manipulations
#include "PrintSequence.h"
#include "NString.h"
#include "Generators.h"
#include <vector>
#include <string>
#include <algorithm>
using namespace std;

int main() {
  vector<int> v1(10);
  // Simple counting:
  generate(v1.begin(), v1.end(), SkipGen());
  print(v1, "v1", " ");
  vector<int> v2(v1.size());
  copy_backward(v1.begin(), v1.end(), v2.end());
  print(v2, "copy_backward", " ");
  reverse_copy(v1.begin(), v1.end(), v2.begin());
  print(v2, "reverse_copy", " ");
  reverse(v1.begin(), v1.end());
  print(v1, "reverse", " ");
  int half = v1.size() / 2;
  // Ranges must be exactly the same size:
  swap_ranges(v1.begin(), v1.begin() + half,
    v1.begin() + half);
  print(v1, "swap_ranges", " ");
  // Start with fresh sequence:
  generate(v1.begin(), v1.end(), SkipGen());
  print(v1, "v1", " ");
  int third = v1.size() / 3;
  for(int i = 0; i < 10; i++) {
    rotate(v1.begin(), v1.begin() + third, 
      v1.end());
    print(v1, "rotate", " ");
  }
  cout << "Second rotate example:" << endl;
  char c[] = "aabbccddeeffgghhiijj";
  const char csz = strlen(c);
  for(int i = 0; i < 10; i++) {
    rotate(c, c + 2, c + csz);
    print(c, c + csz, "", "");
  }
  cout << "All n! permutations of abcd:" << endl;
  int nf = 4 * 3 * 2 * 1;
  char p[] = "abcd";
  for(int i = 0; i < nf; i++) {
    next_permutation(p, p + 4);
    print(p, p + 4, "", "");
  }
  cout << "Using prev_permutation:" << endl;
  for(int i = 0; i < nf; i++) {
    prev_permutation(p, p + 4);
    print(p, p + 4, "", "");
  }
  cout << "random_shuffling a word:" << endl;
  string s("hello");
  cout << s << endl;
  for(int i = 0; i < 5; i++) {
    random_shuffle(s.begin(), s.end());
    cout << s << endl;
  }
  NString sa[] = { "a", "b", "c", "d", "a", "b",
    "c", "d", "a", "b", "c", "d", "a", "b", "c"};
  const int sasz = sizeof sa / sizeof *sa;
  vector<NString> ns(sa, sa + sasz);
  print(ns, "ns", " ");
  vector<NString>::iterator it = 
    partition(ns.begin(), ns.end(), 
      bind2nd(greater<NString>(), "b"));
  cout << "Partition point: " << *it << endl;
  print(ns, "", " ");
  // Reload vector:
  copy (sa, sa + sasz, ns.begin());
  it = stable_partition(ns.begin(), ns.end(),
    bind2nd(greater<NString>(), "b"));
  cout << "Stable partition" << endl;
  cout << "Partition point: " << *it << endl;
  print(ns, "", " ");

} ///:~

The best way to see the results of the above program is to run it (you’ll probably want to redirect the output to a file).

The vector<int> v1 is initially loaded with a simple ascending sequence and printed. You’ll see that the effect of copy_backward( ) (which copies into v2, which is the same size as v1) is the same as an ordinary copy. Again, copy_backward( ) does the same thing as copy( ), it just performs the operations in backward order.

reverse_copy( ), however, actually does created a reversed copy, while reverse( ) performs the reversal in place. Next, swap_ranges( ) swaps the upper half of the reversed sequence with the lower half. Of course, the ranges could be smaller subsets of the entire vector, as long as they are of equivalent size.

After re-creating the ascending sequence, rotate( ) is demonstrated by rotating one third of v1 multiple times. A second rotate( ) example uses characters and just rotates two characters at a time. This also demonstrates the flexibility of both the STL algorithms and the print( ) template, since they can both be used with arrays of char as easily as with anything else.

To demonstrate next_permutation( ) and prev_permutation( ), a set of four characters “abcd” is permuted through all n! ( n factorial) possible combinations. You’ll see from the output that the permutations move through a strictly-defined order (that is, permuting is a deterministic process).

A quick-and-dirty demonstration of random_shuffle( ) is to apply it to a string and see what words result. Because a string object has begin( ) and end( ) member functions that return the appropriate iterators, it too may be easily used with many of the STL algorithms. Of course, an array of char could also have been used.

Finally, the partition( ) and stable_partition( ) are demonstrated, using an array of NString. You’ll note that the aggregate initialization expression uses char arrays, but NString has a char* constructor which is automatically used.

When partitioning a sequence, you need a predicate which will determine whether the object belongs above or below the partition point. This takes a single argument and returns true (the object is above the partition point) or false (it isn’t). I could have written a separate function or function object to do this, but for something simple, like “the object is greater than ‘b’”, why not use the built-in function object templates? The expression is:

bind2nd(greater<NString>(), "b")

And to understand it, you need to pick it apart from the middle outward. First,

greater<NString>()

produces a binary function object which compares its first and second arguments:

return first > second;

and returns a bool. But we don’t want a binary predicate, and we want to compare against the constant value “ b.” So bind2nd( ) says: create a new function object which only takes one argument, by taking this greater<NString>( ) function and forcing the second argument to always be “ b.” The first argument (the only argument) will be the one from the vector ns.

You’ll see from the output that with the unstable partition, the objects are correctly above and below the partition point, but in no particular order, whereas with the stable partition their original order is maintained.

Searching & replacing

All of these algorithms are used for searching for one or more objects within a range defined by the first two iterator arguments.

InputIterator find(InputIterator first, InputIterator last,

const EqualityComparable& value);

Searches for value within a range of elements. Returns an iterator in the range [first, last) that points to the first occurrence of value. If value isn’t in the range, then find( ) returns last. This is a linear search , that is, it starts at the beginning and looks at each sequential element without making any assumptions about the way the elements are ordered. In contrast, a binary_search( ) (defined later) works on a sorted sequence and can thus be much faster.

InputIterator find_if(InputIterator first, InputIterator last, Predicate pred);

Just like find( ), find_if( ) performs a linear search through the range. However, instead of searching for value, find_if( ) looks for an element such that the Predicate pred returns true when applied to that element. Returns last if no such element can be found.

ForwardIterator adjacent_find(ForwardIterator first, ForwardIterator last);

ForwardIterator adjacent_find(ForwardIterator first, ForwardIterator last,

BinaryPredicate binary_pred);

Like find( ), performs a linear search through the range, but instead of looking for only one element it searches for two elements that are right next to each other. The first form of the function looks for two elements that are equivalent (via operator==). The second form looks for two adjacent elements that, when passed together to binary_pred, produce a true result. If two adjacent elements cannot be found, last is returned.

ForwardIterator1 find_first_of(ForwardIterator1 first1, ForwardIterator1 last1,

ForwardIterator2 first2, ForwardIterator2 last2);

ForwardIterator1 find_first_of(ForwardIterator1 first1, ForwardIterator1 last1,

ForwardIterator2 first2, ForwardIterator2 last2, BinaryPredicate binary_pred);

Like find( ), performs a linear search through the range. The first form finds the first element in the first range that is equivalent to any of the elements in the second range. The second form finds the first element in the first range that produces true when passed to binary_pred along with any of the elements in the second range. When a BinaryPredicate is used with two ranges in the algorithms, the element from the first range becomes the first argument to binary_pred, and the element from the second range becomes the second argument.

ForwardIterator1 search(ForwardIterator1 first1, ForwardIterator1 last1,

ForwardIterator2 first2, ForwardIterator2 last2);

ForwardIterator1 search(ForwardIterator1 first1, ForwardIterator1 last1,

ForwardIterator2 first2, ForwardIterator2 last2 BinaryPredicate binary_pred);

Attempts to find the entire range [first2, last2) within the range [first1, last1) . That is, it checks to see if the second range occurs (in the exact order of the second range) within the first range, and if so returns an iterator pointing to the place in the first range where the second range begins. Returns last1 if no subset can be found. The first form performs its test using operator==, while the second checks to see if each pair of objects being compared causes binary_pred to return true.

ForwardIterator1 find_end(ForwardIterator1 first1, ForwardIterator1 last1,

ForwardIterator2 first2, ForwardIterator2 last2);

ForwardIterator1 find_end(ForwardIterator1 first1, ForwardIterator1 last1,

ForwardIterator2 first2, ForwardIterator2 last2, BinaryPredicate binary_pred);

The forms and arguments are just like search( ) in that it looks for the second range within the first range, but while search( ) looks for the first occurrence of the second range, find_end( ) looks for the last occurrence of the second range within the first.

ForwardIterator search_n(ForwardIterator first, ForwardIterator last,

Size count, const T& value);

ForwardIterator search_n(ForwardIterator first, ForwardIterator last,

Size count, const T& value, BinaryPredicate binary_pred);

Looks for a group of count consecutive values in [first, last) that are all equal to value (in the first form) or that all cause a return value of true when passed into binary_pred along with value (in the second form). Returns last if such a group cannot be found.

ForwardIterator min_element(ForwardIterator first, ForwardIterator last);

ForwardIterator min_element(ForwardIterator first, ForwardIterator last,

BinaryPredicate binary_pred);

Returns an iterator pointing to the first occurrence of the smallest value in the range (there may be multiple occurrences of the smallest value). Returns last if the range is empty. The first version performs comparisons with operator< and the value r returned is such that

*e < *r

is false for every element e in the range. The second version compares using binary_pred and the value r returned is such that binary_pred (*e, *r) is false for every element e in the range.

ForwardIterator max_element(ForwardIterator first, ForwardIterator last);

ForwardIterator max_element(ForwardIterator first, ForwardIterator last,

BinaryPredicate binary_pred);

Returns an iterator pointing to the first occurrence of the largest value in the range (there may be multiple occurrences of the largest value). Returns last if the range is empty. The first version performs comparisons with operator< and the value r returned is such that

*r < *e

is false for every element e in the range. The second version compares using binary_pred and the value r returned is such that binary_pred (*r, *e) is false for every element e in the range.

void replace(ForwardIterator first, ForwardIterator last,

const T& old_value, const T& new_value);

void replace_if(ForwardIterator first, ForwardIterator last,

Predicate pred, const T& new_value);

OutputIterator replace_copy(InputIterator first, InputIterator last,

OutputIterator result, const T& old_value, const T& new_value);

OutputIterator replace_copy_if(InputIterator first, InputIterator last,

OutputIterator result, Predicate pred, const T& new_value);

Each of the “replace” forms moves through the range [first, last) , finding values that match a criterion and replacing them with new_value. Both replace( ) and replace_copy( ) simply look for old_value to replace, while replace_if( ) and replace_copy_if( ) look for values that satisfy the predicate pred. The “copy” versions of the functions do not modify the original range but instead make a copy with the replacements into result (incrementing result after each assignment).

Example

To provide easy viewing of the results, this example will manipulate vectors of int. Again, not every possible version of each algorithm will be shown (some that should be obvious have been omitted).

//: C21:SearchReplace.cpp
// The STL search and replace algorithms
#include "PrintSequence.h"
#include <vector>
#include <algorithm>
#include <functional>
using namespace std;

struct PlusOne {
  bool operator()(int i, int j) {
    return j == i + 1;
  }
};

class MulMoreThan {
  int value;
public:
  MulMoreThan(int val) : value(val) {}
  bool operator()(int v, int m) {
    return v * m > value;
  }
};

int main() {
  int a[] = { 1, 2, 3, 4, 5, 6, 6, 7, 7, 7,
    8, 8, 8, 8, 11, 11, 11, 11, 11 };
  const int asz = sizeof a / sizeof *a;
  vector<int> v(a, a + asz);
  print(v, "v", " ");
  vector<int>::iterator it =
    find(v.begin(), v.end(), 4);
  cout << "find: " << *it << endl;
  it = find_if(v.begin(), v.end(), 
    bind2nd(greater<int>(), 8));
  cout << "find_if: " << *it << endl;
  it = adjacent_find(v.begin(), v.end());
  while(it != v.end()) {
    cout << "adjacent_find: " << *it 
      << ", " << *(it + 1) << endl;
    it = adjacent_find(it + 2, v.end());
  }
  it = adjacent_find(v.begin(), v.end(), 
    PlusOne());
  while(it != v.end()) {
    cout << "adjacent_find PlusOne: " << *it
      << ", " << *(it + 1) << endl;
    it = adjacent_find(it + 1, v.end(), 
      PlusOne());
  }
  int b[] = { 8, 11 };
  const int bsz = sizeof b / sizeof *b;
  print(b, b + bsz, "b", " ");
  it = find_first_of(v.begin(), v.end(),
    b, b + bsz);
  print(it, it + bsz, "find_first_of", " ");
  it = find_first_of(v.begin(), v.end(), 
    b, b + bsz, PlusOne());
  print(it,it + bsz,"find_first_of PlusOne"," ");
  it = search(v.begin(), v.end(), b, b + bsz);
  print(it, it + bsz, "search", " ");
  int c[] = { 5, 6, 7 };
  const int csz = sizeof c / sizeof *c;
  print(c, c + csz, "c", " ");
  it = search(v.begin(), v.end(), 
    c, c + csz, PlusOne());
  print(it, it + csz,"search PlusOne", " ");
  int d[] = { 11, 11, 11 };
  const int dsz = sizeof d / sizeof *d;
  print(d, d + dsz, "d", " ");
  it = find_end(v.begin(), v.end(), d, d + dsz);
  print(it, v.end(),"find_end", " ");
  int e[] = { 9, 9 };
  print(e, e + 2, "e", " ");
  it = find_end(v.begin(), v.end(), 
    e, e + 2, PlusOne());
  print(it, v.end(),"find_end PlusOne"," ");
  it = search_n(v.begin(), v.end(), 3, 7);
  print(it, it + 3, "search_n 3, 7", " ");
  it = search_n(v.begin(), v.end(), 
    6, 15, MulMoreThan(100));
  print(it, it + 6, 
    "search_n 6, 15, MulMoreThan(100)", " ");
  cout << "min_element: " <<
    *min_element(v.begin(), v.end()) << endl;
  cout << "max_element: " <<
    *max_element(v.begin(), v.end()) << endl;
  vector<int> v2;
  replace_copy(v.begin(), v.end(), 
    back_inserter(v2), 8, 47);
  print(v2, "replace_copy 8 -> 47", " ");
  replace_if(v.begin(), v.end(), 
    bind2nd(greater_equal<int>(), 7), -1);
  print(v, "replace_if >= 7 -> -1", " ");

} ///:~

The example begins with two predicates: PlusOne which is a binary predicate that returns true if the second argument is equivalent to one plus the first argument, and MulMoreThan which returns true if the first argument times the second argument is greater than a value stored in the object. These binary predicates are used as tests in the example.

In main( ), an array a is created and fed to the constructor for vector<int> v . This vector will be used as the target for the search and replace activities, and you’ll note that there are duplicate elements – these will be discovered by some of the search/replace routines.

The first test demonstrates find( ), discovering the value 4 in v. The return value is the iterator pointing to the first instance of 4, or the end of the input range ( v.end( )) if the search value is not found.

find_if( ) uses a predicate to determine if it has discovered the correct element. In the above example, this predicate is created on the fly using greater<int> (that is, “see if the first int argument is greater than the second”) and bind2nd( ) to fix the second argument to 8. Thus, it returns true if the value in v is greater than 8.

Since there are a number of cases in v where two identical objects appear next to each other, the test of adjacent_find( ) is designed to find them all. It starts looking from the beginning and then drops into a while loop, making sure that the iterator it has not reached the end of the input sequence (which would mean that no more matches can be found). For each match it finds, the loop prints out the matches and then performs the next adjacent_find( ), this time using it + 2 as the first argument (this way, it moves past the two elements that it already found).

You might look at the while loop and think that you can do it a bit more cleverly, to wit:

  while(it != v.end()) {
    cout << "adjacent_find: " << *it++
      << ", " << *it++ << endl;
    it = adjacent_find(it, v.end());

}

Of course, this is exactly what I tried at first. However, I did not get the output I expected, on any compiler. This is because there is no guarantee about when the increments occur in the above expression. A bit of a disturbing discovery, I know, but the situation is best avoided now that you’re aware of it.

The next test uses adjacent_find( ) with the PlusOne predicate, which discovers all the places where the next number in the sequence v changes from the previous by one. The same while approach is used to find all the cases.

find_first_of( ) requires a second range of objects for which to hunt; this is provided in the array b. Notice that, because the first range and the second range in find_first_of( ) are controlled by separate template arguments, those ranges can refer to two different types of containers, as seen here. The second form of find_first_of( ) is also tested, using PlusOne.

search( ) finds exactly the second range inside the first one, with the elements in the same order. The second form of search( ) uses a predicate, which is typically just something that defines equivalence, but it also opens some interesting possibilities – here, the PlusOne predicate causes the range { 4, 5, 6 } to be found.

The find_end( ) test discovers the last occurrence of the entire sequence { 11, 11, 11 } . To show that it has in fact found the last occurrence, the rest of v starting from it is printed.

The first search_n( ) test looks for 3 copies of the value 7, which it finds and prints. When using the second version of search_n( ), the predicate is ordinarily meant to be used to determine equivalence between two elements, but I’ve taken some liberties and used a function object that multiplies the value in the sequence by (in this case) 15 and checks to see if it’s greater than 100. That is, the search_n( ) test above says “find me 6 consecutive values which, when multiplied by 15, each produce a number greater than 100.” Not exactly what you normally expect to do, but it might give you some ideas the next time you have an odd searching problem.

min_element( ) and max_element( ) are straightforward; the only thing that’s a bit odd is that it looks like the function is being dereferenced with a ‘ *’. Actually, the returned iterator is being dereferenced to produce the value for printing.

To test replacements, replace_copy( ) is used first (so it doesn’t modify the original vector) to replace all values of 8 with the value 47. Notice the use of back_inserter( ) with the empty vector v2. To demonstrate replace_if( ), a function object is created using the standard template greater_equal along with bind2nd to replace all the values that are greater than or equal to 7 with the value -1.

Comparing ranges

These algorithms provide ways to compare two ranges. At first glance, the operations they perform seem very close to the search( ) function above. However, search( ) tells you where the second sequence appears within the first, while equal( ) and lexicographical_compare( ) simply tell you whether or not two sequences are exactly identical (using different comparison algorithms). On the other hand, mismatch( ) does tell you where the two sequences go out of sync, but those sequences must be exactly the same length.

bool equal(InputIterator first1, InputIterator last1, InputIterator first2);

bool equal(InputIterator first1, InputIterator last1, InputIterator first2

BinaryPredicate binary_pred);

In both of these functions, the first range is the typical one, [first1, last1) . The second range starts at first2, but there is no “last2” because its length is determined by the length of the first range. The equal( ) function returns true if both ranges are exactly the same (the same elements in the same order); in the first case, the operator== is used to perform the comparison and in the second case binary_pred is used to decide if two elements are the same.

bool lexicographical_compare(InputIterator1 first1, InputIterator1 last1

InputIterator2 first2, InputIterator2 last2);

bool lexicographical_compare(InputIterator1 first1, InputIterator1 last1

InputIterator2 first2, InputIterator2 last2, BinaryPredicate binary_pred);

These two functions determine if the first range is “lexicographically less” than the second (they return true if range 1 is less than range 2, and false otherwise. Lexicographical equality, or “dictionary” comparison, means that the comparison is done the same way we establish the order of strings in a dictionary, one element at a time. The first elements determine the result if these elements are different, but if they’re equal the algorithm moves on to the next elements and looks at those, and so on. until it finds a mismatch. At that point it looks at the elements, and if the element from range 1 is less than the element from range two, then lexicographical_compare( ) returns true, otherwise it returns false. If it gets all the way through one range or the other (the ranges may be different lengths for this algorithm) without finding an inequality, then range 1 is not less than range 2 so the function returns false.

If the two ranges are different lengths, a missing element in one range acts as one that “precedes” an element that exists in the other range. So {‘a’, ‘b’} lexicographically precedes {‘a’, ‘b’, ‘a’ }.

In the first version of the function, operator< is used to perform the comparisons, and in the second version binary_pred is used.

pair<InputIterator1, InputIterator2> mismatch(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2);

pair<InputIterator1, InputIterator2> mismatch(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2, BinaryPredicate binary_pred);

As in equal( ), the length of both ranges is exactly the same, so only the first iterator in the second range is necessary, and the length of the first range is used as the length of the second range. Whereas equal( ) just tells you whether or not the two ranges are the same, mismatch( ) tells you where they begin to differ. To accomplish this, you must be told (1) the element in the first range where the mismatch occurred and (2) the element in the second range where the mismatch occurred. These two iterators are packaged together into a pair object and returned. If no mismatch occurs, the return value is last1 combined with the past-the-end iterator of the second range.

As in equal( ), the first function tests for equality using operator== while the second one uses binary_pred.

Example

Because the standard C++ string class is built like a container (it has begin( ) and end( ) member functions which produce objects of type string::iterator), it can be used to conveniently create ranges of characters to test with the STL comparison algorithms. However, you should note that string has a fairly complete set of native operations, so you should look at the string class before using the STL algorithms to perform operations.

//: C21:Comparison.cpp
// The STL range comparison algorithms
#include "PrintSequence.h"
#include <vector>
#include <algorithm>
#include <functional>
#include <string>
using namespace std;

int main() {
  // strings provide a convenient way to create
  // ranges of characters, but you should 
  // normally look for native string operations:
  string s1("This is a test");
  string s2("This is a Test");
  cout << "s1: " << s1 << endl 
    << "s2: " << s2 << endl;
  cout << "compare s1 & s1: " 
    << equal(s1.begin(), s1.end(), s1.begin())
    << endl;
  cout << "compare s1 & s2: " 
    << equal(s1.begin(), s1.end(), s2.begin())
    << endl;
  cout << "lexicographical_compare s1 & s1: " <<
    lexicographical_compare(s1.begin(), s1.end(),
      s1.begin(), s1.end()) <<  endl;
  cout << "lexicographical_compare s1 & s2: " <<
    lexicographical_compare(s1.begin(), s1.end(),
      s2.begin(), s2.end()) << endl;
  cout << "lexicographical_compare s2 & s1: " <<
    lexicographical_compare(s2.begin(), s2.end(),
      s1.begin(), s1.end()) << endl;
  cout << "lexicographical_compare shortened " 
    "s1 & full-length s2: " << endl;
  string s3(s1);
  while(s3.length() != 0) {
    bool result = lexicographical_compare(
      s3.begin(), s3.end(), s2.begin(),s2.end());
    cout << s3 << endl << s2 << ", result = " 
      << result << endl;
    if(result == true) break;
    s3 = s3.substr(0, s3.length() - 1);
  }
  pair<string::iterator, string::iterator> p =
    mismatch(s1.begin(), s1.end(), s2.begin());
  print(p.first, s1.end(), "p.first", "");
  print(p.second, s2.end(), "p.second","");

} ///:~

Note that the only difference between s1 and s2 is the capital ‘T’ in s2’s “Test.” Comparing s1 and s1 for equality yields true, as expected, while s1 and s2 are not equal because of the capital ‘T’.

To understand the output of the lexicographical_compare( ) tests, you must remember two things: first, the comparison is performed character-by-character, and second that capital letters “precede” lowercase letters. In the first test, s1 is compared to s1. These are exactly equivalent, thus one is not lexicographically less than the other (which is what the comparison is looking for) and thus the result is false. The second test is asking “does s1 precede s2?” When the comparison gets to the ‘t’ in “test”, it discovers that the lowercase ‘t’ in s1 is “greater” than the uppercase ‘T’ in s2, so the answer is again false. However, if we test to see whether s2 precedes s1, the answer is true.

To further examine lexicographical comparison, the next test in the above example compares s1 with s2 again (which returned false before). But this time it repeats the comparison, trimming one character off the end of s1 (which is first copied into s3) each time through the loop until the test evaluates to true. What you’ll see is that, as soon as the uppercase ‘T’ is trimmed off of s3 (the copy of s1), then the characters, which are exactly equal up to that point, no longer count and the fact that s3 is shorter than s2 is what makes it lexicographically precede s2.

The final test uses mismatch( ) . In order to capture the return value, you must first create the appropriate pair p , constructing the template using the iterator type from the first range and the iterator type from the second range (in this case, both string::iterators). To print the results, the iterator for the mismatch in the first range is p.first, and for the second range is p.second. In both cases, the range is printed from the mismatch iterator to the end of the range so you can see exactly where the iterator points.

Removing elements

Because of the genericity of the STL, the concept of removal is a bit constrained. Since elements can only be “removed” via iterators, and iterators can point to arrays, vectors, lists, etc., it is not safe or reasonable to actually try to destroy the elements that are being removed, and to change the size of the input range [first, last) (an array, for example, cannot have its size changed). So instead, what the STL “remove” functions do is rearrange the sequence so that the “removed” elements are at the end of the sequence, and the “un-removed” elements are at the beginning of the sequence (in the same order that they were before, minus the removed elements – that is, this is a stable operation). Then the function will return an iterator to the “new last” element of the sequence, which is the end of the sequence without the removed elements and the beginning of the sequence of the removed elements. In other words, if new_last is the iterator that is returned from the “remove” function, then [first, new_last) is the sequence without any of the removed elements, and [new_last, last) is the sequence of removed elements.

If you are simply using your sequence, including the removed elements, with more STL algorithms, you can just use new_last as the new past-the-end iterator. However, if you’re using a resizable container c (not an array) and you actually want to eliminate the removed elements from the container you can use erase( ) to do so, for example:

c.erase(remove(c.begin(), c.end(), value), c.end());

The return value of remove( ) is the new_last iterator, so erase( ) will delete all the removed elements from c.

The iterators in [new_last, last) are dereferenceable but the element values are undefined and should not be used.

ForwardIterator remove(ForwardIterator first, ForwardIterator last, const T& value);

ForwardIterator remove_if(ForwardIterator first, ForwardIterator last,

Predicate pred);

OutputIterator remove_copy(InputIterator first, InputIterator last,

OutputIterator result, const T& value);

OutputIterator remove_copy_if(InputIterator first, InputIterator last,

OutputIterator result, Predicate pred);

Each of the “remove” forms moves through the range [first, last) , finding values that match a removal criterion and copying the un-removed elements over the removed elements (thus effectively removing them). The original order of the un-removed elements is maintained. The return value is an iterator pointing past the end of the range that contains none of the removed elements. The values that this iterator points to are unspecified.

The “if” versions pass each element to pred( ) to determine whether it should be removed or not (if pred( ) returns true, the element is removed). The “copy” versions do not modify the original sequence, but instead copy the un-removed values into a range beginning at result, and return an iterator indicating the past-the-end value of this new range.

ForwardIterator unique(ForwardIterator first, ForwardIterator last);

ForwardIterator unique(ForwardIterator first, ForwardIterator last,

BinaryPredicate binary_pred);

OutputIterator unique_copy(InputIterator first, InputIterator last,

OutputIterator result);

OutputIterator unique_copy(InputIterator first, InputIterator last,

OutputIterator result, BinaryPredicate binary_pred);

Each of the “unique” functions moves through the range [first, last) , finding adjacent values that are equivalent (that is, duplicates) and “removing” the duplicate elements by copying over them. The original order of the un-removed elements is maintained. The return value is an iterator pointing past the end of the range that has the adjacent duplicates removed.

Because only duplicates that are adjacent are removed, it’s likely that you’ll want to call sort( ) before calling a “unique” algorithm, since that will guarantee that all the duplicates are removed.

The versions containing binary_pred call, for each iterator value i in the input range:

binary_pred(*i, *(i-1));

and if the result is true then *(i-1) is considered a duplicate.

The “copy” versions do not modify the original sequence, but instead copy the un-removed values into a range beginning at result, and return an iterator indicating the past-the-end value of this new range.

Example

This example gives a visual demonstration of the way the “remove” and “unique” functions work.

//: C21:Removing.cpp
// The removing algorithms
#include "PrintSequence.h"
#include "Generators.h"
#include <vector>
#include <algorithm>
#include <cctype>
using namespace std;

struct IsUpper {
  bool operator()(char c) {
    return isupper(c);
  }
};

int main() {
  vector<char> v(50);
  generate(v.begin(), v.end(), CharGen());
  print(v, "v", "");
  // Create a set of the characters in v:
  set<char> cs(v.begin(), v.end());
  set<char>::iterator it = cs.begin();
  vector<char>::iterator cit;
  // Step through and remove everything:
  while(it != cs.end()) {
    cit = remove(v.begin(), v.end(), *it);
    cout << *it << "[" << *cit << "] ";
    print(v, "", "");
    it++;
  }
  generate(v.begin(), v.end(), CharGen());
  print(v, "v", "");
  cit = remove_if(v.begin(), v.end(), IsUpper());
  print(v.begin(), cit, "after remove_if", "");
  // Copying versions are not shown for remove
  // and remove_if.
  sort(v.begin(), cit);
  print(v.begin(), cit, "sorted", "");
  vector<char> v2;
  unique_copy(v.begin(), cit, back_inserter(v2));
  print(v2, "unique_copy", "");
  // Same behavior:
  cit = unique(v.begin(), cit, equal_to<char>());
  print(v.begin(), cit, "unique", "");

} ///:~

The vector<char> v is filled with randomly-generated characters and then copied into a set. Each element of the set is used in a remove statement, but the entire vector v is printed out each time so you can see what happens to the rest of the range, after the resulting endpoint (which is stored in cit).

To demonstrate remove_if( ), the address of the Standard C library function isupper( ) (in <cctype> is called inside of the function object class IsUpper, an object of which is passed as the predicate for remove_if( ). This only returns true if a character is uppercase, so only lowercase characters will remain. Here, the end of the range is used in the call to print( ) so only the remaining elements will appear. The copying versions of remove( ) and remove_if( ) are not shown because they are a simple variation on the non-copying versions which you should be able to use without an example.

The range of lowercase letters is sorted in preparation for testing the “unique” functions (the “unique” functions are not undefined if the range isn’t sorted, but it’s probably not what you want). First, unique_copy( ) puts the unique elements into a new vector using the default element comparison, and then the form of unique( ) that takes a predicate is used; the predicate used is the built-in function object equal_to( ), which produces the same results as the default element comparison.

Sorting and operations on sorted ranges

There is a significant category of STL algorithms which require that the range they operate on be in sorted order.

There is actually only one “sort” algorithm used in the STL. This algorithm is presumably the fastest one, but the implementor has fairly broad latitude. However, it comes packaged in various flavors depending on whether the sort should be stable, partial or just the regular sort. Oddly enough, only the partial sort has a copying version; otherwise you’ll need to make your own copy before sorting if that’s what you want. If you are working with a very large number of items you may be better off transferring them to an array (or at least a vector, which uses an array internally) rather than using them in some of the STL containers.

Once your sequence is sorted, there are many operations you can perform on that sequence, from simply locating an element or group of elements to merging with another sorted sequence or manipulating sequences as mathematical sets.

Each algorithm involved with sorting or operations on sorted sequences has two versions of each function, the first that uses the object’s own operator< to perform the comparison, and the second that uses an additional StrictWeakOrdering object’s operator( )(a, b) to compare two objects for a < b. Other than this there are no differences, so the distinction will not be pointed out in the description of each algorithm.

Sorting

One STL container ( list) has its own built-in sort( ) function which is almost certainly going to be faster than the generic sort presented here (especially since the list sort just swaps pointers rather than copying entire objects around). This means that you’ll only want to use the sort functions here if (a) you’re working with an array or a sequence container that doesn’t have a sort( ) function or (b) you want to use one of the other sorting flavors, like a partial or stable sort, which aren’t supported by list’s sort( ).

void sort(RandomAccessIterator first, RandomAccessIterator last);

void sort(RandomAccessIterator first, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Sorts [first, last) into ascending order. The second form allows a comparator object to determine the order.

void stable_sort(RandomAccessIterator first, RandomAccessIterator last);

void stable_sort(RandomAccessIterator first, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Sorts [first, last) into ascending order, preserving the original ordering of equivalent elements (this is important if elements can be equivalent but not identical). The second form allows a comparator object to determine the order.

void partial_sort(RandomAccessIterator first,

RandomAccessIterator middle, RandomAccessIterator last);

void partial_sort(RandomAccessIterator first,

RandomAccessIterator middle, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Sorts the number of elements from [first, last) that can be placed in the range [first, middle) . The rest of the elements end up in [middle, last) , and have no guaranteed order. The second form allows a comparator object to determine the order.

RandomAccessIterator partial_sort_copy(InputIterator first, InputIterator last,

RandomAccessIterator result_first, RandomAccessIterator result_last);

RandomAccessIterator partial_sort_copy(InputIterator first,

InputIterator last, RandomAccessIterator result_first,

RandomAccessIterator result_last, StrictWeakOrdering binary_pred);

Sorts the number of elements from [first, last) that can be placed in the range [result_first, result_last) , and copies those elements into [result_first, result_last) . If the range [first, last) is smaller than [result_first, result_last) , then the smaller number of elements is used. The second form allows a comparator object to determine the order.

void nth_element(RandomAccessIterator first,

RandomAccessIterator nth, RandomAccessIterator last);

void nth_element(RandomAccessIterator first,

RandomAccessIterator nth, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Just like partial_sort( ), nth_element( ) partially orders a range of elements. However, it’s much “less ordered” than partial_sort( ). The only thing that nth_element( ) guarantees is that whatever location you choose will become a dividing point. All the elements in the range [first, nth) will be less than (they could also be equivalent to) whatever element ends up at location nth and all the elements in the range (nth, last] will be greater than whatever element ends up location nth. However, neither range is in any particular order, unlike partial_sort( ) which has the first range in sorted order.

If all you need is this very weak ordering (if, for example, you’re determining medians, percentiles and that sort of thing) this algorithm is faster than partial_sort( ).

Example

The StreamTokenizer class from the previous chapter is used to break a file into words, and each word is turned into an NString and added to a deque<NString>. Once the input file is completely read, a vector<NString> is created from the contents of the deque. The vector is then used to demonstrate the sorting algorithms:

//: C21:SortTest.cpp
//{L} ../C20/StreamTokenizer
// Test different kinds of sorting
#include "../C20/StreamTokenizer.h"
#include "NString.h"
#include "PrintSequence.h"
#include "Generators.h"
#include "../require.h"
#include <algorithm>
#include <fstream>
#include <queue>
#include <vector>
#include <cctype>
using namespace std;

// For sorting NStrings and ignore string case:
struct NoCase {
  bool operator()(
    const NString& x, const NString& y) {
/* Somthing's wrong with this approach but I
   can't seem to see it. It would be much faster:
    const string& lv = x;
    const string& rv = y;
    int len = min(lv.size(), rv.size());
    for(int i = 0; i < len; i++)
      if(tolower(lv[i]) < tolower(rv[i]))
        return true;
    return false;
  }
*/
    // Brute force: copy, force to lowercase:
    string lv(x);
    string rv(y);
    lcase(lv);
    lcase(rv);
    return lv < rv;
  }
  void lcase(string& s) {
    int n = s.size();
    for(int i = 0; i < n; i++)
      s[i] = tolower(s[i]);
  }
};

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  StreamTokenizer words(in);
  deque<NString> nstr;
  string word;
  while((word = words.next()).size() != 0)
    nstr.push_back(NString(word));
  print(nstr);
  // Create a vector from the contents of nstr:
  vector<NString> v(nstr.begin(), nstr.end());
  sort(v.begin(), v.end());
  print(v, "sort");
  // Use an additional comparator object:
  sort(v.begin(), v.end(), NoCase());
  print(v, "sort NoCase");
  copy(nstr.begin(), nstr.end(), v.begin());
  stable_sort(v.begin(), v.end());
  print(v, "stable_sort");
  // Use an additional comparator object:
  stable_sort(v.begin(), v.end(), 
    greater<NString>());
  print(v, "stable_sort greater");
  copy(nstr.begin(), nstr.end(), v.begin());
  // Partial sorts. The additional comparator 
  // versions are obvious and not shown here.
  partial_sort(v.begin(), 
    v.begin() + v.size()/2, v.end());
  print(v, "partial_sort");
  // Create a vector with a preallocated size:
  vector<NString> v2(v.size()/2);
  partial_sort_copy(v.begin(), v.end(), 
    v2.begin(), v2.end());
  print(v2, "partial_sort_copy");
  // Finally, the weakest form of ordering:
  vector<int> v3(20);
  generate(v3.begin(), v3.end(), URandGen(50));
  print(v3, "v3 before nth_element");
  int n = 10;
  vector<int>::iterator vit = v3.begin() + n;
  nth_element(v3.begin(), vit, v3.end());
  cout << "After ordering with nth = " << n
    << ", nth element is " << v3[n] << endl;
  print(v3, "v3 after nth_element");

} ///:~

The first class is a binary predicate used to compare two NString objects while ignoring the case of the strings. You can pass the object into the various sort routines to produce an alphabetic sort (rather than the default lexicographic sort, which has all the capital letters in one group, followed by all the lowercase letters).

As an example, try the source code for the above file as input. Because the occurrence numbers are printed along with the strings you can distinguish between an ordinary sort and a stable sort, and you can also see what happens during a partial sort (the remaining unsorted elements are in no particular order). There is no “partial stable sort.”

You’ll notice that the use of the second “comparator” forms of the functions are not exhaustively tested in the above example, but the use of a comparator is the same as in the first part of the example.

The test of nth_element does not use the NString objects because it’s simpler to see what’s going on if ints are used. Notice that, whatever the nth element turns out to be (which will vary from one run to another because of URandGen), the elements before that are less, and after that are greater, but the elements have no particular order other than that. Because of URandGen, there are no duplicates but if you use a generator that allows duplicates you can see that the elements before the nth element will be less than or equal to the nth element.

Locating elements in sorted ranges

Once a range is sorted, there are a group of operations that can be used to find elements within those ranges. In the following functions, there are always two forms, one that assumes the intrinsic operator< has been used to perform the sort, and the second that must be used if some other comparison function object has been used to perform the sort. You must use the same comparison for locating elements as you do to perform the sort, otherwise the results are undefined. In addition, if you try to use these functions on unsorted ranges the results will be undefined.

bool binary_search(ForwardIterator first, ForwardIterator last, const T& value);

bool binary_search(ForwardIterator first, ForwardIterator last, const T& value,

StrictWeakOrdering binary_pred);

Tells you whether value appears in the sorted range [first, last) .

ForwardIterator lower_bound(ForwardIterator first, ForwardIterator last,

const T& value);

ForwardIterator lower_bound(ForwardIterator first, ForwardIterator last,

const T& value, StrictWeakOrdering binary_pred);

Returns an iterator indicating the first occurrence of value in the sorted range [first, last) . Returns last if value is not found.

ForwardIterator upper_bound(ForwardIterator first, ForwardIterator last,

const T& value);

ForwardIterator upper_bound(ForwardIterator first, ForwardIterator last,

const T& value, StrictWeakOrdering binary_pred);

Returns an iterator indicating one past the last occurrence of value in the sorted range [first, last) . Returns last if value is not found.

pair<ForwardIterator, ForwardIterator>

equal_range(ForwardIterator first, ForwardIterator last,

const T& value);

pair<ForwardIterator, ForwardIterator>

equal_range(ForwardIterator first, ForwardIterator last,

const T& value, StrictWeakOrdering binary_pred);

Essentially combines lower_bound( ) and upper_bound( ) to return a pair indicating the first and one-past-the-last occurrences of value in the sorted range [first, last) . Both iterators indicate last if value is not found.

Example

Here, we can use the approach from the previous example:

//: C21:SortedSearchTest.cpp
//{L} ../C20/StreamTokenizer
// Test searching in sorted ranges
#include "../C20/StreamTokenizer.h"
#include "PrintSequence.h"
#include "NString.h"
#include "../require.h"
#include <algorithm>
#include <fstream>
#include <queue>
#include <vector>
using namespace std;

int main() {
  ifstream in("SortedSearchTest.cpp");
  assure(in, "SortedSearchTest.cpp");
  StreamTokenizer words(in);
  deque<NString> dstr;
  string word;
  while((word = words.next()).size() != 0)
    dstr.push_back(NString(word));
  vector<NString> v(dstr.begin(), dstr.end());
  sort(v.begin(), v.end());
  print(v, "sorted");
  typedef vector<NString>::iterator sit;
  sit it, it2;
  string f("include");
  cout << "binary search: " 
    << binary_search(v.begin(), v.end(), f) 
    << endl;
  it = lower_bound(v.begin(), v.end(), f);
  it2 = upper_bound(v.begin(), v.end(), f);
  print(it, it2, "found range");
  pair<sit, sit> ip = 
    equal_range(v.begin(), v.end(), f);
  print(ip.first, ip.second, 
    "equal_range");

} ///:~

The input is forced to be the source code for this file because the word “include” will be used for a find string (since “include” appears many times). The file is tokenized into words that are placed into a deque (a better container when you don’t know how much storage to allocate), and left unsorted in the deque. The deque is copied into a vector via the appropriate constructor, and the vector is sorted and printed.

The binary_search( ) function only tells you if the object is there or not; lower_bound( ) and upper_bound( ) produce iterators to the beginning and ending positions where the matching objects appear. The same effect can be produced more succinctly using equal_range( ) (as shown in the previous chapter, with multimap and multiset).

Merging sorted ranges

As before, the first form of each function assumes the intrinsic operator< has been used to perform the sort. The second form must be used if some other comparison function object has been used to perform the sort. You must use the same comparison for locating elements as you do to perform the sort, otherwise the results are undefined. In addition, if you try to use these functions on unsorted ranges the results will be undefined.

OutputIterator merge(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2, OutputIterator result);

OutputIterator merge(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Copies elements from [first1, last1) and [first2, last2) into result, such that the resulting range is sorted in ascending order. This is a stable operation.

void inplace_merge(BidirectionalIterator first,

BidirectionalIterator middle, BidirectionalIterator last);

void inplace_merge(BidirectionalIterator first,

BidirectionalIterator middle, BidirectionalIterator last,

StrictWeakOrdering binary_pred);

This assumes that [first, middle) and [middle, last) are each sorted ranges. The two ranges are merged so that the resulting range [first, last) contains the combined ranges in sorted order.

Example

It’s easier to see what goes on with merging if ints are used; the following example also emphasizes how the algorithms (and my own print template) work with arrays as well as containers.

//: C21:MergeTest.cpp
// Test merging in sorted ranges
#include <algorithm>
#include "PrintSequence.h"
#include "Generators.h"
using namespace std;

int main() {
  const int sz = 15;
  int a[sz*2] = {0};
  // Both ranges go in the same array:
  generate(a, a + sz, SkipGen(0, 2));
  generate(a + sz, a + sz*2, SkipGen(1, 3));
  print(a, a + sz, "range1", " ");  
  print(a + sz, a + sz*2, "range2", " ");
  int b[sz*2] = {0}; // Initialize all to zero
  merge(a, a + sz, a + sz, a + sz*2, b);
  print(b, b + sz*2, "merge", " ");
  // set_union is a merge that removes duplicates
  set_union(a, a + sz, a + sz, a + sz*2, b);
  print(b, b + sz*2, "set_union", " ");
  inplace_merge(a, a + sz, a + sz*2);
  print(a, a + sz*2, "inplace_merge", " ");

} ///:~

In main( ), instead of creating two separate arrays both ranges will be created end-to-end in the same array a (this will come in handy for the inplace_merge). The first call to merge( ) places the result in a different array, b. For comparison, set_union( ) is also called, which has the same signature and similar behavior, except that it removes the duplicates. Finally, inplace_merge( ) is used to combine both parts of a.

Set operations on sorted ranges

Once ranges have been sorted, you can perform mathematical set operations on them.

bool includes(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2);

bool includes (InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2,

StrictWeakOrdering binary_pred);

Returns true if [first2, last2) is a subset of [first1, last1) . Neither range is required to hold only unique elements, but if [first2, last2) holds n elements of a particular value, then [first1, last1) must also hold n elements if the result is to be true.

OutputIterator set_union(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2, OutputIterator result);

OutputIterator set_union(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Creates the mathematical union of two sorted ranges in the result range, returning the end of the output range. Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets, then the resulting set will contain the larger number of identical values.

OutputIterator set_intersection (InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2, OutputIterator result);

OutputIterator set_intersection (InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Produces, in result, the intersection of the two input sets, returning the end of the output range. That is, the set of values that appear in both input sets. Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets, then the resulting set will contain the smaller number of identical values.

OutputIterator set_difference (InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, InputIterator2 last2, OutputIterator result);

OutputIterator set_difference (InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Produces, in result, the mathematical set difference, returning the end of the output range. All the elements that are in [first1, last1) but not in [first2, last2) are placed in the result set. Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets ( n times in set 1 and m times in set 2), then the resulting set will contain max(n-m, 0) copies of that value.

OutputIterator set_symmetric_difference(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2, InputIterator2 last2,

OutputIterator result);

OutputIterator set_symmetric_difference(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2, InputIterator2 last2,

OutputIterator result, StrictWeakOrdering binary_pred);

Constructs, in result, the set containing:

All the elements in set 1 that are not in set 2
All the elements in set 2 that are not in set 1.

Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets ( n times in set 1 and m times in set 2), then the resulting set will contain abs(n-m) copies of that value, where abs( ) is the absolute value. The return value is the end of the output range

Example

It’s easiest to see the set operations demonstrated using simple vectors of characters, so you view the sets more easily. These characters are randomly generated and then sorted, but the duplicates are not removed so you can see what the set operations do when duplicates are involved.

//: C21:SetOperations.cpp
// Set operations on sorted ranges
#include <vector>
#include <algorithm>
#include "PrintSequence.h"
#include "Generators.h"
using namespace std;

int main() {
  vector<char> v(50), v2(50);
  CharGen g;
  generate(v.begin(), v.end(), g);
  generate(v2.begin(), v2.end(), g);
  sort(v.begin(), v.end());
  sort(v2.begin(), v2.end());
  print(v, "v", "");
  print(v2, "v2", "");
  bool b = includes(v.begin(), v.end(), 
    v.begin() + v.size()/2, v.end());
  cout << "includes: " <<
    (b ? "true" : "false") << endl;
  vector<char> v3, v4, v5, v6;
  set_union(v.begin(), v.end(), 
    v2.begin(), v2.end(), back_inserter(v3));
  print(v3, "set_union", "");
  set_intersection(v.begin(), v.end(), 
    v2.begin(), v2.end(), back_inserter(v4));
  print(v4, "set_intersection", "");
  set_difference(v.begin(), v.end(), 
    v2.begin(), v2.end(), back_inserter(v5));
  print(v5, "set_difference", "");
  set_symmetric_difference(v.begin(), v.end(), 
    v2.begin(), v2.end(), back_inserter(v6));
  print(v6, "set_symmetric_difference","");

} ///:~

After v and v2 are generated, sorted and printed, the includes( ) algorithm is tested by seeing if the entire range of v contains the last half of v, which of course it does so the result should always be true. The vectors v3, v4, v5 and v6 are created to hold the output of set_union( ), set_intersection( ), set_difference( ) and set_symmetric_difference( ), and the results of each are displayed so you can ponder them and convince yourself that the algorithms do indeed work as promised.

Heap operations

The heap operations in the STL are primarily concerned with the creation of the STL priority_queue, which provides efficient access to the “largest” element, whatever “largest” happens to mean for your program. These were discussed in some detail in the previous chapter, and you can find an example there.

As with the “sort” operations, there are two versions of each function, the first that uses the object’s own operator< to perform the comparison, the second that uses an additional StrictWeakOrdering object’s operator( )(a, b) to compare two objects for a < b.

void make_heap(RandomAccessIterator first, RandomAccessIterator last);

void make_heap(RandomAccessIterator first, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Turns an arbitrary range into a heap. A heap is just a range that is organized in a particular way.

void push_heap(RandomAccessIterator first, RandomAccessIterator last);

void push_heap(RandomAccessIterator first, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Adds the element *( last-1) to the heap determined by the range [first, last-1) . Yes, it seems like an odd way to do things but remember that the priority_queue container presents the nice interface to a heap, as shown in the previous chapter.

void pop_heap(RandomAccessIterator first, RandomAccessIterator last);

void pop_heap(RandomAccessIterator first, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Places the largest element (which is actually in *first, before the operation, because of the way heaps are defined) into the position *(last-1) and reorganizes the remaining range so that it’s still in heap order. If you simply grabbed *first, the next element would not be the next-largest element so you must use pop_heap( ) if you want to maintain the heap in its proper priority-queue order.

void sort_heap(RandomAccessIterator first, RandomAccessIterator last);

void sort_heap(RandomAccessIterator first, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

This could be thought of as the complement of make_heap( ), since it takes a range that is in heap order and turns it into ordinary sorted order, so it is no longer a heap. That means that if you call sort_heap( ) you can no longer use push_heap( ) or pop_heap( ) on that range (rather, you can use those functions but they won’t do anything sensible). This is not a stable sort.

Applying an operation to each element in a range

These algorithms move through the entire range and perform an operation on each element. They differ in what they do with the results of that operation: for_each( ) discards the return value of the operation (but returns the function object that has been applied to each element), while transform( ) places the results of each operation into a destination sequence (which can be the original sequence).

UnaryFunction for_each(InputIterator first, InputIterator last, UnaryFunction f);

Applies the function object f to each element in [first, last) , discarding the return value from each individual application of f. If f is just a function pointer then you are typically not interested in the return value, but if f is an object that maintains some internal state it can capture the combined return value of being applied to the range. The final return value of for_each( ) is f.

OutputIterator transform(InputIterator first, InputIterator last,

OutputIterator result, UnaryFunction f);

OutputIterator transform(InputIterator1 first, InputIterator1 last,

InputIterator2 first2, OutputIterator result, BinaryFunction f);

Like for_each( ), applies a function object f to each element in the range [first, last) . However, instead of discarding the result of each function call, transform( ) copies the result (using operator=) into *result, incrementing result after each copy (the sequence pointed to by result must have enough storage, otherwise you should use an inserter to force insertions instead of assignments).

The first form of transform( ) simply calls f( ) and passes it each object from the input range as an argument. The second form passes an object from the first input range and one from the second input range as the two arguments to the binary function f (note the length of the second input range is determined by the length of the first). The return value in both cases is the past-the-end iterator for the resulting output range.

Examples

Since much of what you do with objects in a container is to apply an operation to all of those objects, these are fairly important algorithms and merit several illustrations.

First, consider for_each( ). This sweeps through the range, pulling out each element and passing it as an argument as it calls whatever function object it’s been given. Thus for_each( ) performs operations that you might normally write out by hand. In Stlshape.cpp, for example:

for(Iter j = shapes.begin();
      j != shapes.end(); j++)

delete *j;

If you look in your compiler’s header file at the template defining for_each( ), you’ll see something like this:

template <class InputIterator, class Function>
Function for_each(InputIterator first, 
                  InputIterator last, 
                  Function f) {
    while (first != last) f(*first++);
    return f;

}

Function f looks at first like it must be a pointer to a function which takes, as an argument, an object of whatever InputIterator selects. However, the above template actually only says that you must be able to call f using parentheses and an argument. This is true for a function pointer, but it’s also true for a function object – any class that defines the appropriate operator( ). The following example shows several different ways this template can be expanded. First, we need a class that keeps track of its objects so we can know that it’s being properly destroyed:

//: C21:Counted.h
// An object that keeps track of itself
#ifndef COUNTED_H
#define COUNTED_H
#include <vector>
#include <iostream>

class Counted {
  static int count;
  char* ident;
public:
  Counted(char* id) : ident(id) { count++; }
  ~Counted() { 
    std::cout << ident << " count = " 
      << --count << std::endl;
  }
};

int Counted::count = 0;

class CountedVector : 
  public std::vector<Counted*> {
public:
  CountedVector(char* id) {
    for(int i = 0; i < 5; i++)
      push_back(new Counted(id));
  }
};

#endif // COUNTED_H ///:~

The class Counted keeps a static count of how many Counted objects have been created, and tells you as they are destroyed. In addition, each Counted keeps a char* identifier to make tracking the output easier.

The CountedVector is inherited from vector<Counted*>, and in the constructor it creates some Counted objects, handing each one your desired char*. The CountedVector makes testing quite simple, as you’ll see.

//: C21:ForEach.cpp
// Use of STL for_each() algorithm
#include "Counted.h"
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

// Simple function:
void destroy(Counted* fp) { delete fp; }

// Function object:
template<class T>
class DeleteT {
public:
  void operator()(T* x) { delete x; }
};

// Template function:
template <class T>
void wipe(T* x) { delete x; }

int main() {
  CountedVector A("one");
  for_each(A.begin(), A.end(), destroy);
  CountedVector B("two");
  for_each(B.begin(),B.end(),DeleteT<Counted>());
  CountedVector C("three");
  for_each(C.begin(), C.end(), wipe<Counted>);

} ///:~

In main( ), the first approach is the simple pointer-to-function, which works but has the drawback that you must write a new Destroy function for each different type. The obvious solution is to make a template, which is shown in the second approach with a templatized function object. On the other hand, approach three also makes sense: template functions work as well.

Since this is obviously something you might want to do a lot, why not create an algorithm to delete all the pointers in a container? This was accomplished with the purge( ) template created in the previous chapter. However, that used explicitly-written code; here, we could use transform( ). The value of transform( ) over for_each( ) is that transform( ) assigns the result of calling the function object into a resulting range, which can actually be the input range. That case means a literal transformation for the input range, since each element would be a modification of its previous value. In the above example this would be especially useful since it’s more appropriate to assign each pointer to the safe value of zero after calling delete for that pointer. Transform( ) can easily do this:

//: C21:Transform.cpp
// Use of STL transform() algorithm
#include "Counted.h"
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

template<class T>
T* deleteP(T* x) { delete x; return 0; }

template<class T> struct Deleter {
  T* operator()(T* x) { delete x; return 0; }
};

int main() {
  CountedVector cv("one");
  transform(cv.begin(), cv.end(), cv.begin(), 
    deleteP<Counted>);
  CountedVector cv2("two");
  transform(cv2.begin(), cv2.end(), cv2.begin(),
    Deleter<Counted>());

} ///:~

This shows both approaches: using a template function or a templatized function object. After the call to transform( ), the vector contains zero pointers, which is safer since any duplicate deletes will have no effect.

One thing you cannot do is delete every pointer in a collection without wrapping the call to delete inside a function or an object. That is, you don’t want to say something like this:

for_each(a.begin(), a.end(), ptr_fun(operator delete));

You can say it, but what you’ll get is a sequence of calls to the function that releases the storage. You will not get the effect of calling delete for each pointer in a, however; the destructor will not be called. This is typically not what you want, so you will need wrap your calls to delete.

In the previous example of for_each( ), the return value of the algorithm was ignored. This return value is the function that is passed in to for_each( ). If the function is just a pointer to a function, then the return value is not very useful, but if it is a function object, then that function object may have internal member data that it uses to accumulate information about all the objects that it sees during for_each( ).

For example, consider a simple model of inventory. Each Inventory object has the type of product it represents (here, single characters will be used for product names), the quantity of that product and the price of each item:

//: C21:Inventory.h
#ifndef INVENTORY_H
#define INVENTORY_H
#include <iostream>
#include <cstdlib>
#include <ctime>

class Inventory {
  char item;
  int quantity;
  int value;
public:
  Inventory(char it, int quant, int val) 
    : item(it), quantity(quant), value(val) {}
  // Synthesized operator= & copy-constructor OK
  char getItem() const { return item; }
  int getQuantity() const { return quantity; }
  void setQuantity(int q) { quantity = q; }
  int getValue() const { return value; }
  void setValue(int val) { value = val; }
  friend std::ostream& operator<<(
    std::ostream& os, const Inventory& inv) {
    return os << inv.item << ": " 
      << "quantity " << inv.quantity 
      << ", value " << inv.value;
  }
};

// A generator:
struct InvenGen {
  InvenGen() { std::srand(std::time(0)); }
  Inventory operator()() {
    static char c = 'a';
    int q = std::rand() % 100;
    int v = std::rand() % 500;
    return Inventory(c++, q, v);
  }
};

#endif // INVENTORY_H ///:~

There are member functions to get the item name, and to get and set quantity and value. An operator<< prints the Inventory object to an ostream. There’s also a generator that creates objects that have sequentially-labeled items and random quantities and values. Note the use of the return value optimization in operator( ).

To find out the total number of items and total value, you can create a function object to use with for_each( ) that has data members to hold the totals:

//: C21:CalcInventory.cpp
// More use of for_each()
#include "Inventory.h"
#include "PrintSequence.h"
#include <vector>
#include <algorithm>
using namespace std;

// To calculate inventory totals:
class InvAccum {
  int quantity;
  int value;
public:
  InvAccum() : quantity(0), value(0) {}
  void operator()(const Inventory& inv) {
    quantity += inv.getQuantity();
    value += inv.getQuantity() * inv.getValue();
  }
  friend ostream& 
  operator<<(ostream& os, const InvAccum& ia) {
    return os << "total quantity: " 
      << ia.quantity
      << ", total value: " << ia.value;
  }
};

int main() {
  vector<Inventory> vi;
  generate_n(back_inserter(vi), 15, InvenGen());
  print(vi, "vi");
  InvAccum ia = for_each(vi.begin(),vi.end(),
    InvAccum());
  cout << ia << endl;

} ///:~

InvAccum’s operator( ) takes a single argument, as required by for_each( ). As for_each( ) moves through its range, it takes each object in that range and passes it to InvAccum::operator( ), which performs calculations and saves the result. At the end of this process, for_each( ) returns the InvAccum object which you can then examine; in this case it is simply printed.

You can do most things to the Inventory objects using for_each( ). For example, if you wanted to increase all the prices by 10%, for_each( ) could do this handily. But you’ll notice that the Inventory objects have no way to change the item value. The programmers who designed Inventory thought this was a good idea, after all, why would you want to change the name of an item? But marketing has decided that they want a “new, improved” look by changing all the item names to uppercase; they’ve done studies and determined that the new names will boost sales (well, marketing has to have something to do ...). So for_each( ) will not work here, but transform( ) will:

//: C21:TransformNames.cpp
// More use of transform()
#include "Inventory.h"
#include "PrintSequence.h"
#include <vector>
#include <algorithm>
#include <cctype>
using namespace std;

struct NewImproved {
  Inventory operator()(const Inventory& inv) {
    return Inventory(toupper(inv.getItem()), 
      inv.getQuantity(), inv.getValue());
  }
};

int main() {
  vector<Inventory> vi;
  generate_n(back_inserter(vi), 15, InvenGen());
  print(vi, "vi");
  transform(vi.begin(), vi.end(), vi.begin(),
    NewImproved());
  print(vi, "vi");

} ///:~

Notice that the resulting range is the same as the input range, that is, the transformation is performed in-place.

Now suppose that the sales department needs to generate special price lists with different discounts for each item. The original list must stay the same, and there need to be any number of generated special lists. Sales will give you a separate list of discounts for each new list. To solve this problem we can use the second version of transform( ):

//: C21:SpecialList.cpp
// Using the second version of transform()
#include "Inventory.h"
#include "PrintSequence.h"
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;

struct Discounter {
  Inventory operator()(const Inventory& inv,
    float discount) {
    return Inventory(inv.getItem(), 
      inv.getQuantity(), 
      inv.getValue() * (1 - discount));
  }
};

struct DiscGen {
  DiscGen() { srand(time(0)); }
  float operator()() {
    float r = float(rand() % 10);
    return r / 100.0;
  }
};

int main() {
  vector<Inventory> vi;
  generate_n(back_inserter(vi), 15, InvenGen());
  print(vi, "vi");
  vector<float> disc;
  generate_n(back_inserter(disc), 15, DiscGen());
  print(disc, "Discounts:");
  vector<Inventory> discounted;
  transform(vi.begin(),vi.end(), disc.begin(), 
    back_inserter(discounted), Discounter());
  print(discounted, "discounted");

} ///:~

Discounter is a function object that, given an Inventory object and a discount percentage, produces a new Inventory with the discounted price. DiscGen just generates random discount values between 1 and 10 percent to use for testing. In main( ), two vectors are created, one for Inventory and one for discounts. These are passed to transform( ) along with a Discounter object, and transform( ) fills a new vector<Inventory> called discounted.

Numeric algorithms

These algorithms are all tucked into the header <numeric>, since they are primarily useful for performing numerical calculations.

<numeric>

T accumulate(InputIterator first, InputIterator last, T result);

T accumulate(InputIterator first, InputIterator last, T result,

BinaryFunction f);

The first form is a generalized summation; for each element pointed to by an iterator i in [first, last) , it performs the operation result = result + *i , where result is of type T. However, the second form is more general; it applies the function f(result, *i) on each element *i in the range from beginning to end. The value result is initialized in both cases by resultI, and if the range is empty then resultI is returned.

Note the similarity between the second form of transform( ) and the second form of accumulate( ).

<numeric>

T inner_product(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, T init);

T inner_product(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, T init

BinaryFunction1 op1, BinaryFunction2 op2);

Calculates a generalized inner product of the two ranges [first1, last1) and [first2, first2 + (last1 - first1)) . The return value is produced by multiplying the element from the first sequence by the “parallel” element in the second sequence, and then adding it to the sum. So if you have two sequences {1, 1, 2, 2} and {1, 2, 3, 4} the inner product becomes:

(1*1) + (1*2) + (2*3) + (2*4)

Which is 17. The init argument is the initial value for the inner product; this is probably zero but may be anything and is especially important for an empty first sequence, because then it becomes the default return value. The second sequence must have at least as many elements as the first.

While the first form is very specifically mathematical, the second form is simply a multiple application of functions and could conceivably be used in many other situations. The op1 function is used in place of addition, and op2 is used instead of multiplication. Thus, if you applied the second version of inner_product( ) to the above sequence, the result would be the following operations:

init = op1(init, op2(1,1));
init = op1(init, op2(1,2));
init = op1(init, op2(2,3));

init = op1(init, op2(2,4));

Thus it’s similar to transform( ) but two operations are performed instead of one.

<numeric>

OutputIterator partial_sum(InputIterator first, InputIterator last,

OutputIterator result);

OutputIterator partial_sum(InputIterator first, InputIterator last,

OutputIterator result, BinaryFunction op);

Calculates a generalized partial sum. This means that a new sequence is created, beginning at result, where each element is the sum of all the elements up to the currently selected element in [first, last) . For example, if the original sequence is {1, 1, 2, 2, 3} then the generated sequence is {1, 1 + 1, 1 + 1 + 2, 1 + 1 + 1 + 2 + 2, 1 + 1 + 1 + 2 + 2 + 3} , that is, {1, 2, 4, 6, 9} .

In the second version, the binary function op is used instead of the + operator to take all the “summation” up to that point and combine it with the new value. For example, if you use multiplies<int>( ) as the object for the above sequence, the output is {1, 1, 2, 4, 12} . Note that the first output value is always the same as the first input value.

The return value is the end of the output range [result, result + (last - first) ) .

<numeric>

OutputIterator adjacent_difference(InputIterator first, InputIterator last,

OutputIterator result);

OutputIterator adjacent_difference(InputIterator first, InputIterator last,

OutputIterator result, BinaryFunction op);

Calculates the differences of adjacent elements throughout the range [first, last) . This means that in the new sequence, the value is the value of the difference of the current element and the previous element in the original sequence (the first value is the same). For example, if the original sequence is {1, 1, 2, 2, 3} , the resulting sequence is {1, 1 – 1, 2 – 1, 2 – 2, 3 – 2} , that is: {1, 0, 1, 0, 1} .

The second form uses the binary function op instead of the – operator to perform the “differencing.” For example, if you use multiplies<int>( ) as the function object for the above sequence, the output is {1, 1, 2, 4, 6} .

The return value is the end of the output range [result, result + (last - first) ) .

Example

This program tests all the algorithms in <numeric> in both forms, on integer arrays. You’ll notice that in the test of the form where you supply the function or functions, the function objects used are the ones that produce the same result as form one so the results produced will be exactly the same. This should also demonstrate a bit more clearly the operations that are going on, and how to substitute your own operations.

//: C21:NumericTest.cpp
#include "PrintSequence.h"
#include <numeric>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <functional>
using namespace std;

int main() {
  int a[] = { 1, 1, 2, 2, 3, 5, 7, 9, 11, 13 };
  const int asz = sizeof a / sizeof a[0];
  print(a, a + asz, "a", " ");
  int r = accumulate(a, a + asz, 0);
  cout << "accumulate 1: " << r << endl;
  // Should produce the same result:
  r = accumulate(a, a + asz, 0, plus<int>());
  cout << "accumulate 2: " << r << endl;
  int b[] = { 1, 2, 3, 4, 1, 2, 3, 4, 1, 2 };
  print(b, b + sizeof b / sizeof b[0], "b", " ");
  r = inner_product(a, a + asz, b, 0);
  cout << "inner_product 1: " << r << endl;
  // Should produce the same result:
  r = inner_product(a, a + asz, b, 0, 
    plus<int>(), multiplies<int>());
  cout << "inner_product 2: " << r << endl;
  int* it = partial_sum(a, a + asz, b);
  print(b, it, "partial_sum 1", " ");
  // Should produce the same result:
  it = partial_sum(a, a + asz, b, plus<int>());
  print(b, it, "partial_sum 2", " ");
  it = adjacent_difference(a, a + asz, b);
  print(b, it, "adjacent_difference 1"," ");
  // Should produce the same result:
  it = adjacent_difference(a, a + asz, b, 
    minus<int>());
  print(b, it, "adjacent_difference 2"," ");

} ///:~

Note that the return value of inner_product( ) and partial_sum( ) is the past-the-end iterator for the resulting sequence, so it is used as the second iterator in the print( ) function.

Since the second form of each function allows you to provide your own function object, only the first form of the functions is purely “numeric.” You could conceivably do some things that are not intuitively numeric with something like inner_product( ).

General utilities

Finally, here are some basic tools that are used with the other algorithms; you may or may not use them directly yourself.

<utility>

struct pair;

make_pair( );

This was described and used in the previous chapter and in this one. A pair is simply a way to package two objects (which may be of different types) together into a single object. This is typically used when you need to return more than one object from a function, but it can also be used to create a container that holds pair objects, or to pass more than one object as a single argument. You access the elements by saying p.first and p.second, where p is the pair object. The function equal_range( ), described in the last chapter and in this one, returns its result as a pair of iterators. You can insert( ) a pair directly into a map or multimap; a pair is the value_type for those containers.

If you want to create a pair, you typically use the template function make_pair( ) rather than explicitly constructing a pair object.

<iterator>

distance(InputIterator first, InputIterator last);

Tells you the number of elements between first and last. More precisely, it returns an integral value that tells you the number of times first must be incremented before it is equal to last. No dereferencing of the iterators occurs during this process.

<iterator>

void advance(InputIterator& i, Distance n);

Moves the iterator i forward by the value of n (the iterator can also be moved backward for negative values of n if the iterator is also a bidirectional iterator). This algorithm is aware of bidirectional iterators, and will use the most efficient approach.

<iterator>

back_insert_iterator<Container> back_inserter(Container& x);

front_insert_iterator<Container> front_inserter(Container& x);

insert_iterator<Container> inserter(Container& x, Iterator i);

These functions are used to create iterators for the given containers that will insert elements into the container, rather than overwrite the existing elements in the container using operator= (which is the default behavior). Each type of iterator uses a different operation for insertion: back_insert_iterator uses push_back( ), front_insert_iterator uses push_front( ) and insert_iterator uses insert( ) (and thus it can be used with the associative containers, while the other two can be used with sequence containers). These were shown in some detail in the previous chapter, and also used in this chapter.

const LessThanComparable& min(const LessThanComparable& a,

const LessThanComparable& b);

const T& min(const T& a, const T& b, BinaryPredicate binary_pred);

Returns the lesser of its two arguments, or the first argument if the two are equivalent. The first version performs comparisons using operator< and the second passes both arguments to binary_pred to perform the comparison.

const LessThanComparable& max(const LessThanComparable& a,

const LessThanComparable& b);

const T& max(const T& a, const T& b, BinaryPredicate binary_pred);

Exactly like min( ), but returns the greater of its two arguments.

void swap(Assignable& a, Assignable& b);

void iter_swap(ForwardIterator1 a, ForwardIterator2 b);

Exchanges the values of a and b using assignment. Note that all container classes use specialized versions of swap( ) that are typically more efficient than this general version.

iter_swap( ) is a backwards-compatible remnant in the standard; you can just use swap( ).

Contents | Prev | Next

Contact: webmaster@codeguru.com
CodeGuru - the website for developers. [an error occurred while processing this directive]