1  """Heap queue algorithm (a.k.a. priority queue).
       2  
       3  Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
       4  all k, counting elements from 0.  For the sake of comparison,
       5  non-existing elements are considered to be infinite.  The interesting
       6  property of a heap is that a[0] is always its smallest element.
       7  
       8  Usage:
       9  
      10  heap = []            # creates an empty heap
      11  heappush(heap, item) # pushes a new item on the heap
      12  item = heappop(heap) # pops the smallest item from the heap
      13  item = heap[0]       # smallest item on the heap without popping it
      14  heapify(x)           # transforms list into a heap, in-place, in linear time
      15  item = heappushpop(heap, item) # pushes a new item and then returns
      16                                 # the smallest item; the heap size is unchanged
      17  item = heapreplace(heap, item) # pops and returns smallest item, and adds
      18                                 # new item; the heap size is unchanged
      19  
      20  Our API differs from textbook heap algorithms as follows:
      21  
      22  - We use 0-based indexing.  This makes the relationship between the
      23    index for a node and the indexes for its children slightly less
      24    obvious, but is more suitable since Python uses 0-based indexing.
      25  
      26  - Our heappop() method returns the smallest item, not the largest.
      27  
      28  These two make it possible to view the heap as a regular Python list
      29  without surprises: heap[0] is the smallest item, and heap.sort()
      30  maintains the heap invariant!
      31  """
      32  
      33  # Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
      34  
      35  __about__ = """Heap queues
      36  
      37  [explanation by François Pinard]
      38  
      39  Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
      40  all k, counting elements from 0.  For the sake of comparison,
      41  non-existing elements are considered to be infinite.  The interesting
      42  property of a heap is that a[0] is always its smallest element.
      43  
      44  The strange invariant above is meant to be an efficient memory
      45  representation for a tournament.  The numbers below are `k', not a[k]:
      46  
      47                                     0
      48  
      49                    1                                 2
      50  
      51            3               4                5               6
      52  
      53        7       8       9       10      11      12      13      14
      54  
      55      15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
      56  
      57  
      58  In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'.  In
      59  a usual binary tournament we see in sports, each cell is the winner
      60  over the two cells it tops, and we can trace the winner down the tree
      61  to see all opponents s/he had.  However, in many computer applications
      62  of such tournaments, we do not need to trace the history of a winner.
      63  To be more memory efficient, when a winner is promoted, we try to
      64  replace it by something else at a lower level, and the rule becomes
      65  that a cell and the two cells it tops contain three different items,
      66  but the top cell "wins" over the two topped cells.
      67  
      68  If this heap invariant is protected at all time, index 0 is clearly
      69  the overall winner.  The simplest algorithmic way to remove it and
      70  find the "next" winner is to move some loser (let's say cell 30 in the
      71  diagram above) into the 0 position, and then percolate this new 0 down
      72  the tree, exchanging values, until the invariant is re-established.
      73  This is clearly logarithmic on the total number of items in the tree.
      74  By iterating over all items, you get an O(n ln n) sort.
      75  
      76  A nice feature of this sort is that you can efficiently insert new
      77  items while the sort is going on, provided that the inserted items are
      78  not "better" than the last 0'th element you extracted.  This is
      79  especially useful in simulation contexts, where the tree holds all
      80  incoming events, and the "win" condition means the smallest scheduled
      81  time.  When an event schedule other events for execution, they are
      82  scheduled into the future, so they can easily go into the heap.  So, a
      83  heap is a good structure for implementing schedulers (this is what I
      84  used for my MIDI sequencer :-).
      85  
      86  Various structures for implementing schedulers have been extensively
      87  studied, and heaps are good for this, as they are reasonably speedy,
      88  the speed is almost constant, and the worst case is not much different
      89  than the average case.  However, there are other representations which
      90  are more efficient overall, yet the worst cases might be terrible.
      91  
      92  Heaps are also very useful in big disk sorts.  You most probably all
      93  know that a big sort implies producing "runs" (which are pre-sorted
      94  sequences, which size is usually related to the amount of CPU memory),
      95  followed by a merging passes for these runs, which merging is often
      96  very cleverly organised[1].  It is very important that the initial
      97  sort produces the longest runs possible.  Tournaments are a good way
      98  to that.  If, using all the memory available to hold a tournament, you
      99  replace and percolate items that happen to fit the current run, you'll
     100  produce runs which are twice the size of the memory for random input,
     101  and much better for input fuzzily ordered.
     102  
     103  Moreover, if you output the 0'th item on disk and get an input which
     104  may not fit in the current tournament (because the value "wins" over
     105  the last output value), it cannot fit in the heap, so the size of the
     106  heap decreases.  The freed memory could be cleverly reused immediately
     107  for progressively building a second heap, which grows at exactly the
     108  same rate the first heap is melting.  When the first heap completely
     109  vanishes, you switch heaps and start a new run.  Clever and quite
     110  effective!
     111  
     112  In a word, heaps are useful memory structures to know.  I use them in
     113  a few applications, and I think it is good to keep a `heap' module
     114  around. :-)
     115  
     116  --------------------
     117  [1] The disk balancing algorithms which are current, nowadays, are
     118  more annoying than clever, and this is a consequence of the seeking
     119  capabilities of the disks.  On devices which cannot seek, like big
     120  tape drives, the story was quite different, and one had to be very
     121  clever to ensure (far in advance) that each tape movement will be the
     122  most effective possible (that is, will best participate at
     123  "progressing" the merge).  Some tapes were even able to read
     124  backwards, and this was also used to avoid the rewinding time.
     125  Believe me, real good tape sorts were quite spectacular to watch!
     126  From all times, sorting has always been a Great Art! :-)
     127  """
     128  
     129  __all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
     130             'nlargest', 'nsmallest', 'heappushpop']
     131  
     132  def heappush(heap, item):
     133      """Push item onto heap, maintaining the heap invariant."""
     134      heap.append(item)
     135      _siftdown(heap, 0, len(heap)-1)
     136  
     137  def heappop(heap):
     138      """Pop the smallest item off the heap, maintaining the heap invariant."""
     139      lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
     140      if heap:
     141          returnitem = heap[0]
     142          heap[0] = lastelt
     143          _siftup(heap, 0)
     144          return returnitem
     145      return lastelt
     146  
     147  def heapreplace(heap, item):
     148      """Pop and return the current smallest value, and add the new item.
     149  
     150      This is more efficient than heappop() followed by heappush(), and can be
     151      more appropriate when using a fixed-size heap.  Note that the value
     152      returned may be larger than item!  That constrains reasonable uses of
     153      this routine unless written as part of a conditional replacement:
     154  
     155          if item > heap[0]:
     156              item = heapreplace(heap, item)
     157      """
     158      returnitem = heap[0]    # raises appropriate IndexError if heap is empty
     159      heap[0] = item
     160      _siftup(heap, 0)
     161      return returnitem
     162  
     163  def heappushpop(heap, item):
     164      """Fast version of a heappush followed by a heappop."""
     165      if heap and heap[0] < item:
     166          item, heap[0] = heap[0], item
     167          _siftup(heap, 0)
     168      return item
     169  
     170  def heapify(x):
     171      """Transform list into a heap, in-place, in O(len(x)) time."""
     172      n = len(x)
     173      # Transform bottom-up.  The largest index there's any point to looking at
     174      # is the largest with a child index in-range, so must have 2*i + 1 < n,
     175      # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
     176      # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
     177      # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
     178      for i in reversed(range(n//2)):
     179          _siftup(x, i)
     180  
     181  def _heappop_max(heap):
     182      """Maxheap version of a heappop."""
     183      lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
     184      if heap:
     185          returnitem = heap[0]
     186          heap[0] = lastelt
     187          _siftup_max(heap, 0)
     188          return returnitem
     189      return lastelt
     190  
     191  def _heapreplace_max(heap, item):
     192      """Maxheap version of a heappop followed by a heappush."""
     193      returnitem = heap[0]    # raises appropriate IndexError if heap is empty
     194      heap[0] = item
     195      _siftup_max(heap, 0)
     196      return returnitem
     197  
     198  def _heapify_max(x):
     199      """Transform list into a maxheap, in-place, in O(len(x)) time."""
     200      n = len(x)
     201      for i in reversed(range(n//2)):
     202          _siftup_max(x, i)
     203  
     204  # 'heap' is a heap at all indices >= startpos, except possibly for pos.  pos
     205  # is the index of a leaf with a possibly out-of-order value.  Restore the
     206  # heap invariant.
     207  def _siftdown(heap, startpos, pos):
     208      newitem = heap[pos]
     209      # Follow the path to the root, moving parents down until finding a place
     210      # newitem fits.
     211      while pos > startpos:
     212          parentpos = (pos - 1) >> 1
     213          parent = heap[parentpos]
     214          if newitem < parent:
     215              heap[pos] = parent
     216              pos = parentpos
     217              continue
     218          break
     219      heap[pos] = newitem
     220  
     221  # The child indices of heap index pos are already heaps, and we want to make
     222  # a heap at index pos too.  We do this by bubbling the smaller child of
     223  # pos up (and so on with that child's children, etc) until hitting a leaf,
     224  # then using _siftdown to move the oddball originally at index pos into place.
     225  #
     226  # We *could* break out of the loop as soon as we find a pos where newitem <=
     227  # both its children, but turns out that's not a good idea, and despite that
     228  # many books write the algorithm that way.  During a heap pop, the last array
     229  # element is sifted in, and that tends to be large, so that comparing it
     230  # against values starting from the root usually doesn't pay (= usually doesn't
     231  # get us out of the loop early).  See Knuth, Volume 3, where this is
     232  # explained and quantified in an exercise.
     233  #
     234  # Cutting the # of comparisons is important, since these routines have no
     235  # way to extract "the priority" from an array element, so that intelligence
     236  # is likely to be hiding in custom comparison methods, or in array elements
     237  # storing (priority, record) tuples.  Comparisons are thus potentially
     238  # expensive.
     239  #
     240  # On random arrays of length 1000, making this change cut the number of
     241  # comparisons made by heapify() a little, and those made by exhaustive
     242  # heappop() a lot, in accord with theory.  Here are typical results from 3
     243  # runs (3 just to demonstrate how small the variance is):
     244  #
     245  # Compares needed by heapify     Compares needed by 1000 heappops
     246  # --------------------------     --------------------------------
     247  # 1837 cut to 1663               14996 cut to 8680
     248  # 1855 cut to 1659               14966 cut to 8678
     249  # 1847 cut to 1660               15024 cut to 8703
     250  #
     251  # Building the heap by using heappush() 1000 times instead required
     252  # 2198, 2148, and 2219 compares:  heapify() is more efficient, when
     253  # you can use it.
     254  #
     255  # The total compares needed by list.sort() on the same lists were 8627,
     256  # 8627, and 8632 (this should be compared to the sum of heapify() and
     257  # heappop() compares):  list.sort() is (unsurprisingly!) more efficient
     258  # for sorting.
     259  
     260  def _siftup(heap, pos):
     261      endpos = len(heap)
     262      startpos = pos
     263      newitem = heap[pos]
     264      # Bubble up the smaller child until hitting a leaf.
     265      childpos = 2*pos + 1    # leftmost child position
     266      while childpos < endpos:
     267          # Set childpos to index of smaller child.
     268          rightpos = childpos + 1
     269          if rightpos < endpos and not heap[childpos] < heap[rightpos]:
     270              childpos = rightpos
     271          # Move the smaller child up.
     272          heap[pos] = heap[childpos]
     273          pos = childpos
     274          childpos = 2*pos + 1
     275      # The leaf at pos is empty now.  Put newitem there, and bubble it up
     276      # to its final resting place (by sifting its parents down).
     277      heap[pos] = newitem
     278      _siftdown(heap, startpos, pos)
     279  
     280  def _siftdown_max(heap, startpos, pos):
     281      'Maxheap variant of _siftdown'
     282      newitem = heap[pos]
     283      # Follow the path to the root, moving parents down until finding a place
     284      # newitem fits.
     285      while pos > startpos:
     286          parentpos = (pos - 1) >> 1
     287          parent = heap[parentpos]
     288          if parent < newitem:
     289              heap[pos] = parent
     290              pos = parentpos
     291              continue
     292          break
     293      heap[pos] = newitem
     294  
     295  def _siftup_max(heap, pos):
     296      'Maxheap variant of _siftup'
     297      endpos = len(heap)
     298      startpos = pos
     299      newitem = heap[pos]
     300      # Bubble up the larger child until hitting a leaf.
     301      childpos = 2*pos + 1    # leftmost child position
     302      while childpos < endpos:
     303          # Set childpos to index of larger child.
     304          rightpos = childpos + 1
     305          if rightpos < endpos and not heap[rightpos] < heap[childpos]:
     306              childpos = rightpos
     307          # Move the larger child up.
     308          heap[pos] = heap[childpos]
     309          pos = childpos
     310          childpos = 2*pos + 1
     311      # The leaf at pos is empty now.  Put newitem there, and bubble it up
     312      # to its final resting place (by sifting its parents down).
     313      heap[pos] = newitem
     314      _siftdown_max(heap, startpos, pos)
     315  
     316  def merge(*iterables, key=None, reverse=False):
     317      '''Merge multiple sorted inputs into a single sorted output.
     318  
     319      Similar to sorted(itertools.chain(*iterables)) but returns a generator,
     320      does not pull the data into memory all at once, and assumes that each of
     321      the input streams is already sorted (smallest to largest).
     322  
     323      >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
     324      [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
     325  
     326      If *key* is not None, applies a key function to each element to determine
     327      its sort order.
     328  
     329      >>> list(merge(['dog', 'horse'], ['cat', 'fish', 'kangaroo'], key=len))
     330      ['dog', 'cat', 'fish', 'horse', 'kangaroo']
     331  
     332      '''
     333  
     334      h = []
     335      h_append = h.append
     336  
     337      if reverse:
     338          _heapify = _heapify_max
     339          _heappop = _heappop_max
     340          _heapreplace = _heapreplace_max
     341          direction = -1
     342      else:
     343          _heapify = heapify
     344          _heappop = heappop
     345          _heapreplace = heapreplace
     346          direction = 1
     347  
     348      if key is None:
     349          for order, it in enumerate(map(iter, iterables)):
     350              try:
     351                  next = it.__next__
     352                  h_append([next(), order * direction, next])
     353              except StopIteration:
     354                  pass
     355          _heapify(h)
     356          while len(h) > 1:
     357              try:
     358                  while True:
     359                      value, order, next = s = h[0]
     360                      yield value
     361                      s[0] = next()           # raises StopIteration when exhausted
     362                      _heapreplace(h, s)      # restore heap condition
     363              except StopIteration:
     364                  _heappop(h)                 # remove empty iterator
     365          if h:
     366              # fast case when only a single iterator remains
     367              value, order, next = h[0]
     368              yield value
     369              yield from next.__self__
     370          return
     371  
     372      for order, it in enumerate(map(iter, iterables)):
     373          try:
     374              next = it.__next__
     375              value = next()
     376              h_append([key(value), order * direction, value, next])
     377          except StopIteration:
     378              pass
     379      _heapify(h)
     380      while len(h) > 1:
     381          try:
     382              while True:
     383                  key_value, order, value, next = s = h[0]
     384                  yield value
     385                  value = next()
     386                  s[0] = key(value)
     387                  s[2] = value
     388                  _heapreplace(h, s)
     389          except StopIteration:
     390              _heappop(h)
     391      if h:
     392          key_value, order, value, next = h[0]
     393          yield value
     394          yield from next.__self__
     395  
     396  
     397  # Algorithm notes for nlargest() and nsmallest()
     398  # ==============================================
     399  #
     400  # Make a single pass over the data while keeping the k most extreme values
     401  # in a heap.  Memory consumption is limited to keeping k values in a list.
     402  #
     403  # Measured performance for random inputs:
     404  #
     405  #                                   number of comparisons
     406  #    n inputs     k-extreme values  (average of 5 trials)   % more than min()
     407  # -------------   ----------------  ---------------------   -----------------
     408  #      1,000           100                  3,317               231.7%
     409  #     10,000           100                 14,046                40.5%
     410  #    100,000           100                105,749                 5.7%
     411  #  1,000,000           100              1,007,751                 0.8%
     412  # 10,000,000           100             10,009,401                 0.1%
     413  #
     414  # Theoretical number of comparisons for k smallest of n random inputs:
     415  #
     416  # Step   Comparisons                  Action
     417  # ----   --------------------------   ---------------------------
     418  #  1     1.66 * k                     heapify the first k-inputs
     419  #  2     n - k                        compare remaining elements to top of heap
     420  #  3     k * (1 + lg2(k)) * ln(n/k)   replace the topmost value on the heap
     421  #  4     k * lg2(k) - (k/2)           final sort of the k most extreme values
     422  #
     423  # Combining and simplifying for a rough estimate gives:
     424  #
     425  #        comparisons = n + k * (log(k, 2) * log(n/k) + log(k, 2) + log(n/k))
     426  #
     427  # Computing the number of comparisons for step 3:
     428  # -----------------------------------------------
     429  # * For the i-th new value from the iterable, the probability of being in the
     430  #   k most extreme values is k/i.  For example, the probability of the 101st
     431  #   value seen being in the 100 most extreme values is 100/101.
     432  # * If the value is a new extreme value, the cost of inserting it into the
     433  #   heap is 1 + log(k, 2).
     434  # * The probability times the cost gives:
     435  #            (k/i) * (1 + log(k, 2))
     436  # * Summing across the remaining n-k elements gives:
     437  #            sum((k/i) * (1 + log(k, 2)) for i in range(k+1, n+1))
     438  # * This reduces to:
     439  #            (H(n) - H(k)) * k * (1 + log(k, 2))
     440  # * Where H(n) is the n-th harmonic number estimated by:
     441  #            gamma = 0.5772156649
     442  #            H(n) = log(n, e) + gamma + 1 / (2 * n)
     443  #   http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)#Rate_of_divergence
     444  # * Substituting the H(n) formula:
     445  #            comparisons = k * (1 + log(k, 2)) * (log(n/k, e) + (1/n - 1/k) / 2)
     446  #
     447  # Worst-case for step 3:
     448  # ----------------------
     449  # In the worst case, the input data is reversed sorted so that every new element
     450  # must be inserted in the heap:
     451  #
     452  #             comparisons = 1.66 * k + log(k, 2) * (n - k)
     453  #
     454  # Alternative Algorithms
     455  # ----------------------
     456  # Other algorithms were not used because they:
     457  # 1) Took much more auxiliary memory,
     458  # 2) Made multiple passes over the data.
     459  # 3) Made more comparisons in common cases (small k, large n, semi-random input).
     460  # See the more detailed comparison of approach at:
     461  # http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest
     462  
     463  def nsmallest(n, iterable, key=None):
     464      """Find the n smallest elements in a dataset.
     465  
     466      Equivalent to:  sorted(iterable, key=key)[:n]
     467      """
     468  
     469      # Short-cut for n==1 is to use min()
     470      if n == 1:
     471          it = iter(iterable)
     472          sentinel = object()
     473          result = min(it, default=sentinel, key=key)
     474          return [] if result is sentinel else [result]
     475  
     476      # When n>=size, it's faster to use sorted()
     477      try:
     478          size = len(iterable)
     479      except (TypeError, AttributeError):
     480          pass
     481      else:
     482          if n >= size:
     483              return sorted(iterable, key=key)[:n]
     484  
     485      # When key is none, use simpler decoration
     486      if key is None:
     487          it = iter(iterable)
     488          # put the range(n) first so that zip() doesn't
     489          # consume one too many elements from the iterator
     490          result = [(elem, i) for i, elem in zip(range(n), it)]
     491          if not result:
     492              return result
     493          _heapify_max(result)
     494          top = result[0][0]
     495          order = n
     496          _heapreplace = _heapreplace_max
     497          for elem in it:
     498              if elem < top:
     499                  _heapreplace(result, (elem, order))
     500                  top, _order = result[0]
     501                  order += 1
     502          result.sort()
     503          return [elem for (elem, order) in result]
     504  
     505      # General case, slowest method
     506      it = iter(iterable)
     507      result = [(key(elem), i, elem) for i, elem in zip(range(n), it)]
     508      if not result:
     509          return result
     510      _heapify_max(result)
     511      top = result[0][0]
     512      order = n
     513      _heapreplace = _heapreplace_max
     514      for elem in it:
     515          k = key(elem)
     516          if k < top:
     517              _heapreplace(result, (k, order, elem))
     518              top, _order, _elem = result[0]
     519              order += 1
     520      result.sort()
     521      return [elem for (k, order, elem) in result]
     522  
     523  def nlargest(n, iterable, key=None):
     524      """Find the n largest elements in a dataset.
     525  
     526      Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
     527      """
     528  
     529      # Short-cut for n==1 is to use max()
     530      if n == 1:
     531          it = iter(iterable)
     532          sentinel = object()
     533          result = max(it, default=sentinel, key=key)
     534          return [] if result is sentinel else [result]
     535  
     536      # When n>=size, it's faster to use sorted()
     537      try:
     538          size = len(iterable)
     539      except (TypeError, AttributeError):
     540          pass
     541      else:
     542          if n >= size:
     543              return sorted(iterable, key=key, reverse=True)[:n]
     544  
     545      # When key is none, use simpler decoration
     546      if key is None:
     547          it = iter(iterable)
     548          result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
     549          if not result:
     550              return result
     551          heapify(result)
     552          top = result[0][0]
     553          order = -n
     554          _heapreplace = heapreplace
     555          for elem in it:
     556              if top < elem:
     557                  _heapreplace(result, (elem, order))
     558                  top, _order = result[0]
     559                  order -= 1
     560          result.sort(reverse=True)
     561          return [elem for (elem, order) in result]
     562  
     563      # General case, slowest method
     564      it = iter(iterable)
     565      result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
     566      if not result:
     567          return result
     568      heapify(result)
     569      top = result[0][0]
     570      order = -n
     571      _heapreplace = heapreplace
     572      for elem in it:
     573          k = key(elem)
     574          if top < k:
     575              _heapreplace(result, (k, order, elem))
     576              top, _order, _elem = result[0]
     577              order -= 1
     578      result.sort(reverse=True)
     579      return [elem for (k, order, elem) in result]
     580  
     581  # If available, use C implementation
     582  try:
     583      from _heapq import *
     584  except ImportError:
     585      pass
     586  try:
     587      from _heapq import _heapreplace_max
     588  except ImportError:
     589      pass
     590  try:
     591      from _heapq import _heapify_max
     592  except ImportError:
     593      pass
     594  try:
     595      from _heapq import _heappop_max
     596  except ImportError:
     597      pass
     598  
     599  
     600  if __name__ == "__main__":
     601  
     602      import doctest # pragma: no cover
     603      print(doctest.testmod()) # pragma: no cover