难懂的数据结构-堆

背景

日常工作经常会用到定时任务，Java 提供了 ScheduledThreadPoolExecutor 方便开发者使用。最近在翻看源码时，了解到底层用到了 DelayedWorkQueue，其实就是阻塞式的优先级队列。之前了解过但没掌握，我工作快 8 年会有些丢脸，决定现在把这个捡起来。

前提知识

二叉树

二叉树是一种数据结构，由节点组成，每个节点最多有两个子节点，一个称为左子节点，另一个称为右子节点。

完全二叉树

完全二叉树是一种二叉树，除了最后一层外，每一层的节点都按照从左到右的顺序填满，且最后一层的节点靠左排列。

上节二叉树的示例便不是一棵完全二叉树。

基于完全二叉树的节点规律，我们可以用数组存储节点数据，从而很方便访问树中的各个节点。

设当前节点下标为 k，则有；

父节点下标为 (k-1)/2
左子节点下标为 2*k + 1
右子节点下标为 2*k + 2

二叉堆

完全二叉堆是一种完全二叉树结构，其中每个父节点的键值要么大于等于（大根堆），要么小于等于（小根堆）其子节点的键值，用于实现优先队列等数据结构。

堆操作

入堆

先将节点插入到堆的最末位置；
比较当前节点与父节点优先级，如果优先级不高于父节点，或者已经是根节点，入堆结束；
如果优先级比父节点高，则交换当前节点与父节点位置，并在新的位置重新执行第二步；

这个过程通常也叫做 sift-up（上浮）。

出堆

先将根节点弹出，然后将堆的最末节点移动到根节点；
比较左右子节点优先级选择较高的节点，如果当前优先级不低于该子节点，出堆结束；
否则交换当前节点与该子节点位置，并在新的位置重新执行第二步；

这个过程通常也叫做 sift-down（下沉）。

建堆

从最后一个非叶子节点开始，向前逐个遍历；
对每个节点执行 sift-down 下沉操作；
重复步骤 2，直到根节点；

根据完全二叉树的规律，最后一个非叶子节点的下标为 (n/2 - 1)，n 为堆中的节点数。

应用

优先级队列

回到文章的开头，talk is cheap，用上述的知识编写一个简易的优先级队列；

java

/**
 * 优先级队列
 *
 * @param <T> 节点类型
 */
public class PriorityQueue<T> {

    /**
     * 存放节点的数组
     */
    private T[] nodes;
    /**
     * 堆中节点的数量
     */
    private int size;
    /**
     * 优先级比较器
     */
    private final Comparator<T> comparator;

    private static final int DEFAULT_CAPACITY = 2 << 4;

    public PriorityQueue(Comparator<T> comparator) {
        this.nodes = (T[]) new Object[DEFAULT_CAPACITY];
        this.comparator = comparator;
        this.size = 0;
    }

    public PriorityQueue(T[] nodes, Comparator<T> comparator) {
        this.nodes = nodes;
        this.comparator = comparator;
        this.size = nodes.length;
        this.init();
        this.grow();
    }

    public void push(T node) {
        nodes[size++] = node;
        siftUp(size - 1);
        if (size == nodes.length) {
            // 当数组空间已完全使用执行扩容
            grow();
        }
    }

    public T pop() {
        if (size == 0) {
            return null;
        }
        T node = nodes[0];
        swap(0, size-- - 1);
        siftDown(0);
        return node;
    }

    /**
     * 上浮
     */
    private void siftUp(int index) {
        int parentIndex = (index - 1) / 2;
        if (index > 0 && isHigher(index, parentIndex)) {
            swap(index, parentIndex);
            siftUp(parentIndex);
        }
    }

    /**
     * 下沉
     */
    private void siftDown(int index) {
        int leftIndex = 2 * index + 1;
        int rightIndex = 2 * index + 2;
        int highestIndex = index;
        if (leftIndex < size && isHigher(leftIndex, highestIndex)) {
            highestIndex = leftIndex;
        }
        if (rightIndex < size && isHigher(rightIndex, highestIndex)) {
            highestIndex = rightIndex;
        }
        if (index != highestIndex) {
            swap(index, highestIndex);
            siftDown(highestIndex);
        }
    }

    /**
     * 初始化建堆
     */
    private void init() {
        for (int index = size / 2 - 1; index >= 0; index--) {
            siftDown(index);
        }
    }

    private boolean isHigher(int i, int j) {
        return comparator.compare(nodes[i], nodes[j]) <= 0;
    }

    private void swap(int i, int j) {
        T node = nodes[i];
        nodes[i] = nodes[j];
        nodes[j] = node;
    }

    private void grow() {
        this.nodes = Arrays.copyOf(nodes, nodes.length + (nodes.length >> 1));
    }

    public boolean isEmpty() {
        return size == 0;
    }
}

堆排序

个人觉得非常难写的排序算法，不过从今儿起我悟了；

java

public void sortByHeap(int[] nums) {
    // 初始化建堆
    for (int index = nums.length / 2 - 1; index >= 0; index--) {
        siftDown(nums, index, nums.length);
    }
    // 根节点与堆尾节点交换，然后堆节点数减1，从新的根节点执行下沉
    for(int index = nums.length - 1; index > 0; index--) {
        swap(nums, 0, index);
        siftDown(nums, 0, index);
    }
}

/**
    * 下沉
    */
private void siftDown(int[] nums, int index, int size) {
    int leftIndex = 2 * index + 1;
    int rightIndex = 2 * index + 2;
    int highestIndex = index;
    if (leftIndex < size && nums[leftIndex] > nums[highestIndex]) {
        highestIndex = leftIndex;
    }
    if (rightIndex < size && nums[rightIndex] > nums[highestIndex]) {
        highestIndex = rightIndex;
    }
    if (index != highestIndex) {
        swap(nums, index, highestIndex);
        siftDown(nums, highestIndex, size);
    }
}

private void swap(int[] nums, int i, int j) {
    int n = nums[i];
    nums[i] = nums[j];
    nums[j] = n;
}

需要注意的是先建好大根堆，然后根节点出堆（与堆末节点交换），以新的堆根节点执行下沉，保证在原数组上操作排序。

347. 前 K 个高频元素

非常经典的面试题，这里就不再手写堆了，直接利用 JDK 实现；

java

class Solution {
    public int[] topKFrequent(int[] nums, int k) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int n : nums) {
            // 频次统计
            map.put(n, map.getOrDefault(n, 0) + 1);
        }
        // 小根堆，得到出现频次最高的 k 个数
        PriorityQueue<Integer> heap = new PriorityQueue<>(k, Comparator.comparingInt(map::get));
        for (Integer n : map.keySet()) {
            if (heap.size() < k) {
                heap.add(n);
            } else if (map.get(heap.peek()) < map.get(n)) {
                heap.poll();
                heap.add(n);
            }
        }
        return Arrays.stream(heap.toArray(new Integer[0])).mapToInt(item -> item).toArray();
    }
}

参考

算法可视化

背景 ​

前提知识 ​

二叉树 ​

完全二叉树 ​

二叉堆 ​

堆操作 ​

入堆 ​

出堆 ​

建堆 ​

应用 ​

优先级队列 ​

堆排序 ​

347. 前 K 个高频元素 ​

参考 ​

背景

前提知识

二叉树

完全二叉树

二叉堆

堆操作

入堆

出堆

建堆

应用

优先级队列

堆排序

347. 前 K 个高频元素

参考