SparseArray以及ArrayMap的一些研究

2018-03-08

SparseArray是Android中为这样的HashMap专门写的class，目的是提高效率，其核心是折半查找函数（binarySearch）

SparseArray的使用

创建

1 2	SparseArray sparseArray = new SparseArray(); SparseArray sparseArray = new SparseArray(capacity);

SparseArray有两个构造方法，一个默认构造方法，一个传入容量

put()

1	public void put(int key, E value)

存放的健值对中的健是int数据

get()

1 2	public E get(int key) public E get(int key, E valueIfKeyNotFound)

remove()

1	public void remove(int key)

SparseArray的原理

首先来看一下Google官方的介绍：

SparseArrays map integers to Objects. Unlike a normal array of Objects,there can be gaps in the indices. It is intended to be more memory efficient than using a HashMap to map Integers to Objects, both because it avoids auto-boxing keys and its data structure doesn’t rely on an extra entry object for each mapping.

SparseArray使用int[]数组存放key，避免了HashMap中基本数据类型需要装箱的步骤，其次不使用额外的结构体Entry，单个元素的存储成本下降。

初始化

首先来看它的构造函数：

/**
 * Creates a new SparseArray containing no mappings.
 */
public SparseArray() {
    this(10);
}
/**
 * Creates a new SparseArray containing no mappings that will not
 * require any additional memory allocation to store the specified
 * number of mappings.  If you supply an initial capacity of 0, the
 * sparse array will be initialized with a light-weight representation
 * not requiring any additional array allocations.
 */
public SparseArray(int initialCapacity) {
    if (initialCapacity == 0) {
      	//如果容量为0，获取两个长度为0的数组
        mKeys = EmptyArray.INT;
        mValues = EmptyArray.OBJECT;
    } else {
      	//创建两个长度相同的数组，一个存放key，一个存放values
        mValues = ArrayUtils.newUnpaddedObjectArray(initialCapacity);
        mKeys = new int[mValues.length];
    }
    mSize = 0;
}

可以看到，初始化只是简单的创建了两个数组。

put()

/**
 * Adds a mapping from the specified key to the specified value,
 * replacing the previous mapping from the specified key if there
 * was one.
 */
public void put(int key, E value) {
  	//首先通过二分查找去key数组中查找要插入的key，返回索引
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
    if (i >= 0) {
      	//如果索引 i>=0 说明数组中已经有了该key，则直接覆盖原来的值
        mValues[i] = value;
    } else {
      	//取反，这里得到的i应该是最合适key插入的位置
        i = ~i;
		//如果索引限于当前已经存放的长度，并且这个位置上的值为DELETED（即被标记为删除掉的值）
        if (i < mSize && mValues[i] == DELETED) {
          	//直接赋值返回，并且size不需要增加
            mKeys[i] = key;
            mValues[i] = value;
            return;
        }
		//到这里说明直接赋值失败，检查当前是否被标记为回收，并且存放的长度已经大于等于数组长度
        if (mGarbage && mSize >= mKeys.length) {
          	//回收数组中应该被删除的值
            gc();
            // Search again because indices may have changed.
          	//重新搜索一次，因为数组可能发生了变化
            i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
        }
			//最终在i位置上插入键与值，并且size+1
        mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
        mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
        mSize++;
    }
}

插入大体有以下几个特点：

存放key的数组是有序的（满足二分查找）
如果冲突，新值直接覆盖原值，并不会返回原值（HaspMap会返回原值）
如果单前要插入的key的索引上的值为DELETE，则直接覆盖
如果前几步都失败了，检查是否需要gc()并在该索引上插入数据

上述逻辑可以用图来表示：

sparse_array_put_one

上面这个图，插入一个key=3的元素，因为在mKeys中已经存在这个值，因此直接覆盖

sparse_array_put_two

如图所示，mKeys中并没有3这个值，但是通过二分查找得出目前应该插入的索引的位置为2，即key = 4所在的位置，而这个位置上的value被标记为了DELETE，所以会直接讲该位置上的key 赋值为3，并将该位置上的value赋值为put()传入的对象。

sparse_array_put_three

该图的数组容量已经满了，而且3应该插入的位置已经有4了，而5所指向的值为DELETE，这种情况下会先去回收DELETE，重新调整数据结构，图中的例子会回收5然后重新计算3应该插入的位置

sparse_array_put_four

最后一种情况，数组满了并且没有3这个值，就只能对数组进行扩容，然后插入数据

接下来还有几个值得探索的点，首先是二分查找的算法，这是一个很普通的二分查找算法

// This is Arrays.binarySearch(), but doesn't do any argument validation.
static int binarySearch(int[] array, int size, int value) {
    int lo = 0;
    int hi = size - 1;
    while (lo <= hi) {
        final int mid = (lo + hi) >>> 1;
        final int midVal = array[mid];
        if (midVal < value) {
            lo = mid + 1;
        } else if (midVal > value) {
            hi = mid - 1;
        } else {
            return mid;  // value found
        }
    }
    return ~lo;  // value not present
}

最后一行代码，找不到这个值的时候，return ~lo实际上到这一步的时候，lo==mid==hi所以这个位置是最适合插入的地方，但是为了能让调用者既知道没有查到值，又知道索引的位置，因此做了一个取反操作，返回一个负数，这样调用的时候可以首先判断是否命中，然后取反来获取索引的位置。

第二个点就是插入数据的具体实现：

public static int[] insert(int[] array, int currentSize, int index, int element) {
    assert currentSize <= array.length;
	//如果当前长度加1还是小于素组长度
    if (currentSize + 1 <= array.length) {
      	//复制数组，没有进行扩容
        System.arraycopy(array, index, array, index + 1, currentSize - index);
        array[index] = element;
        return array;
    }
	//需要扩容，分为两步，首先复制前半部分
    int[] newArray = new int[growSize(currentSize)];
    System.arraycopy(array, 0, newArray, 0, index);
  	//插入数据
    newArray[index] = element;
  	//复制后半部分
    System.arraycopy(array, index, newArray, index + 1, array.length - index);
    return newArray;
}

至此，put就全部分析完了，下面看下remove方法

remove

/**
 * Alias for {@link #delete(int)}.
 */
public void remove(int key) {
    delete(key);
}

/**
 * Removes the mapping from the specified key, if there was any.
 */
public void delete(int key) {
  	//二分查找 找到该key的索引
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
	//如果存在，将该索引上的value赋值为DELETE
    if (i >= 0) {
        if (mValues[i] != DELETED) {
            mValues[i] = DELETED;
          	//标记当前状态为待回收
            mGarbage = true;
        }
    }
}

从中可以看出在remove操作的时候，实际上分为两个步骤：

删除value——在remove中处理
删除key——在gc中处理（此处的gc不是系统 GC，只是SparseArray中的一个方法）

remove()中，将key指向了DELETE，这个时候value失去了引用，它将会在下次系统内存回收的时候被回收。但是可以看到key仍然存在数组中，并没有立马被删除，目的应该是为了保持数据结构，同时不会频繁压缩数组，保证索引查询不会错位，当gc()被调用时，key就会被删除

gc()

private void gc() {
    // Log.e("SparseArray", "gc start with " + mSize);
    int n = mSize;
    int o = 0;
    int[] keys = mKeys;
    Object[] values = mValues;
    for (int i = 0; i < n; i++) {
        Object val = values[i];
		//当这个valye 不为DELETE
        if (val != DELETED) {
            if (i != o) {
              	//将索引i处的key赋值给o处的key
                keys[o] = keys[i];
                values[o] = val;
              	//将i处的值置为空
                values[i] = null;
            }
			//o向后移动一位
            o++;
        }
    }
    mGarbage = false;
    mSize = o;
    // Log.e("SparseArray", "gc end with " + mSize);
}

sparse_array_gc

如上图所示，在3之前，i与o都是相等的，而到3 的时候，因为值为DELETE，所以i++，而o的值仍然为2，到4的时候，发现i != o，则会将4向前移动到3，这个时候o++，但是因为o始终小于i一位，因此后续元素均会向前移动一位。

总结

了解了实现原理之后，主要来对比一下它和HashMap的优缺点：

优势：

避免了基本数据类型的装箱操作
不需要额外的结构体（Entry）单个元素的存储成本更低
数据量小的情况下，随机访问的效率更高

劣势：

插入操作需要复制数组，增删效率降低
数据量巨大的时候，复制数组的成本巨大，gc成本也巨大
数据量巨大的时候，查询效率也会下降

因此，如果key 的类型已经确定为int类型，那么使用SparseArray，因为它避免了自动装箱过程，如果key为long类型，它还提供了一个LongSparseArray来确保key为long类型时的使用

还有一个，数据量不能太大，在最好在千级以内。