.NET Core中Dictionary这篇文章里,从算法角度分析了Dictionary的实现。基本的做法可以概括为,通过计算key的hash值,然后散列到对应的bucket中,然后再在bucket对应的链表中存储或查找。第一个关键地方在于hash值的计算。恰巧最近在看《Writing High-Performance .NET Code 》这本书时,里面举了一个不当使用Dictionary的例子,这里借这个例子进一步说明Dictionary中尤其是当key的类型为string时hash值的计算方法,比较有意思,这里记录一下。

引子


在《Writing High-Performance .NET Code 》这本书中,作者在“Using the .NET Framework”这一章举了这么个例子,说是他见到过这么一段代码:有一个Dictionary<string,object>类型,这个类型的key是string类型,且大小写不敏感,如何给定一个字符串,如何在这个字典中查找对应的值。他看到的代码如下:

var keytoLookup = "myKey";
var dict = new Dictionary<string, object>();
...
foreach(var kvp in dict)
{
    if (kvp.Key.ToUpper() == keyToLookup.ToUpper())
    {
        ...
    }
}

这段代码槽点甚多,会造成大量的GC和内存耗费。作者给出了两个解决方法:

  • 限制插入Dictionary中的key的大小写类型,比如强制插入时小写。然后根据key查找的时候,将key转换为小写,然后比较,消除大小写不敏感。
  • 在声明Dictionary的时候,传入一个比较的方法,比如一个静态的StringComparer类。

接下来演示的是第二种解决方法:

var keytoLookup = "myKey";
var dict = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
...
object val;
if (dict.TryGetValue(keyToLookup, out val))
{
    ...
}

很普通但清奇的代码,我们平常在使用Dictionary的时候,很少在其构造函数里面传入参数。那么它在内部是如何实现的呢?细细分析起来还是挺有意思的。

IEqualityComparer是什么?


Dictionary的一个典型的构造函数如下:

public Dictionary(int capacity, IEqualityComparer<TKey>? comparer)
{
    if (capacity < 0)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
    }

    if (capacity > 0)
    {
        Initialize(capacity);
    }

    // For reference types, we always want to store a comparer instance, either
    // the one provided, or if one wasn't provided, the default (accessing
    // EqualityComparer<TKey>.Default with shared generics on every dictionary
    // access can add measurable overhead).  For value types, if no comparer is
    // provided, or if the default is provided, we'd prefer to use
    // EqualityComparer<TKey>.Default.Equals on every use, enabling the JIT to
    // devirtualize and possibly inline the operation.
    if (!typeof(TKey).IsValueType)
    {
        _comparer = comparer ?? EqualityComparer<TKey>.Default;

        // Special-case EqualityComparer<string>.Default, StringComparer.Ordinal, and StringComparer.OrdinalIgnoreCase.
        // We use a non-randomized comparer for improved perf, falling back to a randomized comparer if the
        // hash buckets become unbalanced.
        if (typeof(TKey) == typeof(string) &&
            NonRandomizedStringEqualityComparer.GetStringComparer(_comparer!) is IEqualityComparer<string> stringComparer)
        {
            _comparer = (IEqualityComparer<TKey>)stringComparer;
        }
    }
    else if (comparer is not null && // first check for null to avoid forcing default comparer instantiation unnecessarily
    comparer != EqualityComparer<TKey>.Default)
    {
        _comparer = comparer;
    }
}

第一个参数为字典的容量,这个很好理解,Dictionary是一种能够动态扩容的容器,如果能够预估数据量,可以给Dictionary一个初始的容量,以防止不必要的扩容操作。

第二个参数IEqualityComparer<T>是一个泛型接口,从名字上来看,是表示某个相等性比较的接口,这个接口签名如下:

// The generic IEqualityComparer interface implements methods to check if two objects are equal
// and generate Hashcode for an object.
// It is used in Dictionary class.
public interface IEqualityComparer<in T>
{
    bool Equals(T? x, T? y);
    int GetHashCode([DisallowNull] T obj);
}

那在开头我们声明的传入的StringComparer实现了IEqualityComparer<T>吗?

var dict = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);

接下来看StringComparer的代码:

[Serializable]
[TypeForwardedFrom("mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")]
public abstract class StringComparer : IComparer, IEqualityComparer, IComparer<string?>, IEqualityComparer<string?>
{
    public static StringComparer InvariantCulture => CultureAwareComparer.InvariantCaseSensitiveInstance;

    public static StringComparer InvariantCultureIgnoreCase => CultureAwareComparer.InvariantIgnoreCaseInstance;

    public static StringComparer CurrentCulture =>
        new CultureAwareComparer(CultureInfo.CurrentCulture, CompareOptions.None);

    public static StringComparer CurrentCultureIgnoreCase =>
        new CultureAwareComparer(CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);

    public static StringComparer Ordinal => OrdinalCaseSensitiveComparer.Instance;

    public static StringComparer OrdinalIgnoreCase => OrdinalIgnoreCaseComparer.Instance;

...............................
}

StringComparer是一个抽象类,它实现了IEqualityComparer<String>接口,里面的OrdinalIgnoreCase只是OrdinalIgnoreCaseComparer的一个实例,OrdinalIgnoreCase实现了OrdinalComparer,OrdinalComparer实现了StringComparer:

[Serializable]
internal sealed class OrdinalIgnoreCaseComparer : OrdinalComparer, ISerializable
{
    internal static readonly OrdinalIgnoreCaseComparer Instance = new OrdinalIgnoreCaseComparer();

    private OrdinalIgnoreCaseComparer() : base(true)
    {
    }

    public override int Compare(string? x, string? y)
    {
        if (ReferenceEquals(x, y))
        {
            return 0;
        }

        if (x == null)
        {
            return -1;
        }

        if (y == null)
        {
            return 1;
        }

        return Globalization.Ordinal.CompareStringIgnoreCase(ref x.GetRawStringData(), x.Length, ref y.GetRawStringData(), y.Length);
    }

    public override bool Equals(string? x, string? y)
    {
        if (ReferenceEquals(x, y))
        {
            return true;
        }

        if (x is null || y is null)
        {
            return false;
        }

        if (x.Length != y.Length)
        {
            return false;
        }

        return Globalization.Ordinal.EqualsIgnoreCase(ref x.GetRawStringData(), ref y.GetRawStringData(), x.Length);
    }

    public override int GetHashCode(string obj)
    {
        if (obj == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.obj);
        }
        return obj.GetHashCodeOrdinalIgnoreCase();
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.SetType(typeof(OrdinalComparer));
        info.AddValue("_ignoreCase", true);
    }
}

OrdinalIgnoreCaseComparer重写了OrdinalComparer中的Equals和GetHashCode方法。

[Serializable]
[TypeForwardedFrom("mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")]
public class OrdinalComparer : StringComparer
{
    private readonly bool _ignoreCase; // Do not rename (binary serialization)

    internal OrdinalComparer(bool ignoreCase)
    {
        _ignoreCase = ignoreCase;
    }

    public override int Compare(string? x, string? y)
    {
        if (ReferenceEquals(x, y))
            return 0;
        if (x == null)
            return -1;
        if (y == null)
            return 1;

        if (_ignoreCase)
        {
            return Globalization.Ordinal.CompareStringIgnoreCase(ref x.GetRawStringData(), x.Length, ref y.GetRawStringData(), y.Length);
        }

        return string.CompareOrdinal(x, y);
    }

    public override bool Equals(string? x, string? y)
    {
        if (ReferenceEquals(x, y))
            return true;
        if (x == null || y == null)
            return false;

        if (_ignoreCase)
        {
            if (x.Length != y.Length)
            {
                return false;
            }
            return Globalization.Ordinal.EqualsIgnoreCase(ref x.GetRawStringData(), ref y.GetRawStringData(), x.Length);
        }
        return x.Equals(y);
    }

    public override int GetHashCode(string obj)
    {
        if (obj == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.obj);
        }

        if (_ignoreCase)
        {
            return obj.GetHashCodeOrdinalIgnoreCase();
        }

        return obj.GetHashCode();
    }

    // Equals method for the comparer itself.
    public override bool Equals([NotNullWhen(true)] object? obj)
    {
        if (!(obj is OrdinalComparer comparer))
        {
            return false;
        }
        return this._ignoreCase == comparer._ignoreCase;
    }

    public override int GetHashCode()
    {
        int hashCode = nameof(OrdinalComparer).GetHashCode();
        return _ignoreCase ? (~hashCode) : hashCode;
    }

    private protected override bool IsWellKnownOrdinalComparerCore(out bool ignoreCase)
    {
        ignoreCase = _ignoreCase;
        return true;
    }
}

如果不提供IEqualityComparer会怎么样?


从上面的分析可以看出来,Dictionary的IEqualityComparer构造函数参数是用来从key获取哈希值的。如果在声明Dictionary时,不提供这个参数会怎么样呢?

在Dictionary代码中可以看到:

public Dictionary(int capacity, IEqualityComparer<TKey>? comparer)
{
    if (capacity < 0)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
    }

    if (capacity > 0)
    {
        Initialize(capacity);
    }

    // For reference types, we always want to store a comparer instance, either
    // the one provided, or if one wasn't provided, the default (accessing
    // EqualityComparer<TKey>.Default with shared generics on every dictionary
    // access can add measurable overhead).  For value types, if no comparer is
    // provided, or if the default is provided, we'd prefer to use
    // EqualityComparer<TKey>.Default.Equals on every use, enabling the JIT to
    // devirtualize and possibly inline the operation.
    if (!typeof(TKey).IsValueType)
    {
        _comparer = comparer ?? EqualityComparer<TKey>.Default;

        // Special-case EqualityComparer<string>.Default, StringComparer.Ordinal, and StringComparer.OrdinalIgnoreCase.
        // We use a non-randomized comparer for improved perf, falling back to a randomized comparer if the
        // hash buckets become unbalanced.
        if (typeof(TKey) == typeof(string) &&
            NonRandomizedStringEqualityComparer.GetStringComparer(_comparer!) is IEqualityComparer<string> stringComparer)
        {
            _comparer = (IEqualityComparer<TKey>)stringComparer;
        }
    }
    else if (comparer is not null && // first check for null to avoid forcing default comparer instantiation unnecessarily
    comparer != EqualityComparer<TKey>.Default)
    {
        _comparer = comparer;
    }
}

首先会判断Key的类型,如果是引用类型,则将该参数存储起来,存储的时候判断给定的comparer是否为空。如果为空则将其赋值为EqualityComparer<TKey>.Default。EqualityComparer<TKey>.Default这又是什么呢?假设我们的key为string,并且没有提供StringComparer参数。

在最新的EqualityComparer代码中Default已经变成运行时定义了。在源代码中不可见。

[Serializable]
[TypeForwardedFrom("mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")]
public abstract partial class EqualityComparer<T> : IEqualityComparer, IEqualityComparer<T>
{
    // public static EqualityComparer<T> Default is runtime-specific
}

但在.NET Framework 4.8的EqualityComparer中,Default的定义很明确:

[Serializable]
[TypeDependencyAttribute("System.Collections.Generic.ObjectEqualityComparer`1")]
public abstract class EqualityComparer<T> : IEqualityComparer, IEqualityComparer<T>
{
    static readonly EqualityComparer<T> defaultComparer = CreateComparer();

    public static EqualityComparer<T> Default
    {
        get
        {
            Contract.Ensures(Contract.Result<EqualityComparer<T>>() != null);
            return defaultComparer;
        }
    }

    //
    // Note that logic in this method is replicated in vm\compile.cpp to ensure that NGen
    // saves the right instantiations
    //
    [System.Security.SecuritySafeCritical] // auto-generated
    private static EqualityComparer<T> CreateComparer()
    {
        Contract.Ensures(Contract.Result<EqualityComparer<T>>() != null);

        RuntimeType t = (RuntimeType) typeof(T);
        // Specialize type byte for performance reasons
        if (t == typeof(byte))
        {
            return (EqualityComparer<T>) (object) (new ByteEqualityComparer());
        }

        // If T implements IEquatable<T> return a GenericEqualityComparer<T>
        if (typeof(IEquatable<T>).IsAssignableFrom(t))
        {
            return (EqualityComparer<T>) RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
                (RuntimeType) typeof(GenericEqualityComparer<int>), t);
        }

        // If T is a Nullable<U> where U implements IEquatable<U> return a NullableEqualityComparer<U>
        if (t.IsGenericType && t.GetGenericTypeDefinition() == typeof(Nullable<>))
        {
            RuntimeType u = (RuntimeType) t.GetGenericArguments()[0];
            if (typeof(IEquatable<>).MakeGenericType(u).IsAssignableFrom(u))
            {
                return (EqualityComparer<T>) RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
                    (RuntimeType) typeof(NullableEqualityComparer<int>), u);
            }
        }

        // See the METHOD__JIT_HELPERS__UNSAFE_ENUM_CAST and METHOD__JIT_HELPERS__UNSAFE_ENUM_CAST_LONG cases in getILIntrinsicImplementation
        if (t.IsEnum)
        {
            TypeCode underlyingTypeCode = Type.GetTypeCode(Enum.GetUnderlyingType(t));

            // Depending on the enum type, we need to special case the comparers so that we avoid boxing
            // Note: We have different comparers for Short and SByte because for those types we need to make sure we call GetHashCode on the actual underlying type as the 
            // implementation of GetHashCode is more complex than for the other types.
            switch (underlyingTypeCode)
            {
                case TypeCode.Int16: // short
                    return (EqualityComparer<T>) RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
                        (RuntimeType) typeof(ShortEnumEqualityComparer<short>), t);
                case TypeCode.SByte:
                    return (EqualityComparer<T>) RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
                        (RuntimeType) typeof(SByteEnumEqualityComparer<sbyte>), t);
                case TypeCode.Int32:
                case TypeCode.UInt32:
                case TypeCode.Byte:
                case TypeCode.UInt16: //ushort
                    return (EqualityComparer<T>) RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
                        (RuntimeType) typeof(EnumEqualityComparer<int>), t);
                case TypeCode.Int64:
                case TypeCode.UInt64:
                    return (EqualityComparer<T>) RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
                        (RuntimeType) typeof(LongEnumEqualityComparer<long>), t);
            }
        }

        // Otherwise return an ObjectEqualityComparer<T>
        return new ObjectEqualityComparer<T>();
    }
}

可以看到EqualityComparer<TKey>.Default其实是由CreateComparer方法生成,代码的意思就是,如果类型TKey实现了IEquatable<T>接口,那么就直接返回该类型的泛型实现。否则,就生成一个ObjectEqualityComparer<T>。

[Serializable]
internal class ObjectEqualityComparer<T> : EqualityComparer<T>
{
    [Pure]
    public override bool Equals(T x, T y)
    {
        if (x != null)
        {
            if (y != null) return x.Equals(y);
            return false;
        }
        if (y != null) return false;
        return true;
    }

    [Pure]
    public override int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        return obj.GetHashCode();
    }

    internal override int IndexOf(T[] array, T value, int startIndex, int count)
    {
        int endIndex = startIndex + count;
        if (value == null)
        {
            for (int i = startIndex; i < endIndex; i++)
            {
                if (array[i] == null) return i;
            }
        }
        else
        {
            for (int i = startIndex; i < endIndex; i++)
            {
                if (array[i] != null && array[i].Equals(value)) return i;
            }
        }
        return -1;
    }

    internal override int LastIndexOf(T[] array, T value, int startIndex, int count)
    {
        int endIndex = startIndex - count + 1;
        if (value == null)
        {
            for (int i = startIndex; i >= endIndex; i--)
            {
                if (array[i] == null) return i;
            }
        }
        else
        {
            for (int i = startIndex; i >= endIndex; i--)
            {
                if (array[i] != null && array[i].Equals(value)) return i;
            }
        }
        return -1;
    }

    // Equals method for the comparer itself. 
    public override bool Equals(Object obj)
    {
        ObjectEqualityComparer<T> comparer = obj as ObjectEqualityComparer<T>;
        return comparer != null;
    }

    public override int GetHashCode()
    {
        return this.GetType().Name.GetHashCode();
    }
}

简单来说,就是Dictionary中,在获取元素哈希值时,会使IEqualityComparer的GetHashCode方法,如果构造函数没有提供该参数,会使EqualityComparer<T>.Default静态属性来获取适合于比较T实例的相等性比较器的某个实现,并调用其GetHashCode(T)虚方法。在第一次获取适当的相等性比较器时,会通过EqualityComparer<T>.CreateComparer进行创建,然后在静态字段中缓存。如果CreateComparer方法发现T实现了IEquatable<T>,就返回GenericEqualityComparer<T>实例。该类的T拥有IEquatable<T>约束,会调用该接口的GetHashCode方法。否则,CreateComparer会返回ObjectEqualityComparer<T>实例,它的T没有任何约束,会调用Object提供的GetHashCode虚方法。

这上面要注意,如果作为key的类型并没有实现EqualityComparer接口或者IEquatable接口,那么代码就会自动生成一个ObjectEqualComparer。如果这个类型恰好是结构类型,那么自动生成ObjectEqualComparer并调用时,就会产生装箱,这是很要命的,所以对于结构类型,一个编程规则就是要为其实现IEquatable接口,这个在后面会举例说明。

那String类型默认实现了EqualityComparer或IEquatable接口吗?接下来看String类型的源代码

[ComVisible(true)]
[Serializable]
public sealed class String : IComparable, ICloneable, IConvertible, IEnumerable
#if GENERICS_WORK
        , IComparable<String>, IEnumerable<char>, IEquatable<String>
#endif
{
    // Determines whether two strings match.
    [Pure]
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
    public bool Equals(String value)
    {
        if (this == null) //this is necessary to guard against reverse-pinvokes and
            throw new NullReferenceException(); //other callers who do not use the callvirt instruction

        if (value == null)
            return false;

        if (Object.ReferenceEquals(this, value))
            return true;

        if (this.Length != value.Length)
            return false;

        return EqualsHelper(this, value);
    }

    [System.Security.SecuritySafeCritical] // auto-generated
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
    private unsafe static bool EqualsHelper(String strA, String strB)
    {
        Contract.Requires(strA != null);
        Contract.Requires(strB != null);
        Contract.Requires(strA.Length == strB.Length);

        int length = strA.Length;

        fixed (char* ap = &strA.m_firstChar)
        fixed (char* bp = &strB.m_firstChar)
        {
            char* a = ap;
            char* b = bp;

            // unroll the loop
#if AMD64
                // for AMD64 bit platform we unroll by 12 and
                // check 3 qword at a time. This is less code
                // than the 32 bit case and is shorter
                // pathlength
 
                while (length >= 12)
                {
                    if (*(long*)a     != *(long*)b) return false;
                    if (*(long*)(a+4) != *(long*)(b+4)) return false;
                    if (*(long*)(a+8) != *(long*)(b+8)) return false;
                    a += 12; b += 12; length -= 12;
                }
#else
            while (length >= 10)
            {
                if (*(int*)a != *(int*)b) return false;
                if (*(int*)(a + 2) != *(int*)(b + 2)) return false;
                if (*(int*)(a + 4) != *(int*)(b + 4)) return false;
                if (*(int*)(a + 6) != *(int*)(b + 6)) return false;
                if (*(int*)(a + 8) != *(int*)(b + 8)) return false;
                a += 10;
                b += 10;
                length -= 10;
            }
#endif

            // This depends on the fact that the String objects are
            // always zero terminated and that the terminating zero is not included
            // in the length. For odd string sizes, the last compare will include
            // the zero terminator.
            while (length > 0)
            {
                if (*(int*)a != *(int*)b) break;
                a += 2;
                b += 2;
                length -= 2;
            }

            return (length <= 0);
        }
    }
}

默认的String类型的确实现了IEquatable<string>接口。

所以如果以string作为key,如果不指定StringComparer,则会调用String默认实现的IEquatable接口,否则,就用指定的StringComparer。接下来会有一个转换:

 if (typeof(TKey) == typeof(string) &&
            NonRandomizedStringEqualityComparer.GetStringComparer(_comparer!) is IEqualityComparer<string> stringComparer)
{
      _comparer = (IEqualityComparer<TKey>)stringComparer;
}

NonRandomizedStringEqualityComparer 的源代码如下:

// NonRandomizedStringEqualityComparer is the comparer used by default with the Dictionary<string,...>
// We use NonRandomizedStringEqualityComparer as default comparer as it doesnt use the randomized string hashing which
// keeps the performance not affected till we hit collision threshold and then we switch to the comparer which is using
// randomized string hashing.
[Serializable] // Required for compatibility with .NET Core 2.0 as we exposed the NonRandomizedStringEqualityComparer inside the serialization blob
               // Needs to be public to support binary serialization compatibility
public class NonRandomizedStringEqualityComparer : IEqualityComparer<string?>, IInternalStringEqualityComparer, ISerializable
{
    // Dictionary<...>.Comparer and similar methods need to return the original IEqualityComparer
    // that was passed in to the ctor. The caller chooses one of these singletons so that the
    // GetUnderlyingEqualityComparer method can return the correct value.

    private static readonly NonRandomizedStringEqualityComparer WrappedAroundDefaultComparer = new OrdinalComparer(EqualityComparer<string?>.Default);
    private static readonly NonRandomizedStringEqualityComparer WrappedAroundStringComparerOrdinal = new OrdinalComparer(StringComparer.Ordinal);
    private static readonly NonRandomizedStringEqualityComparer WrappedAroundStringComparerOrdinalIgnoreCase = new OrdinalIgnoreCaseComparer(StringComparer.OrdinalIgnoreCase);

    private readonly IEqualityComparer<string?> _underlyingComparer;

    private NonRandomizedStringEqualityComparer(IEqualityComparer<string?> underlyingComparer)
    {
        Debug.Assert(underlyingComparer != null);
        _underlyingComparer = underlyingComparer;
    }

    // This is used by the serialization engine.
    [Obsolete(Obsoletions.LegacyFormatterImplMessage, DiagnosticId = Obsoletions.LegacyFormatterImplDiagId, UrlFormat = Obsoletions.SharedUrlFormat)]
    [EditorBrowsable(EditorBrowsableState.Never)]
    protected NonRandomizedStringEqualityComparer(SerializationInfo information, StreamingContext context)
        : this(EqualityComparer<string?>.Default)
    {
    }

    public virtual bool Equals(string? x, string? y)
    {
        // This instance may have been deserialized into a class that doesn't guarantee
        // these parameters are non-null. Can't short-circuit the null checks.

        return string.Equals(x, y);
    }

    public virtual int GetHashCode(string? obj)
    {
        // This instance may have been deserialized into a class that doesn't guarantee
        // these parameters are non-null. Can't short-circuit the null checks.

        return obj?.GetNonRandomizedHashCode() ?? 0;
    }

    internal virtual RandomizedStringEqualityComparer GetRandomizedEqualityComparer()
    {
        return RandomizedStringEqualityComparer.Create(_underlyingComparer, ignoreCase: false);
    }

    // Gets the comparer that should be returned back to the caller when querying the
    // ICollection.Comparer property. Also used for serialization purposes.
    public virtual IEqualityComparer<string?> GetUnderlyingEqualityComparer() => _underlyingComparer;

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // We are doing this to stay compatible with .NET Framework.
        // Our own collection types will never call this (since this type is a wrapper),
        // but perhaps third-party collection types could try serializing an instance
        // of this.
        info.SetType(typeof(GenericEqualityComparer<string>));
    }

    private sealed class OrdinalComparer : NonRandomizedStringEqualityComparer
    {
        internal OrdinalComparer(IEqualityComparer<string?> wrappedComparer)
            : base(wrappedComparer)
        {
        }

        public override bool Equals(string? x, string? y) => string.Equals(x, y);

        public override int GetHashCode(string? obj)
        {
            Debug.Assert(obj != null, "This implementation is only called from first-party collection types that guarantee non-null parameters.");
            return obj.GetNonRandomizedHashCode();
        }

    }

    private sealed class OrdinalIgnoreCaseComparer : NonRandomizedStringEqualityComparer
    {
        internal OrdinalIgnoreCaseComparer(IEqualityComparer<string?> wrappedComparer)
            : base(wrappedComparer)
        {
        }

        public override bool Equals(string? x, string? y) => string.EqualsOrdinalIgnoreCase(x, y);

        public override int GetHashCode(string? obj)
        {
            Debug.Assert(obj != null, "This implementation is only called from first-party collection types that guarantee non-null parameters.");
            return obj.GetNonRandomizedHashCodeOrdinalIgnoreCase();
        }

        internal override RandomizedStringEqualityComparer GetRandomizedEqualityComparer()
        {
            return RandomizedStringEqualityComparer.Create(_underlyingComparer, ignoreCase: true);
        }
    }

    public static IEqualityComparer<string>? GetStringComparer(object comparer)
    {
        // Special-case EqualityComparer<string>.Default, StringComparer.Ordinal, and StringComparer.OrdinalIgnoreCase.
        // We use a non-randomized comparer for improved perf, falling back to a randomized comparer if the
        // hash buckets become unbalanced.

        if (ReferenceEquals(comparer, EqualityComparer<string>.Default))
        {
            return WrappedAroundDefaultComparer;
        }

        if (ReferenceEquals(comparer, StringComparer.Ordinal))
        {
            return WrappedAroundStringComparerOrdinal;
        }

        if (ReferenceEquals(comparer, StringComparer.OrdinalIgnoreCase))
        {
            return WrappedAroundStringComparerOrdinalIgnoreCase;
        }

        return null;
    }
}

具体代码在GetStringComparer方法里。

这个类的注释也很有意思,从名字看它是非随机性哈希,可以保证性能。当哈希冲突比较严重时,比如碰到哈希攻击,则会启用随机性哈希RandomizedStringEqualityComparer

了解了原理之后,下面来看两个实际的例子

一个Dictionary中key为string且大小写不敏感的例子


看到上面的分析后,在代码中刚好碰到了一个用到key大小写不敏感的例子。为什么要改它因为它存储的时候,key都是string类型小写存储的,然后将需要查找的key转为小写再去查找:

Dictionary<string, double> chineseFutureMiniPriceChangeDic = new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase);
chineseFutureMiniPriceChangeDic = new Dictionary<string, double>
{
	["ic"] = 0.2,
	["if"] = 0.1,
	["ih"] = 0.2,
};
string toLookUpStr = "IC";
double p;
if (chineseFutureMiniPriceChangeDic.TryGetValue(toLookUpStr.ToLower(), out p))
{
	//to do
}

这个代码中的“bad smell”在于ToLower()方法。要移除这个方法就用到上面所说的在定义Dictionary的时候指定StringComparer.OrdinalIgnoreCase。

Dictionary<string, double> chineseFutureMiniPriceChangeDic = new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase);
chineseFutureMiniPriceChangeDic = new Dictionary<string, double>
{
	["ic"] = 0.2,
	["if"] = 0.2,
	["ih"] = 0.2,
};

string toLookUpStr = "IC";
double p;
if (chineseFutureMiniPriceChangeDic.TryGetValue(toLookUpStr, out p))
{
	//to do
}

上面看起来正确,但运行会报错,原因在于初始化的时候使用的是集合初始化。他相当于直接初始化了一个新的Dictionary对象,然后将该对象赋值给定义的chineseFutureMiniPriceChangeDic,这么做就绕过了chineseFutureMiniPriceChangeDic 里面对key指定的StringComparer.OrdinalIgnoreCase比较方法,换句话说就是在获取哈希值的时候,没有通过StringComparer.OrdinalIgnoreCase来获取。

解决方法是,在new的时候加上构造函数参数:

Dictionary<string, double> chineseFutureMiniPriceChangeDic = new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase);
chineseFutureMiniPriceChangeDic = new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase)
{
	["ic"] = 0.2,
	["if"] = 0.2,
	["ih"] = 0.2,
};

string toLookUpStr = "IC";
double p;
if (chineseFutureMiniPriceChangeDic.TryGetValue(toLookUpStr, out p))
{
	//to do
}

或者,声明变量后,使用Add方法添加成员:

Dictionary<string, double> chineseFutureMiniPriceChangeDic = new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase);
chineseFutureMiniPriceChangeDic.Add("ic", 0.2);
chineseFutureMiniPriceChangeDic.Add("if", 0.2);
chineseFutureMiniPriceChangeDic.Add("ih", 0.2);

string toLookUpStr = "IC";
double p;
if (chineseFutureMiniPriceChangeDic.TryGetValue(toLookUpStr, out p))
{
	//to do
}

结构体作为Dictionary的key的正确做法


假设需要落地行情数据,将每只股票每天的行情数据存储到一个文件中,这就需要将落地文件的名称缓存起来,避免频繁拼接和调用。则是Dictionary的key是两个字段symbol和data,value就是包含全路径的落地文件。

struct MultiKey : IEquatable<MultiKey>
{
    private readonly string symbol;
    private readonly string date;

    public MultiKey(string s, string de)
    {
        symbol = s;
        date = de;
    }

    public bool Equals(MultiKey other)
    {
        return other.symbol == symbol && other.date == date;
    }

    public override bool Equals(object obj) => obj is MultiKey other && Equals(other);

    public override int GetHashCode()
    {
        //https://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-overriding-gethashcode
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            // Suitable nullity checks etc, of course :)
            hash = hash * 23 + symbol.GetHashCode();
            hash = hash * 23 + date.GetHashCode();
            return hash;
        }
    }
}

这里定义为struct可以节省内存和提升效率。struct作为key为了避免装箱需要实现IEquatable接口,原因分析如上。

其它


StringComparison.Ordinal和StringComparison.OrdinalIgnoreCase是忽略语言文化字符串比较,它们速度最快。如果要考虑语言文化的比较,比如需要在UI界面上显示,则需要使用StringComparison.CurrentCulture和StringComparison.CurrentCultureIgnoreCase。

要更改字符串的大小写,应该使用String.ToUpperInvariant和String.ToLowerInvariant。强烈建议使用ToUpperInvariant 而不要使用 ToLowerInvariant,因为微软对执行大写比较的代码进行了优化。在执行不区分大小写比较之前,FCL会自动将字符串转换为大写形式。之所以用ToUpperInvariant和ToLowerInvariant,是因为String没有提供ToUpperOrdinal和ToLowerOrdinal方法。而不用ToUpper和ToLower是因为它们对语言文化敏感。

总结


本文由一个在Dictionary中使用String作为key,且支持大小写不敏感匹配的例子来说明Dictionary的内部原理。由Key获取hash值是Dictionary实现的关键。在Dictionary声明的时候,可以通过构造函数传递IEqualityComparer参数来指定比较器,通过该对象来获取哈希值。如果不提供这个参数,则会在内部判断Key是否实现了IEquatable泛型接口,如果提供了,则使用该泛型接口中定义的GetHashCode来获取哈希值,否则会根据Key的类型生成一个通用的EqualityComparer,这个通过的EqualityComparer的参数是object类型,所以如果Key的类型为结构体,则必须为其实现IEquatable泛型接口,否则就会产生装箱从而严重影响性能。

通过对Dictionary的内部原理分析可以看到,当使用string类型作为key的时候,如果不指定StringComparer则会默认使用String实现的IEquatable接口里的实现。这跟如果需要实现大小写不敏感时,明确指定StringComparer.OrdinalIgnoreCase相比,性能不会更差,但是他能够消除在调用TryGetValue时,对key进行ToLower()转换,而这种字符串的大小写转换是比较耗费CPU且会对GC产生压力。

 

参考