Tuesday, January 4, 2011

Sorted Collection Speeds

C# and .Net offer a couple different sorted classes. Arrays can be sorted, sure, but I'm looking at ones that offer some form of insertion sort. For this, I'm looking at these three classes:

Collection Insert(Random) Insert(Sorted) Remove Indexed Fetch
SortedDictionary O(log n) O(log n) O(log n) O(log n)
SortedList O(n) ~O(log n) ? O(n) O(log n)
SortedSet O(log n) O(log n) O(log n) -

Runs

SortedDictionary and SortedList using custom comparer

          Runs    Dictionary    SortedDict    SortedList     SortedSet
Insert
         50000    26,668          460,026      3,823,552     210,012
         60000    33,335          566,699      5,513,648     253,347
         70000    53,336          683,372      7,717,108     326,685
         80000    53,336          863,382     12,737,395     340,019
         90000    50,002          896,718     13,194,088     410,023
        100000    83,338        1,066,728     19,034,422     460,026
        110000    90,005        1,156,733     22,917,977     526,697
        120000    90,005        1,306,741     27,244,891     650,037
Fetch
         50000    13,334        313,351       293,350
         60000    13,334        550,031       363,354
         70000    16,667        506,695       463,359
         80000    23,334        570,032       520,029
         90000    26,668        666,705       610,034
        100000    26,668        753,376       703,373
        110000    30,001        870,049       796,712
        120000    30,002        976,722       896,718

Part of the problem comparing all these collections is that don't compare in the same way. SortedDictionary and SortedList both default to comparing unique keys. If you can guarantee each item in the collection has a unique sorting key, then they work great. If you require more complicated sorting, such as comparing against a non-unique value within the TValue, then you add in the extra overhead of storage and lookup of the base collection. SortedSet works better for comparing TValues.

SortedDictionary and SortedList using default key sorter (insert in order)

          Runs    Dictionary    SortedDict    SortedList     SortedSet
Insert
         50000        30,003       246,691       130,005       160,013
         60000        33,336       356,702       126,679       273,360
         70000        36,670       353,368        73,340       310,031
         80000        56,672       413,374        80,008       416,708
         90000        66,673       466,713        90,009       486,715
        100000        70,007       503,383       103,343       576,724
        110000        80,008       553,388       113,344       666,733
        120000       103,343       646,731       180,018       653,398
Fetch
         50000        10,001       123,345        73,340
         60000        13,334       160,016        90,009
         70000        16,668       186,685       106,677
         80000        16,668       233,356       130,013
         90000        20,002       246,691       140,014
        100000        20,002       266,693       156,682
        110000        23,335       310,031       190,019
        120000        23,335       330,033       190,019

Using the default key comparer in order, SortedList performs optimally.

SortedDictionary and SortedList using default key sorter (insert in reverse order)

          Runs    Dictionary    SortedDict    SortedList     SortedSet
Insert

         50000        30,000       273,333    23,233,332       210,000
         60000        30,000       293,333    28,999,999       276,666
         70000        46,666       350,000    29,679,999       343,333
         80000        50,000       433,333    60,493,330       370,000
         90000        50,000       526,666    49,526,662       500,000
        100000        56,666       560,000    79,496,663       600,000
        110000        56,666       716,666   114,516,661       640,000
        120000        63,333       753,333    98,069,995       720,000
Fetch

         50000        10,000       123,333        73,333
         60000        13,333       150,000        93,333
         70000        13,333       176,666       106,666
         80000        20,000       200,000       123,333
         90000        20,000       243,333       140,000
        100000        23,333       263,333       153,333
        110000        26,666       293,333       173,333
        120000        30,000       323,333       186,666

Do the same run only with reversed keys and SortedList falls apart. SortedDictionary is most consistent with key comparisons (SortedSet still comparing values in these runs).

Notes

These are an average of three runs each. I'm not sure if I'm doing something wrong for the SortedList inserts or not - it's way above everything else. Otherwise for purely random data inserts, SortedSet seems to be the most consistent. If the data is mostly pre-sorted, then SortedList will beat them all, but then bombs with random data. Indexed fetch isn't available for SortedSet so it's not shown. SortedDictionary and List both perform and scale similarly with a custom comparer.

Repository for test code