Pagina's

2013/12/30

MWC58 Fast RNG small period part 4

/*
"MWC" stands for "Multiply With Carry", "58" stands for the period: > 2^58 
Actually the period is larger than 2^60, the first version had a smaller period.
A big drawback: The usable period is a lot smaller. Despite this drawback,
MWC58 might be usefull:
-128 non correlated sequences of  random numbers,
 each sequence contains at least 590 million numbers,
 sequences are repeatable, initialisation is fast.
-seed another RNG

It's  based on: Marsaglia's message(1998)
    
MWC58:
  
    private static uint mwc58()
    {
        z0 = m0 * (z0 & 65535) + (z0 >> 16);   //  both (m0<<15)-1 and (m0<<16)-1 are prime
        z1 = m1 * (z1 & 65535) + (z1 >> 16);   //  both (m1<<15)-1 and (m1<<16)-1 are prime
        return z0 + (z1 << 16);
    }

Why is this fast? From a publication by Mark A. Overton(2011):
 
    Fast, High-Quality, Parallel Random Number Generators
    
Two independent generators are used: z0 = m0 * ... , z1 = m1 * ...   
The compiler places instructions in an order that causes the processor to 
pipeline the computations of the generators. As a result of this parallelizing, 
the generators execute even faster than expected. The second multiply, z1 = m1 * ...
only adds one more clock cycle.  

The period "p0" of the first generator is: p0 = m0 * 2^15 ,
with a seed value "s0": 0 < s0 <= 2 * p0 - 2 .
The period of MWC58: m0 * m1 * 2^30 .
The usable period is a lot smaller. The 16 lsb's (least significant bits) are
the 16 lsb's of the first generator: z0 , with period: p0 = m0 * 2^15.
The most significant bits depend much on the second generator: z1 ,
where the highest bits have the largest dependencies.
For the smallest value of m0: 18030, MWC58 generates 18030 * 2^15 (~5.9 * 10^8)
random numbers in ~2.25 seconds. In other words: When 5.9 * 10^8 random numbers
are needed, MWC58 generates them fast.

With 18030 <= m <= 65184 , there are 256 valid multipliers:
    ushort[] m = { 18030 ... 65184 };
With fixed multipliers it is possible to use a 32 bits seed value,
from which valid values for s0 and s1 can be dereived.
When we want different sequences, using different seed values,
to be more divergent (less correlated), it is better to use the seed for
indices in the "ushort[] m" array.

Off topic: The k-th (ordered) combination of { 0, 1, 2, 3, ... , n } , or
the k-th 2-element subset of the set { 0, 1, 2, 3, ... , n }
Triangular numbers and their inverse are involved.


 
    private static uint[] tri_com(uint k, uint n) // n > 0                //     n = 3   
    {                                                                     //   k | i | j 
        uint t, i, j;                                                     //  ---|---|---
        t = n * (n + 1) / 2;                                              //   0 | 0 | 1 
        k %= t;                                                           //   1 | 0 | 2 
        i = n - 1 - (uint)((Math.Sqrt(8 * (t - k - 1) + 1) - 1) / 2);     //   2 | 0 | 3 
        j = (i * (i + 1) / 2 + k) % n + 1;                                //   3 | 1 | 2 
        uint[] ij = { i, j };                                             //   4 | 1 | 3 
        return ij;                                                        //   5 | 2 | 3 
    }                                                                     //   6 | 0 | 1 
                                                                       
When combined indices are used like: 0,3 1,2 (2 combinations)
random sequences generated by those combinations are less (not?) correlated.
With 256 multipliers there are 128 combinations.
 
The scaled version:

"mwc_uint(u)" returns a value x ( 0 <= x <= u ).
Before I used a trick to save random bits, which is not necessary any longer.
An example: mwc_uint(1) returns 0 or 1.
Under the hood 32 random bits are/were generated, they were put in a buffer,
from which the bits were taken out, one by one.
That required some code: buffer empty? refill, etc.
Some code takes some time.
Now one bit is returned, 31 remaining bits are not used,
it is faster to generate 32 new bits, and to use just one of them.
The rather small period is a bottleneck, a solution: CMR63

Results:
    
    10^7 times rand.Next() takes: 107 ms (~11 ns per random number)
    10^7 times rand.Next(int someValue) takes: 236 ms (~24 ns ...)
    
    |--------|--------------------------------------------------------|      
    |        |                        ms 10^7 *                       |      rand.Next()    11 ns      
    |        |-------------|--------------|-------------|-------------|          mwc58()     4 ns
    |      u | mwc_uint(u) | well_uint(u) | gen_uint(u) | g_uint_c(u) |     rand.Next(i)    24 ns                 
    |--------|------|------|-------|------|-------|-----|------|------|      mwc_uint(u)    18 ns, mean  ~12 ns
    |      0 |   20 |      |    24 |      |    24 |     |   24 |      |     well_uint(u)    44 ns, mean  ~32 ns  
    |      1 |      |  51  |       |   74 |       |  73 |      |  139 |      gen_uint(u)    50 ns, mean  ~35 ns   
    |      2 |   96 |      |   121 |      |   124 |     |  296 |      |      g_uint_c(u)   280 ns, mean ~250 ns
    |      3 |      |  51  |       |   80 |       |  78 |      |  211 |       
    |      4 |  133 |      |   222 |      |   242 |     |  518 |      |                 
    |      7 |      |  51  |       |   82 |       |  83 |      |  282 |               
    |      8 |  153 |      |   272 |      |   290 |     |  680 |      |     
    |  2^7-1 |      |  51  |       |   98 |       | 102 |      |  562 |     
    |  2^7-0 |  172 |      |   344 |      |   380 |     | 1079 |      |           
    | 2^15-1 |      |  51  |       |  131 |       | 130 |      | 1123 |               
    | 2^15-0 |  174 |      |   379 |      |   414 |     | 1634 |      |               
    | 2^30-1 |      |  51  |       |  193 |       | 193 |      | 2164 |  
    | 2^30-0 |  175 |      |   431 |      |   494 |     | 2672 |      |  
    | 2^31-1 |      |  51  |       |  195 |       | 210 |      | 2251 |  
    | 2^31-0 |  175 |      |   427 |      |   461 |     | 2747 |      |  
    | 2^32-1 |      |  51  |       |  196 |       | 211 |      | 2307 |  
    |--------|------|------|-------|------|-------|-----|------|------| 

http://www.cacert.at/cgi-bin/rngresults  (code: find and hover mouse over "MWC58")
 
|---------|---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
|  Gene-  |  Speed  | Upload | Entropy  | Birthday | Matrix | 6*8 Ma- | Minimum  | Random   | Squeeze  | Overlap- |
|  rator  | (MiB/s) |  Size  | (->8)    | Spacing  | Ranks  | trix R  | Distance | Spheres  |          | ping Sums|
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
|   mwc58 |  1000   | 19 MiB | 7.999990 | 0.161056 |  0.504 |  0.920  | 0.877503 | 0.432728 | 0.677887 | 0.458630 |
| well512 |   400   | 19 MiB | 7.999992 | 0.446363 |  0.636 |  0.879  | 0.967495 | 0.690789 | 0.942252 | 0.005987 |
|  Random |   370   | 17 MiB | 7.999989 | 0.000000 |  0.392 |  0.136  | 0.063920 | 0.905097 | 0.098440 | 0.005467 |
|     CSP |    34   | 15 MiB | 7.999987 | 0.032102 |  0.084 |  0.942  | 0.894277 | 0.036221 | 0.706538 | 0.023523 |
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|

*/

using System;
class mwc
{
    private static uint m0 = 18030, z0 = 325080900, m1 = 65184, z1 = 4248953856;

    private static uint mwc58()
    {
        z0 = m0 * (z0 & 65535) + (z0 >> 16);
        z1 = m1 * (z1 & 65535) + (z1 >> 16);
        return z0 + (z1 << 16);
    }

    private static uint g_u = 0;
    private static int g_s = 0;

    private static uint mwc_uint(uint u)
    {
        uint x;
        if (u == 0) return 0;
        if (u != g_u)
        {
            g_u = u;
            g_s = 32 - nob(u);
        }
        do
        {
            z0 = m0 * (z0 & 65535) + (z0 >> 16);
            z1 = m1 * (z1 & 65535) + (z1 >> 16);
            x = (z0 + (z1 << 16)) >> g_s;
        }
        while (x > u);
        return x;
    }

    private static void init(uint seed)
    {
        ushort[] m = {
            18030, 18273, 18513, 18879, 19074, 19098, 19164, 19215, 19584, 19599,
            19950, 20088, 20508, 20544, 20664, 20814, 20970, 21153, 21243, 21423,
            21723, 21954, 22125, 22188, 22293, 22860, 22938, 22965, 22974, 23109,
            23124, 23163, 23208, 23508, 23520, 23553, 23658, 23865, 24114, 24219,
            24660, 24699, 24864, 24948, 25023, 25308, 25443, 26004, 26088, 26154,
            26550, 26679, 26838, 27183, 27258, 27753, 27795, 27810, 27834, 27960,
            28320, 28380, 28689, 28710, 28794, 28854, 28959, 28980, 29013, 29379,
            29889, 30135, 30345, 30459, 30714, 30903, 30963, 31059, 31083, 31215,
            31353, 31488, 31743, 32430, 32718, 33105, 33189, 33249, 33375, 33378,
            33663, 33768, 33858, 33894, 34158, 34323, 34383, 34590, 34653, 34890,
            35355, 35523, 35643, 36309, 36594, 36804, 36969, 37698, 37935, 37959,
            38079, 38223, 38283, 38484, 38568, 38610, 38649, 38733, 38850, 39444,
            39618, 39690, 39948, 40833, 40995, 41019, 41064, 41289, 41628, 41793,
            41874, 42153, 42444, 42513, 42594, 42633, 42699, 42819, 42903, 42975,
            43038, 43155, 43473, 43563, 43995, 44019, 44568, 44574, 44994, 45723,
            45729, 45780, 45789, 45915, 45939, 46515, 47088, 47529, 48015, 48033,
            48195, 48204, 48393, 49209, 49248, 49299, 49458, 50034, 50223, 50580,
            50589, 50694, 50853, 50988, 51198, 51558, 51618, 51729, 51744, 51813,
            51873, 51933, 52023, 52215, 52275, 52509, 52743, 52950, 53130, 53199,
            53529, 53709, 53898, 53934, 53958, 54144, 54168, 54399, 54474, 54564,
            54885, 55044, 55074, 55179, 55254, 55680, 55809, 55848, 55869, 56205,
            56538, 56604, 56790, 56859, 57039, 57204, 57225, 57525, 57603, 57774,
            57780, 57918, 58149, 58368, 58443, 58758, 59253, 59325, 59775, 60009,
            60060, 60489, 60735, 60990, 61140, 61578, 61914, 62505, 62634, 62778,
            62790, 62865, 62874, 62904, 63129, 63273, 63444, 63663, 63765, 63885,
            64185, 64314, 64455, 64545, 64860, 65184 };

        seed &= 127; m0 = m[seed]; z0 = m0 * m0;
        seed ^= 255; m1 = m[seed]; z1 = m1 * m1;
    }

    private static int nob(uint u) // number of bits, position of the highest set bit, u > 0  
    {
        return
        u < 1u << 16 ? u < 1u << 08 ? u < 1u << 04 ? u < 1u << 02 ? u < 1u << 01 ? 01 : 02 :
                                                                    u < 1u << 03 ? 03 : 04 :
                                                     u < 1u << 06 ? u < 1u << 05 ? 05 : 06 :
                                                                    u < 1u << 07 ? 07 : 08 :
                                      u < 1u << 12 ? u < 1u << 10 ? u < 1u << 09 ? 09 : 10 :
                                                                    u < 1u << 11 ? 11 : 12 :
                                                     u < 1u << 14 ? u < 1u << 13 ? 13 : 14 :
                                                                    u < 1u << 15 ? 15 : 16 :
                       u < 1u << 24 ? u < 1u << 20 ? u < 1u << 18 ? u < 1u << 17 ? 17 : 18 :
                                                                    u < 1u << 19 ? 19 : 20 :
                                                     u < 1u << 22 ? u < 1u << 21 ? 21 : 22 :
                                                                    u < 1u << 23 ? 23 : 24 :
                                      u < 1u << 28 ? u < 1u << 26 ? u < 1u << 25 ? 25 : 26 :
                                                                    u < 1u << 27 ? 27 : 28 :
                                                     u < 1u << 30 ? u < 1u << 29 ? 29 : 30 :
                                                                    u < 1u << 31 ? 31 : 32;
    }

    static void Main()
    {
        // test ( part 1 ? )
    }    
}

2013/12/09

WELL512 scaled random numbers part 3


/*
"WELL" stands for "Well Equidistributed Long-period Linear",
"512" stands for the period: 2^512
Chris Lomont's version is used, visit:

    http://www.lomont.org/Math/Papers/2008/Lomont_PRNG_2008.pdf
  
and read about the advantages of the "WELL512" PRNG.
A "state" array of 16 uints is used, it's seeded by another PRNG: "System.Random".
"System.Random" uses a seed, the absolute value of an int.
Sequences of random numbers generated by System.Random, using different seed values,
are correlated, the left side of the image above clearly shows patterns.
The right side looks pretty random, sequences appear to be less correlated.
In other words: It is possible to regenerate a high quality random sequence,
using the same seed value. Different sequences, using different seed value's, 
seem to be less correlated.


    Results:
    
    10^7 times rand.Next() takes: 107 ms (~11 ns per random number)
    10^7 times rand.Next(int someValue) takes: 236 ms (~24 ns ...)
    
    |--------|------------------------------------------|       rand.Next()    11 ns      
    |        |                 ms 10^7 *                |      rand.Next(i)    24 ns              
    |        |--------------|-------------|-------------|      well_uint(u)    44 ns, mean  ~32 ns
    |      u | well_uint(u) | gen_uint(u) | g_uint_c(u) |       gen_uint(u)    50 ns, mean  ~35 ns    
    |--------|-------|------|-------|-----|------|------|       g_uint_c(u)   280 ns, mean ~250 ns
    |      0 |    24 |      |    24 |     |   24 |      |         
    |      1 |       |   74 |       |  73 |      |  139 |          
    |      2 |   121 |      |   124 |     |  296 |      |            
    |      3 |       |   80 |       |  78 |      |  211 |       
    |      4 |   222 |      |   242 |     |  518 |      |                 
    |      7 |       |   82 |       |  83 |      |  282 |               
    |      8 |   272 |      |   290 |     |  680 |      |     
    |  2^7-1 |       |   98 |       | 102 |      |  562 |     
    |  2^7-0 |   344 |      |   380 |     | 1079 |      |           
    | 2^15-1 |       |  131 |       | 130 |      | 1123 |               
    | 2^15-0 |   379 |      |   414 |     | 1634 |      |               
    | 2^30-1 |       |  193 |       | 193 |      | 2164 |  
    | 2^30-0 |   431 |      |   494 |     | 2672 |      |  
    | 2^31-1 |       |  195 |       | 210 |      | 2251 |  
    | 2^31-0 |   427 |      |   461 |     | 2747 |      |  
    | 2^32-1 |       |  196 |       | 211 |      | 2307 |  
    |--------|-------|------|-------|-----|------|------|


http://www.cacert.at/cgi-bin/rngresults  find and hover mouse over "well512" for the code.

|---------|---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
|  Gene-  |  Speed  | Upload | Entropy  | Birthday | Matrix | 6*8 Ma- | Minimum  | Random   | Squeeze  | Overlap- |
|  rator  | (MiB/s) |  Size  | (->8)    | Spacing  | Ranks  | trix R  | Distance | Spheres  |          | ping Sums|
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
| WELL512 |   400   | 19 MiB | 7.999992 | 0.446363 |  0.636 |  0.879  | 0.967495 | 0.690789 | 0.942252 | 0.005987 |
|  Random |   370   | 17 MiB | 7.999989 | 0.000000 |  0.392 |  0.136  | 0.063920 | 0.905097 | 0.098440 | 0.005467 |
|     CSP |    34   | 15 MiB | 7.999987 | 0.032102 |  0.084 |  0.942  | 0.894277 | 0.036221 | 0.706538 | 0.023523 |
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|  
  
*/

using System;
using System.Drawing; // add a ref

class scaled_well512
{
    private static uint[] state = new uint[16];
    private static int index = -1;

    private static void init_well(int seed)
    {
        seed = Math.Abs(seed);
        index = seed & 15;
        var rand = new Random(seed);
        uint u = (uint)rand.Next(1 << 30);
        for (int i = 0; i < 15; i++)
        {
            state[i] = (uint)rand.Next(1 << 30) << 2 | u & 3u;
            state[i] ^= (uint)rand.Next(1 << 30);
            state[i] ^= (uint)rand.Next(1 << 30);
            u >>= 2;
        }
        state[15] = (uint)rand.Next(1 << 30) << 2 | (uint)rand.Next(4);
        state[15] ^= (uint)rand.Next(1 << 30);
        state[15] ^= (uint)rand.Next(1 << 30);
    }

    private static uint well512()
    {
        uint a, b, c, d;
        a = state[index];
        c = state[index + 13 & 15];
        b = a ^ c ^ (a << 16) ^ (c << 15);
        c = state[index + 9 & 15];
        c ^= c >> 11;
        a = state[index] = b ^ c;
        d = a ^ ((a << 5) & 0xda442d24u);
        index = index + 15 & 15;
        a = state[index];
        state[index] = a ^ b ^ d ^ (a << 2) ^ (b << 18) ^ (c << 28);
        return state[index];
    }

    private static ulong buf = 0;              // 64 random bits buffer
    private static int av = 0;                 // available bits in buf
    private static uint g_u = 0;               // copy of u
    private static int g_nd = 0;               // nd (needed) bits = highest set bit of u
    private static int g_s = 0;                // shift
    private static bool ones = false;          // true if u = 10, 110, 1110, ...

    private static uint well_uint(uint u)
    {
        uint x; int s;
        if (u == 0) return 0;
        if (u != g_u)
        {
            g_u = u;
            g_nd = nob(u);
            g_s = 64 - g_nd;
            ones = (u + 1 & u + 2) == 0;
        }
    L0: if (av < g_nd)
        {
            av += 32;
            buf <<= 32;
            buf |= well512();
        }
        x = (uint)buf << g_s >> g_s;
        buf >>= g_nd;
        av -= g_nd;
        if (x <= u) return x;
        if (ones) goto L0;                     // faster when u = 2, 6, 14, ...
        s = nob(x ^ u) - 1;
        av += s;
        buf <<= s;
        buf ^= x;
        goto L0;
    }

    private static int nob(uint u) // number of bits, position of the highest set bit, u > 0  
    {
        return
        u < 1u << 16 ? u < 1u << 08 ? u < 1u << 04 ? u < 1u << 02 ? u < 1u << 01 ? 01 : 02 :
                                                                    u < 1u << 03 ? 03 : 04 :
                                                     u < 1u << 06 ? u < 1u << 05 ? 05 : 06 :
                                                                    u < 1u << 07 ? 07 : 08 :
                                      u < 1u << 12 ? u < 1u << 10 ? u < 1u << 09 ? 09 : 10 :
                                                                    u < 1u << 11 ? 11 : 12 :
                                                     u < 1u << 14 ? u < 1u << 13 ? 13 : 14 :
                                                                    u < 1u << 15 ? 15 : 16 :
                       u < 1u << 24 ? u < 1u << 20 ? u < 1u << 18 ? u < 1u << 17 ? 17 : 18 :
                                                                    u < 1u << 19 ? 19 : 20 :
                                                     u < 1u << 22 ? u < 1u << 21 ? 21 : 22 :
                                                                    u < 1u << 23 ? 23 : 24 :
                                      u < 1u << 28 ? u < 1u << 26 ? u < 1u << 25 ? 25 : 26 :
                                                                    u < 1u << 27 ? 27 : 28 :
                                                     u < 1u << 30 ? u < 1u << 29 ? 29 : 30 :
                                                                    u < 1u << 31 ? 31 : 32;
    }

    static void Main()
    {
        random_bitmap();
        well_bitmap();
    }

    private static void random_bitmap()
    {
        int xmax = 450;
        int ymax = 450;
        var bmp = new Bitmap(xmax + 1, ymax + 1);
        int x, y, i = 0; uint z;
        for (x = 0; x <= xmax; x++)
        {
            for (y = 0; y < ymax; y++)
            {
                var rand = new Random(i++);
                z = (uint)rand.Next(1 << 30);
                if ((z & 1) == 0) bmp.SetPixel(x, y, Color.White);
                else bmp.SetPixel(x, y, Color.Black);
            }
        }
        bmp.Save(xmax.ToString() + "R.bmp");
    }

    private static void well_bitmap()
    {
        int xmax = 450;
        int ymax = 450;
        var bmp = new Bitmap(xmax + 1, ymax + 1);
        int x, y, i = 0; uint z;
        for (x = 0; x <= xmax; x++)
        {
            for (y = 0; y < ymax; y++)
            {
                init_well(i);
                i += 1;
                //z = well512();
                z = well_uint(1 << 30);
                z >>= 0;
                if ((z & 1) == 0) bmp.SetPixel(x, y, Color.White);
                else bmp.SetPixel(x, y, Color.Black);
            }
        }
        bmp.Save(xmax.ToString() + "W.bmp");
    }
}

2013/11/17

Uniform random numbers in a range, part 2

/*

Source code of System.Random is available at: CodePlex/Random.cs

System.Random is fast, basically a new random number is generated by a subtraction
of two numbers. A straightforward copy of the source code is two times slower 
compared to using rand.Next directly.
A way to speed it up: the codeplex code uses a multiplication by a double type. 
The multiplication can be replaced by a shift  and an "if" statement when 30 
random bits are needed, very occasionally an addition  "+ 1" (and an "& 1") are
necessary, that's what "Next30()" does in, voila:

    Fast Random Range (CodePlex)

Other improvements over the previous version:
-a 64 bits buffer, instead of a 32 bits one.
 Operations like "uint + uint" are as fast as "ulong + uint", 
 same for: " -  , ^ , | , << , >> " 
-Random numbers were computed bit by bit, now the bits are taken at once,
 some bit twiddling recycles unused random bits. 

Finally: A probably very true scaled (pseudo) random number generator, 
based on the .Net4 RNGCryptoServiceProvider class:
".. rng = new  RNGCryptoServiceProvider".
The "rng.Getbytes(..)" method fills a byte array, 
one byte as fast as 40 bytes, 40 bytes seems to be an optimum.
"Buffer.BlockCopy" copies the byte array to a uint array.
The uint array is used to fill a 64 random bits buffer, etc.
"uint x = g_uint_c(uint u)" gives a random value for x,
such that: 0 <= x <= u <= uint.MaxValue


Results:
    
    10^7 times rand.Next() takes: 107 ms (~11 ns per random number) 
    10^7 times rand.Next(int someValue) takes: 236 ms (~24 ns ...)
                                                 
    |--------|-------------------------|         rand.Next()    11 ns  
    |        |       ms 10^7 *         |         rand.Next(i)   24 ns
    |        |-----------|-------------|         gen_uint(u)    50 ns, mean  ~35 ns
    |      u |gen_uint(u)| g_uint_c(u) |         g_uint_c(u)   280 ns, mean ~250 ns   
    |--------|-----|-----|------|------|
    |      0 |  24 |     |   24 |      |          
    |      1 |     |  73 |      |  139 |          
    |      2 | 124 |     |  296 |      |           
    |      3 |     |  78 |      |  211 |       
    |      4 | 242 |     |  518 |      |     
    |      7 |     |  83 |      |  282 |               
    |      8 | 290 |     |  680 |      |      
    |  2^7-1 |     | 102 |      |  562 |              
    |  2^7-0 | 380 |     | 1079 |      |          
    | 2^15-1 |     | 130 |      | 1123 |          
    | 2^15-0 | 414 |     | 1634 |      |               
    | 2^30-1 |     | 193 |      | 2164 |
    | 2^30-0 | 494 |     | 2672 |      |
    | 2^31-1 |     | 210 |      | 2251 |
    | 2^31-0 | 461 |     | 2747 |      |
    | 2^32-1 |     | 211 |      | 2307 |
    |--------|-----|-----|------|------|


http://www.cacert.at/cgi-bin/rngresults

find: "P_P" , hover mouse over "Net4 System.Random" for the code.
 Random ~ gen_uint(u) 
    CSP ~ g_uint_c(u)

|--------|---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
| Gene-  |  Speed  | Upload | Entropy  | Birthday | Matrix | 6*8 Ma- | Minimum  | Random   | Squeeze  | Overlap- |
| rator  | (MiB/s) |  Size  | (->8)    | Spacing  | Ranks  | trix R  | Distance | Spheres  |          | ping Sums|
|--------|---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
| Random |   370   | 17 MiB | 7.999989 | 0.000000 |  0.392 |  0.136  | 0.063920 | 0.905097 | 0.098440 | 0.005467 |
|    CSP |    34   | 15 MiB | 7.999987 | 0.032102 |  0.084 |  0.942  | 0.894277 | 0.036221 | 0.706538 | 0.023523 |
|--------|---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
   
*/

using System;
using System.Diagnostics;
using System.Security.Cryptography;

class scaled_random_uint
{
    private static ulong buf = 0;              // 64 random bits buffer
    private static int av = 0;                 // available bits in buf
    private static uint g_u = 0;               // copy of u
    private static int g_nd = 0;               // nd (needed) bits = highest set bit of u
    private static int g_s = 0;                // shift
    private static bool ones = false;          // true if u = 10, 110, 1110, ...

    private static uint g_uint_c(uint u)
    {
        uint x; int s;
        if (u == 0) return 0;
        if (u != g_u)
        {
            g_u = u;
            g_nd = nob(u);
            g_s = 64 - g_nd;
            ones = (u + 1 & u + 2) == 0;
        }
    L0: if (av < g_nd) 
        {
            av += 32;
            buf <<= 32;
            buf |= next32();
        }
        x = (uint)buf << g_s >> g_s;
        buf >>= g_nd;
        av -= g_nd;
        if (x <= u) return x;
        if (ones) goto L0;                     // faster when u = 2, 6, 14, ...
        s = nob(x ^ u) - 1;
        buf <<= s;
        buf ^= x;
        av += s;
        goto L0;
    }

    private static RNGCryptoServiceProvider
        rng = new RNGCryptoServiceProvider();
    private static byte[] r08 = new byte[40];       //  ... [400] ... ~15% faster
    private static uint[] r32 = new uint[10];       //  ... [100] ...  ,,    ,,
    private static int i_r32 = 10;                  //  ... = 100 ...  ,,    ,, 
                                                               
    private static uint next32()                                  
    {                                                             
        if (i_r32 != 10) return r32[i_r32++];       //  ... != 100...  ,,    ,,
        i_r32 = 1;                                                
        rng.GetBytes(r08);                                       
        Buffer.BlockCopy(r08, 0, r32, 0, 40);       //  ... , 400)...  ,,    ,,
        return r32[0];
    }

    private static int nob(uint u) // number of bits, position of the highest set bit, u > 0  
    {
        return
        u < 1u << 16 ? u < 1u << 08 ? u < 1u << 04 ? u < 1u << 02 ? u < 1u << 01 ? 01 : 02 :
                                                                    u < 1u << 03 ? 03 : 04 :
                                                     u < 1u << 06 ? u < 1u << 05 ? 05 : 06 :
                                                                    u < 1u << 07 ? 07 : 08 :
                                      u < 1u << 12 ? u < 1u << 10 ? u < 1u << 09 ? 09 : 10 :
                                                                    u < 1u << 11 ? 11 : 12 :
                                                     u < 1u << 14 ? u < 1u << 13 ? 13 : 14 :
                                                                    u < 1u << 15 ? 15 : 16 :
                       u < 1u << 24 ? u < 1u << 20 ? u < 1u << 18 ? u < 1u << 17 ? 17 : 18 :
                                                                    u < 1u << 19 ? 19 : 20 :
                                                     u < 1u << 22 ? u < 1u << 21 ? 21 : 22 :
                                                                    u < 1u << 23 ? 23 : 24 :
                                      u < 1u << 28 ? u < 1u << 26 ? u < 1u << 25 ? 25 : 26 :
                                                                    u < 1u << 27 ? 27 : 28 :
                                                     u < 1u << 30 ? u < 1u << 29 ? 29 : 30 :
                                                                    u < 1u << 31 ? 31 : 32;
    }

    private static Stopwatch sw = new Stopwatch();
    static void Main()
    {
        speed((1u << 31) - 0); 
        Console.ReadLine();
    }

    private static void speed(uint u)
    {
        int i, j = 10000000; double t;
        sw.Restart();
        for (i = 0; i < j; i++) g_uint_c(u);
        sw.Stop();
        t = sw.ElapsedMilliseconds;
        sw.Restart();
        for (i = 0; i < j; i++) ; // nada
        sw.Stop();
        t -= sw.ElapsedMilliseconds;
        Console.WriteLine("u: " + u);
        Console.WriteLine("gen_uint(u)");
        Console.WriteLine("     ms: " + t);
        Console.WriteLine("   bits: " + (double)(nob(u)) * j);
        Console.WriteLine(" bits/s: {0:.000}", (double)((nob(u)) * j) / t / 1000000);
        Console.WriteLine();
    }
}

2013/11/04

Fast Cube Root ( < 2^32 ), part 2

/*
The mean time to take a cube root from a 32 bits uint was: < 13 ns
                                            new mean time: < 12 ns

It is a bit faster, especially for large values, because variables are initialized properly.
       
       |--------------------------------|
       |         cro32(x)               |       
       |-----|-----||------------|------|       
       |   x |  ns ||          x |   ns |       
       |-----|-----||------------|------|
       |   0 | 3,0 ||        511 |  4,6 |       
       |   1 | 2,3 ||       1023 |  5,3 |       
       |   3 | 2,3 ||       2047 |  5,6 |               ___
       |   7 | 2,3 ||       4095 |  5,0 |           \3 /                
       |  15 | 3,3 ||      32768 |  7,6 |            \/  x = y          
       |  31 | 2,6 ||    1048576 |  9,0 |          
       |  63 | 2,6 ||  268435456 | 11,0 |           Input: 0 <= x < 2^32
       |  64 | 5,0 ||  536870912 | 11,3 |          Output: y, such that y^3 <= x < (y+1)^3
       |  65 | 5,0 || 1070599167 | 13,0 |       
       | 100 | 5,0 || 1070599168 | 13,0 |
       | 127 | 5,0 || 1073741824 | 11,3 |
       | 128 | 5,3 || 1073774592 | 11,3 |
       | 255 | 4,6 || 2147483648 | 11,0 |
       | 256 | 4,6 || 4294967295 | 11,7 |
       |-----|-----||------------|------|
       
       4294967295 roots in 50686 ms, mean time 11,80 ns (before 12,55 ns)
       
Another way to find roots is using a look-up table. Below "fill_cubes" fills a "cubes" array,
"cro32t" does a binary search for a root. Mean time: 30 ns.
I have not tried to optimize it further, I see no reason why it should be much faster.
*/

using System;
using System.Diagnostics;
class cube_root
{
    private static uint cro32(uint x)
    {
        uint y = 4u, z = 16u, b = 61u << 21;
        if (x < 1u << 24)
            if (x < 1u << 12)
                if (x < 1u << 06)
                    if (x < 1u << 03)
                        return x == 0u ? 0u : 1u;
                    else
                        return x < 27u ? 2u : 3u;
                else
                    if (x < 1u << 09) goto L8; else goto L7;
            else
                if (x < 1u << 18)
                    if (x < 1u << 15) goto L6; else goto L5;
                else
                    if (x < 1u << 21) goto L4; else goto L3;
        else
            if (x < 1u << 30)
                if (x < 1u << 27) goto L2;
                else
                {
                    if (x >= 27u << 24) { x -= 27u << 24; z = 36u; y = 6u; b = 127u << 21; }
                    else { x -= 1u << 27; }
                }
            else
            {
                if (x >= 27u << 27) { x -= 27u << 27; z = 144u; y = 12u; b = 469u << 21; }
                else
                {
                    if (x >= 125u << 24) { x -= 125u << 24; z = 100u; y = 10u; b = 331u << 21; }
                    else { x -= 1u << 30; z = 64u; y = 8u; b = 217u << 21; }
                }
            }
        goto M1;

    L2: if (x >= 27u << 21) { x -= 27u << 21; z = 36u; y = 6u; } else { x -= 1u << 24; } goto M2;
    L3: if (x >= 27u << 18) { x -= 27u << 18; z = 36u; y = 6u; } else { x -= 1u << 21; } goto M3;
    L4: if (x >= 27u << 15) { x -= 27u << 15; z = 36u; y = 6u; } else { x -= 1u << 18; } goto M4;
    L5: if (x >= 27u << 12) { x -= 27u << 12; z = 36u; y = 6u; } else { x -= 1u << 15; } goto M5;
    L6: if (x >= 27u << 09) { x -= 27u << 09; z = 36u; y = 6u; } else { x -= 1u << 12; } goto M6;
    L7: if (x >= 27u << 06) { x -= 27u << 06; z = 36u; y = 6u; } else { x -= 1u << 09; } goto M7;
    L8: if (x >= 27u << 03) { x -= 27u << 03; z = 36u; y = 6u; } else { x -= 1u << 06; } goto M8;

    M1: if (x >= b) { x -= b; z += y * 2 + 1u; y += 1u; } y *= 2; z *= 4;
    M2: b = (y + z) * 3 + 1u << 18; if (x >= b) { x -= b; z += y * 2 + 1u; y += 1u; } y *= 2; z *= 4;
    M3: b = (y + z) * 3 + 1u << 15; if (x >= b) { x -= b; z += y * 2 + 1u; y += 1u; } y *= 2; z *= 4;
    M4: b = (y + z) * 3 + 1u << 12; if (x >= b) { x -= b; z += y * 2 + 1u; y += 1u; } y *= 2; z *= 4;
    M5: b = (y + z) * 3 + 1u << 09; if (x >= b) { x -= b; z += y * 2 + 1u; y += 1u; } y *= 2; z *= 4;
    M6: b = (y + z) * 3 + 1u << 06; if (x >= b) { x -= b; z += y * 2 + 1u; y += 1u; } y *= 2; z *= 4;
    M7: b = (y + z) * 3 + 1u << 03; if (x >= b) { x -= b; z += y * 2 + 1u; y += 1u; } y *= 2; z *= 4;
    M8: return x <= (y + z) * 3 ? y : y + 1u;
    }

    private static uint[] cubes = new uint[2048];
    private static void fill_cubes()
    {
        int i = 0; uint a = 0u, b = 1u, c = 6u;
        do
        {
            cubes[i++] = a;
            a += b; b += c; c += 6u;
        }
        while (i < 1626);
        do cubes[i++] = ~0u;
        while (i < 2048);
    }
    private static uint cro32t(uint x)
    {
        uint i = 1u << 10, j = 1u << 9, u = 1u << 30;
        for (; j > 0; j >>= 1)
        {
            if (x < u) i -= j;
            else if (x > u) i += j;
            else return i < 1625u ? i : 1625u;
            u = cubes[i];
        }
        return x < u ? i - 1 : i;
    }

    private static Stopwatch sw = new Stopwatch();
    static void Main(string[] args)
    {
        cro32_all();        // ~1 minute, comment out "cro32t" , uint[] cubes, etc.
                            //  Or try seperate executables.

        fill_cubes();
        cro32t_all();       // ~2 minutes
        Console.ReadLine();
    }
    private static void cro32_all()
    {
        uint x; double t;
        cro32(~0u);
        sw.Restart();
        for (x = 0; x < ~0u; x++) cro32(x);
        sw.Stop();
        t = sw.ElapsedMilliseconds;
        sw.Restart();
        for (x = 0; x < ~0u; x++) ; // nada
        sw.Stop();
        t -= sw.ElapsedMilliseconds;
        Console.Write(x + " roots in " + t + " ms, ");
        Console.WriteLine("mean time {0:.00} ns", t * 1000000 / x);
        // 4294967295 roots in 50686 ms, mean time 11,80 ns
    }
    private static void cro32t_all()
    {
        uint x; double t;
        cro32t(~0u);
        sw.Restart();
        for (x = 0; x < ~0u; x++) cro32t(x);
        sw.Stop();
        t = sw.ElapsedMilliseconds;
        sw.Restart();
        for (x = 0; x < ~0u; x++) ; // nada
        sw.Stop();
        t -= sw.ElapsedMilliseconds;
        Console.Write(x + " roots in " + t + " ms, ");
        Console.WriteLine("mean time {0:.00} ns", t * 1000000 / x);
        // 4294967295 roots in 129254 ms, mean time 30,09 ns
    }
}

2013/10/19

Uniform pseudo random numbers in a range

/*

Random number generators (RNG) are divided in two categories,
hardware RNG, that provide “true” random numbers, and algorithmic RNG, that generate
pseudo random numbers (PRNG). Both types usually generate random numbers "X" as independent
uniform samples in a range 0, . . . 2^b − 1, with b = 8, 16, 32 or b = 64. In applications,
it is instead sometimes desirable to draw random numbers as independent uniform samples "U"
in a range " 1, . . . M ", where moreover "M" may change between drawings.
Transforming "X" to "U" , or many "X" to many "U" is sometimes known as scaling.

Some text is copied from other sources. You find them in "References".
I found it quite hard to explain some of the underlying ideas.
I am grateful to the original authors. Their words saved me a lot of time.

Limits:

    0 <= M <= 2^32 - 1 = uint.MaxValue  
    0 <= U <= M

UPDATE: part 2 

Example: 
Suppose a PRNG produces one random bit each time it is called, and we have: M = 5 ,
binary: M = 101. Call the generator three times to get three successive bits. 
Result: 0 <= U <= 7, binary: 000 <= U <= 111. If "U > M" , than we take the next three bits
from the generator, and repeat to do so until "U <= M".

Another way to find a valid value for "U".
-Take the first, the most significant, bit from the generator.
If it's zero, take two other bits from the generator,
and return a valid value for "U": 000 or 001 or 010 or 011.
If the first bit isn't zero, so it's one, 
take the second bit from the generator.
If the second bit is zero, take the third bit from the generator,
and return a valid value for "U": 100 or 101.
Finally: the first and the second bit are both set,
no need to take a third bit from the generator,
a random bit is saved, we have to try again. 
-Take the first, the most significant, bit from the generator...

              /  U = 0..  return 000 or 001 or 010 or 011
             /
         ---<    U = 10.  return 100 or 101
        |    \ 
        |     \  U = 11   --try again--
        |                              |
        |                              |
         ----<<--------<<--------<<---- 

The PRNG used: .Net 4.0 System.Random
It might have a bug: The period is not guaranteed to be longer than: 2^55-1
                     (I do have some doubts, but no proof)
 
Some important properties of System.Random:

    Random Constructors:

        Random()          Initializes a new instance of the Random class, using a time-dependent value.
                   
        Random(int seed)  Initializes a new instance of the Random class, using the specified seed.

    To improve performance, create one Random object to generate many random numbers over time,
    instead of repeatedly creating new Random objects to generate one random number, so:

        private static Random rand = new Random()

    Random Methods:

        rand.Next()                    Returns a positive random number, greater than or equal to zero,
                                       less than int.MaxValue

        rand.Next(int max)             Returns a positive random number, greater than or equal to zero,
                                       less than max ( max >= 0 ) , ........ rand.Next(0) returns zero 
 
        rand.Next(int min, int max)    Returns a positive random number, greater than or equal to min,
                                       less than max ( max >= min ) , ...... 

    rand.Next() is about twice as fast as rand.Next(int max) , 
    rand.Next(int max) is almost as fast as rand.Next(int min, int max),
    especially when you use something like: return min + rand.Next(max - min)

    It's impossible to create more than 30 random bits at once.
    To create those 30 bits, use: rand.Next(1 << 30)  

Various sources warn about the lower bits of numbers generated by PRNG's, they shouldn't be random.
One way to examine a PRNG is to create a visualisation of the numbers it produces.
Humans are really good at spotting patterns, and visualisation allows you to use your eyes and 
brain directly for this purpose. While you shouldn't consider this type of approach an exhaustive 
or formal analysis, it is a nice and quick way to get a rough impression of a given generator's
performance. The background of the image at the top looks pretty random, no recognizable patterns.
It was created with the code at the very bottom: "random_bitmap()"

Description of the code:

gen_nd_bits(int nd) returns the needed amount of random bits. It uses a 32 bits buffer.
The buffer is replenished when necessary.

gen_uint_fast(uint u) returns a random value between zero (incl.) and u (incl.)
It calls gen_nd_bits(int nd) . When the value is too large another call is done.

gen_uint_eff(uint u) uses random bits more efficiently.
It works bit by bit, from most to least significant bit. When it becomes clear that the (partial)
formed value is smaller than the upper limit ( uint u ) , remaining needed random bits are
added relatively fast. When the (partial) formed value is too large another call is done.

On my trustworthy PC the average (full range) results are:

    |-----------------------|---------------|
    |     function          | * 10^9 bits/s | 
    |-----------------------|---------------|
    |    rand.Next()        |      2.9      | <-- 1 CD in 2 seconds!
    |    rand.Next(1 << 30) |      1.5      |  
    |   gen_nd_bits(int nd) |      1.2      | 
    | gen_uint_fast(uint u) |      0.9      | 
    |  gen_uint_eff(uint u) |      0.7      |
    |-----------------------|---------------|

The efficiency, random bits used per bit: 

      gen_uint_fast(uint u)   1.386 bits/bit, remarkably close to ln(4)
       gen_uint_eff(uint u)   1.044 bits/bit

References:

    José's Daylight Dices 
    RANDOM.ORG 
    Ask Dr. Math, Probability and Random Numbers
    Bit recycling for scaling random number generators
    Math.NET Numerics
    MSDN
    Bug ? System.Random (january 2011)

*/

using System;
using System.Drawing;                          // add a ref!
class scaled_random
{
    private static Random rand = new Random();
    private static uint buf = 0;               // random bits buffer
    private static int av = 0;                 // available bits in buf
    private static uint g_u = 0;               // copy of u
    private static int g_nd = 0;               // copy of nd (needed) = highest set bit of u

    private static uint gen_nd_bits(int nd)    // generate nd bits, 0 <= nd <= 32
    {
        uint x;
        if (nd == 0) return 0;
        if (av <= 2)                           // buffer should contain at least 2 bits, other-
        {                                      // wise there may be too few bits when nd > 30 
            buf <<= 30;
            buf |= (uint)rand.Next(1 << 30);
            av += 30;
        }
        if (av < nd)
        {
            x = buf;
            buf = (uint)rand.Next(1 << 30);
            nd -= av;
            x <<= nd;
            x |= buf << 32 - nd >> 32 - nd;
            buf >>= nd;
            av = 30 - nd;
        }
        else
        {
            x = buf << 32 - nd >> 32 - nd;
            buf >>= nd;
            av -= nd;
        }
        return x;
    }

    private static uint gen_uint_fast(uint u)  // 0 <= returned value <= u  
    {
        uint x;
        if (u == 0) return 0;
        if (u != g_u)                          // if u changes, g_nd might change 
        {
            g_u = u;
            g_nd = nob(u);                     // needed: number of bits of u   
        }
    L0: x = gen_nd_bits(g_nd);
        if (x <= u) return x;
        goto L0;
    }

    private static bool ones = false;          // true if u is (binary) 1, 11, 111, 1111, ...
    private static uint gen_uint_eff(uint u)   // 0 <= returned value <= u  
    {
        uint x; int nd;
        if (u == 0) return 0;
        if (u != g_u)
        {
            g_u = u;
            g_nd = nob(u);
            ones = (u + 1 & u) == 0;
        }
        if (ones) return gen_nd_bits(g_nd);
    L0: nd = g_nd - 1;
        if (gen_nd_bits(1) == 0) return gen_nd_bits(nd);
        x = 1u << nd;
        nd--;
        while (nd > 0)
        {
            x |= gen_nd_bits(1) << nd;
            if (x >> nd < u >> nd) return x | gen_nd_bits(nd);
            if (x > u) goto L0;
            nd--;
        }
        x |= gen_nd_bits(1);
        if (x <= u) return x;
        goto L0;
    }

    private static int nob(uint u) // number of bits, position of the highest set bit, u > 0  
    {                              // afaik fastest way in C#
        return
        u < 1u << 16 ? u < 1u << 08 ? u < 1u << 04 ? u < 1u << 02 ? u < 1u << 01 ? 01 : 02 :
                                                                    u < 1u << 03 ? 03 : 04 :
                                                     u < 1u << 06 ? u < 1u << 05 ? 05 : 06 :
                                                                    u < 1u << 07 ? 07 : 08 :
                                      u < 1u << 12 ? u < 1u << 10 ? u < 1u << 09 ? 09 : 10 :
                                                                    u < 1u << 11 ? 11 : 12 :
                                                     u < 1u << 14 ? u < 1u << 13 ? 13 : 14 :
                                                                    u < 1u << 15 ? 15 : 16 :
                       u < 1u << 24 ? u < 1u << 20 ? u < 1u << 18 ? u < 1u << 17 ? 17 : 18 :
                                                                    u < 1u << 19 ? 19 : 20 :
                                                     u < 1u << 22 ? u < 1u << 21 ? 21 : 22 :
                                                                    u < 1u << 23 ? 23 : 24 :
                                      u < 1u << 28 ? u < 1u << 26 ? u < 1u << 25 ? 25 : 26 :
                                                                    u < 1u << 27 ? 27 : 28 :
                                                     u < 1u << 30 ? u < 1u << 29 ? 29 : 30 :
                                                                    u < 1u << 31 ? 31 : 32;
    }

    static void Main()
    {
        test_gen_uint_fast(17);
        test_gen_uint_eff(54);
        test_4_buckets_fast(123456789);
        test_4_buckets_eff(~0u);
        random_bitmap();
    }

    private static void test_gen_uint_fast(uint u)
    {
        int i; int[] c = new int[u + 1];
        for (i = 0; i < 1000000; i++) c[gen_uint_fast(u)]++;
        for (i = 0; i <= u; i++) Console.WriteLine("{0,3}{1,7}", i, c[i]);
        Console.ReadLine(); Console.Clear();
    }
    private static void test_gen_uint_eff(uint u)
    {
        int i; int[] c = new int[u + 1];
        for (i = 0; i < 1000000; i++) c[gen_uint_eff(u)]++;
        for (i = 0; i <= u; i++) Console.WriteLine("{0,3}{1,7}", i, c[i]);
        Console.ReadLine(); Console.Clear();
    }
    private static void test_4_buckets_fast(uint u) // rounding errors!
    {
        uint x; int i, c0 = 0, c1 = 0, c2 = 0, c3 = 0;
        for (i = 0; i < 1000000; i++)
        {
            x = gen_uint_fast(u);
            if (x < u / 4) c0++;
            else if (x < u / 2) c1++;
            else if (x < u / 2 + u / 4) c2++; else c3++;
        }
        Console.WriteLine(c0); Console.WriteLine(c1);
        Console.WriteLine(c2); Console.WriteLine(c3);
        Console.ReadLine(); Console.Clear();
    }
    private static void test_4_buckets_eff(uint u)
    {
        uint x; int i, c0 = 0, c1 = 0, c2 = 0, c3 = 0;
        for (i = 0; i < 1000000; i++)
        {
            x = gen_uint_eff(u);
            if (x < u / 4) c0++;
            else if (x < u / 2) c1++;
            else if (x < u / 2 + u / 4) c2++; else c3++;
        }
        Console.WriteLine(c0); Console.WriteLine(c1);
        Console.WriteLine(c2); Console.WriteLine(c3);
        Console.ReadLine(); Console.Clear();
    }

    private static void random_bitmap()
    {
        int xmax = 900;
        int ymax = 900;
        Bitmap bmp = new Bitmap(xmax + 1, ymax + 1);
        int x, y, z;
        for (x = 0; x <= xmax; x++)
        {
            for (y = 0; y <= ymax; y++)
            {
                z = 7 & rand.Next(1 << 30);
                switch (z)
                {
                    case 0: bmp.SetPixel(x, y, Color.Yellow); break;
                    case 1: bmp.SetPixel(x, y, Color.Orange); break;
                    case 2: bmp.SetPixel(x, y, Color.Purple); break;
                    case 3: bmp.SetPixel(x, y, Color.White); break;
                    case 4: bmp.SetPixel(x, y, Color.Green); break;
                    case 5: bmp.SetPixel(x, y, Color.Black); break;
                    case 6: bmp.SetPixel(x, y, Color.Blue); break;
                    case 7: bmp.SetPixel(x, y, Color.Red); break;
                }
            }
        }
        bmp.Save(xmax.ToString() + ".bmp");
    }
}