Uniform distribution

Impossible?

Puzzle:

This one is (supposedly) from a job interview for Google:

You’re being streamed an array of numbers. Every second you receive the next cell in the array. The array is finite but you don’t know when it ends or what’s it total length. Your job is to randomly pick one of the numbers from the array so that every number has the same probability of being selected i.e. if the array happens to be 10 cells long, each cell will have a probability of exactly 1/10 of being selected.

The catch is that you cannot locally save the array since that might take too much space. How do you do it?

(Courtesy of NewOrder Computer Security and Network Portal)

Potential Solution:

In probability and statistics there are two types of Uniform distribution, discrete and continuous. The above problem is an example of discrete, defined when – in a sequence of variables, a random variable has any of n possible values that are equally probable.

A popular example of discrete uniform distribution is known as the German tank problem. During World War II Allied intelligence used statistical methods to accurately predict production of German tanks using samples of data (ranges of variables over time which formed discrete uniform distribution).

To solve the above problem, the following sample C# code implements a simple algorithm:

First, we create a fraction class. (This is not absolutely necessary but will come in useful later.)

C#:

public class Fraction
    {
        public int numerator;
        public int denominator;

        public Fraction(string F)
        {
            numerator = Convert.ToInt32(F.Split('/')[0]);
            denominator = Convert.ToInt32(F.Split('/')[1]);
        }

        public static Fraction operator +(Fraction F1, Fraction F2)
        {
            int CD = F1.denominator*F2.denominator; //To find common denominator.

            F1.numerator *= CD/F1.denominator;
            F2.numerator *= CD/F2.denominator;

            return new Fraction((F1.numerator + F2.numerator).ToString() + '/' + CD.ToString());
        }

        public static Fraction operator +(Fraction F1, int _F2)
        {
            Fraction F2 = new Fraction(_F2.ToString() + '/' + 1.ToString());
            return F1+F2;
        }

        public static Fraction operator +(int _F1, Fraction F2)
        {
            Fraction F1 = new Fraction(_F1.ToString() + '/' + 1.ToString());
            return F1 + F2;
        }

        public override string ToString()
        {
            return numerator.ToString() + '/' + denominator.ToString();
        }

    }

And now for the main static class:

C#:

static class Program
    {
        static void Main(string[] args)
        {

            int N = 1; //N=Number of values received/generated.
            int ReceivedValue = GetNextNumber();

            if (args.Length > 0) { N = Convert.ToInt32(args[0]); } 

            Console.WriteLine("");
            Console.WriteLine("N={0},Value={1}", N.ToString(), ReceivedValue.ToString());

            Console.WriteLine(SelectNumber(new Fraction(string.Format("1/{0}", N)), ReceivedValue).ToString());

            System.Threading.Thread.Sleep(3000); //Increased timer to make more human readable.

            Main(new string[] {(N + 1).ToString()});
        }

        static int GetNextNumber()
        {
            int ArbitraryValue = 100; //May be any value, dynamic or static.
            Random RGen = new Random();
            int RValue = RGen.Next(1, ArbitraryValue);
            RGen = null;
            return RValue;
        }

        static int SelectNumber(Fraction P, int LastReceivedValue) //P=Probability. 1/N
        {
            int N = P.denominator;

            Random RGen = new Random();
            if (N == 1) { return 1; }
            else
            {
                int RValue = RGen.Next(1, N);
                RGen = null;
                if (!(RValue <= LastReceivedValue)) { return SelectNumber(P, LastReceivedValue); } else { return RValue; };
            }            
        }
    }

To test our algorithm, we must apply the following induction (proof) to assure correctness:

If n = the quantity of numbers arrived so far, for n=1 (the first number arrived), we pick a number with a probability of 1. This is uniform across one number.

For k = n the algorithm uniformly selects a number with probability 1/n.

For k=n+1 if the numbers have a uniform probability of being selected with the algorithm, when a new number arrives we pick it with probability 1/(n+1).

Probability of choosing the number 1 through n: (1/n) * (1-1/n+1) = (1/n)*(n/n+1) = 1/n+1.

Therefore probability of choosing the new number = 1/n+1.

To see it in action, compile and run the above code. Even if an external or different random permutation is used in the “SelectNumber” function, the value must be equal to the previous value to satisfy the probability requirements. Theoretically, an algorithm may be created which, by passing the previous value and probability (1/n+1), with a calculable expected standard deviation, a series of numbers could be randomly generated which are uniformly selected from a series of unknown arbitrary values.

I’m unsure if the above code is actually a pure solution to the problem since logic is applied in “SelectNumber” which causes RGen.Next to loop until the uniform requirements are met. In a true solution, it would seem to me that the value generated from a random algorithm such as “RGen.Next” should immediately satisfy the requirements, or with very minimal repeated loops.

Even though my solution uses the same .NET random generation algorithm for the arbitrary stream “GetNextNumber” as it does for “SelectNumber” (keep in mind this is also strictly not advised by MS), this puzzle does not really state and seems to suggest that these two algorithms should differ.

Without the ability to store locally/cache any values from the arbitrary stream – a pattern, series or algorithm does not seem to be identifiable, and therefore I do not think a random number can be generated in a single selection which is uniform across the stream, although (barring buffer overflows) within X multiple selections (approaching infinity) it is definitely possible.

Additional Info:

In .NET, the Random class actually implements Donald E. Knuth’s subtractive random number generator algorithm. For more information, see D. E. Knuth. “The Art of Computer Programming, volume 2: Seminumerical Algorithms”. Addison-Wesley, Reading, MA, second edition, 1981.

Modern applications of uniform distribution might include integrity testing of random permutation methods, algorithms and streams, (generating sequences of random numbers) which could be useful in encryption/decryption and (probably more useful), stock and financial market prediction.

More on stock market prediction using technical analysis to come soon.

Advertisements

About Ronnie Diaz

Ronnie Diaz is an enterprise software engineer responsible for front-end and back-end development for companies in many industries. Heavily involved in cloud development, online retail, e-commerce and electronic ordering, fulfillment and customer relational systems.

Posted on February 17, 2010, in Math & Programming Puzzles. Bookmark the permalink. Leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: