# Methods

If only the “important” portion of a hypergeometric distribution matters (the part no more than 2 standard deviations from the mean), it is approximated well by a binomial distribution when the number drawn is ≤ 0.1N, where N is the total number of items (for an error of 5%). If the usual criterion for approximating a binomial with a Gaussian is also met, then the hypergeometric can in turn be approximated with a Gaussian. The accuracy of this combined approximation is that of the worse of the two approximations from which it is built.
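The rule of thumb above can be checked numerically for a given case. The following is a minimal sketch using only the Python standard library. The function names, and the choice of pointwise relative error of the probability mass function as the error measure, are assumptions made for illustration; the text does not say which error measure its 5% figure refers to.

```python
from math import comb, sqrt

def hypergeom_pmf(x, N, k, D):
    """P(x black balls in D draws without replacement, k black among N)."""
    if x < max(0, D - (N - k)) or x > min(D, k):
        return 0.0
    return comb(k, x) * comb(N - k, D - x) / comb(N, D)

def binom_pmf(x, D, p):
    """P(x successes in D independent draws, each with success chance p)."""
    return comb(D, x) * p**x * (1 - p)**(D - x)

def max_central_relative_error(N, k, D, width=2.0):
    """Largest relative error of the binomial PMF against the hypergeometric
    PMF over outcomes within `width` standard deviations of the mean.
    (Hypothetical helper; the error measure is an assumption.)"""
    p = k / N
    mean = D * p
    # Standard deviation of the hypergeometric distribution.
    sd = sqrt(D * p * (1 - p) * (N - D) / (N - 1))
    worst = 0.0
    for x in range(D + 1):
        if abs(x - mean) <= width * sd:
            h = hypergeom_pmf(x, N, k, D)
            if h > 0:
                worst = max(worst, abs(binom_pmf(x, D, p) - h) / h)
    return worst

if __name__ == "__main__":
    # Compare a small sample fraction with one at the 0.1N limit.
    for D in (10, 100):
        print(D, max_central_relative_error(1000, 300, D))
```

Running it for several values of D against a fixed N shows how the agreement degrades as the sample becomes a larger fraction of the population.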

If the “tails” of the distribution also matter, the issue is somewhat more complex. Suppose that we begin with k black balls and N-k white balls in a box, and start drawing without replacement. After D balls have been drawn, d of which are black (and thus D-d are white), the number of black balls remaining in the box is k-d, and the number of white balls left is (N-k)-(D-d), or (N-D)-(k-d). The probability of then drawing a black ball is (k-d)/(N-D).

Since 0 ≤ d ≤ D, the two extreme cases are d = 0 and d = D. If d = 0, the probability becomes k/(N-D), and if d = D it becomes (k-D)/(N-D). With algebraic manipulation these become (k/N) + kD/[N(N-D)] and 1-{(N-k)/N + [(N-k)D]/[N(N-D)]}, respectively. Each is the binomial probability k/N (equivalently 1-(N-k)/N) plus or minus a last term, and that last term is the difference and thus the error. Dividing the d = D error term, (N-k)D/[N(N-D)], by the exact probability (k-D)/(N-D) gives D(N-k)/[N(k-D)], so the fractional error ε is at most D/(k-D), or roughly, for small errors, D/k. If we calculate the chance of drawing a white rather than a black ball, the same approximation gives D/(N-k) for the error.

Since we have made D drawings, the errors are cumulative. Because the overall probability is the product of the probabilities for each drawing, (1 + total fractional error) ≤ (1 + ε)^D. If the product εD is smaller than 0.5, we can say that the total fractional error ≈ εD. So for the hypergeometric to binomial approximation to work, both D^2/k and D^2/(N-k) should be smaller than the desired maximum fractional error. Thus, if we want a maximum error of 5% and there are 5 drawings, approximating the hypergeometric with the binomial works if both k and N-k are at least 500.
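The criterion derived above can be written out and compared against one exact case. The sketch below is illustrative only: the function names are hypothetical, and the sequence of D black balls in a row is just one extreme outcome, chosen because its exact probability is a simple product.

```python
def per_draw_error_bound(k, D):
    """Bound from the text on the per-draw fractional error: D/(k - D)."""
    return D / (k - D)

def binomial_approx_ok(N, k, D, max_error):
    """Rule of thumb from the text: both D^2/k and D^2/(N-k) must not
    exceed the desired maximum fractional error."""
    return D * D / k <= max_error and D * D / (N - k) <= max_error

def all_black_error(N, k, D):
    """Fractional error of the binomial probability (k/N)^D of drawing
    D black balls in a row, relative to the exact probability when
    drawing without replacement."""
    exact = 1.0
    for i in range(D):
        exact *= (k - i) / (N - i)  # i black balls already removed
    return (k / N) ** D / exact - 1.0

if __name__ == "__main__":
    # The example from the text: 5 drawings, k = N - k = 500, 5% target.
    print(binomial_approx_ok(1000, 500, 5, 0.05))
    print(per_draw_error_bound(500, 5) * 5)  # approximate total bound, eps * D
    print(all_black_error(1000, 500, 5))     # actual error for one extreme outcome
```

The bound εD is deliberately conservative, so the actual error for any particular outcome can be noticeably smaller than the criterion allows.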

# Sources

http://www.stat.rice.edu/~dcox/Stat305/Lessons/L0212/node2.html

http://www.stat.uiowa.edu/~jcryer/Hypergeometric.pdf