I Impostor: 2018

Sunday, August 12, 2018

Catching Errors While Proofreading a Document

A document has n typos, and there are two proofreaders working independently on proofreading the document. Proofreader A catches typos with probability $p_{1}$ and misses them with probability $q_{1} = 1 - p_{1}$. Proofreader B catches typos with probability $ p_{2}$ and misses them with probability, you guessed it, $q_{2} = 1 - p_{2}$. Let $X_{1}$ be the number of typos caught by Proofreader A, and $X_{2}$ be the number of typos caught by Proofreader B, and $X$ be the number of typos caught by at least one of them.

(a) What is the distribution of X?

Solution: The easiest way to think about this is to think of the case where neither of them catch a typo. The probability of both of them missing a typo, (each Proofreader's miss probability is independent of the other's), is gotten through multiplication, $q_{1}q_{2}$. So, the probability of at least one of them catching that typo is $1 - q_{1}q_{2}$. Thus, the distribution is
$X\sim Bin(n,1-q_{1}q_{2})$

(b) Assume $p_{1} = p_{2}$, what is the conditional distribution of $X_{1}$ given that $X_{1}+X_{2} = t$?

Solution: Using Bayes' Rule, we have:

$P(X_{1}=k|T=t) = \frac{P(T=t|X_{1}=k)P(X_{1}=k)}{P(T=t)}$

with $T\sim Bin(2n,p)$

$P(X_{1}=k|T=t) = \frac{\binom{n}{t-k}p^{t-k}q^{n-t+k}\binom{n}{k}p^{k}q^{n-k}}{\binom{2n}{t}p^{t}q^{2n-t}} = \frac{\binom{n}{t-k}\binom{n}{k}}{\binom{2n}{t}}$

$X_{1}\sim HGeom(n,n,t)$

Notes: $P(T=t|X_{1}=k)$ considers the probability of T = t given $X_{1}=k$, which means we only have to consider how $X_{2}$ affects the calculation. t = total catches by Proofreader A + total catches by Proofreader B. If A catches k typos, then B catches t - k typos.

Saturday, August 11, 2018

Welcome to the Party!

People arrive at a party one at a time. Those who are still waiting for all the other guests to arrive amuse themselves by seeing if they share their birthday with anyone else. Up until the Xth guest arrives, no matches have been made. Let X be the guest who shares a birthday with one of the guests who arrived before. Find P(X = 3 or X = 4).

Solution: First we'd like to find the PMF and then sum the series of terms with X = 3 and X = 4. It's easier to start by considering the first k - 1 guests and use the fact that no one shares a birthday in that group. We could write out an expression that demonstrates this fact easily enough. Let the kth guest be the one who does share a birthday with one of the k - 1 guests.

$P(X\leq k-1) = 365*\left ( \frac{1}{365} \right )*364*\left ( \frac{1}{365} \right )...\left ( 365-k+2 \right )*\left ( \frac{1}{365} \right )$

We have k - 1 guests, so we have k-1 factors in the numerator and k-1 factors in the denominator of the following simplification of the above equation:

$P(X\leq k-1) = \frac{365*364...(365-k+2)}{365^{k-1}}$

When the kth guest shows up, we expect to get a match. There are k-1 ways to get a match with this element, and the birthday probability is given, as usual, as 1/365. So, now we have:

$P(X\geq k) = \frac{365*364...(365-k+2)}{365^{k-1}}*\left ( \frac{k-1}{365} \right )$

P(X = 3 or X = 4) =

$\frac{365*364...(365-3+2)}{365^{3-1}}*\left ( \frac{3-1}{365} \right ) + \frac{365*364...(365-4+2)}{365^{4-1}}*\left ( \frac{4-1}{365} \right ) \approx 0.0136$

It's worth trying this exercise with an application of Bayes' Rule...

$P(X=k(match)|X< k(no-match)) = \frac{P(X<k (no-match)|X=k (match))P(X=k (match))}{P(X< k (no-match))}$

Wednesday, August 8, 2018

Counting Chickens After They Hatch

Given n eggs, let the probability of an egg hatching be p, each hatching event being independent of the others. The probability of a chick surviving after hatching we'll call r, which is independent of the survival of other hatchlings. Let H be the number of eggs that hatch and C be the number of chicks that survive. What are the distributions of H and C?

Solution: H ~ Bin(n, p), which is pretty straightforward. We're given that post-hatch survival is C, with a probability independent of other hatched chicks. Chickens that end up surviving, according to this set up, go through two stages, hatching and surviving. We're also given that the probability of surviving after hatching is r, which is independent of p, the probability of hatching. So, the probability of survival for a single chick is $p * r$. This gives the distribution of C.

$H\sim Bin(n, p)$
$C\sim Bin(n, rp)$

Tuesday, August 7, 2018

Choose the Coin, Then Flip it n Times

Say there are two coins to choose from, and you choose one of them randomly with each one just as likely to be chosen as the other. Then you flip that coin n times with a flip ending up as heads being a success. Say the probability of coin 1 landing heads is $p_{1}$ and the probability of coin 2 landing heads is $p_{2}$. Let X be the number of times a coin lands heads.

(a) What's the Probability Mass Function for this distribution?

We can look at the beginnings of a tree diagram and get an idea of how to proceed.

If we choose Coin 1, we proceed along a path that has nothing to do with Coin 2, and the same logic holds if we choose Coin 2. So, we're looking at combining two cases.

Case 1:

$P_{1}(X=k) = \frac{1}{2}\binom{n}{k}p_{1}^{k}(1-p_{1})^{n-k}$

Case 2:

$P_{2}(X=k) = \frac{1}{2}\binom{n}{k}p_{2}^{k}(1-p_{2})^{n-k}$

Combine the two to get:

$P(X=k) = \frac{1}{2}\binom{n}{k}p_{1}^{k}(1-p_{1})^{n-k} + \frac{1}{2}\binom{n}{k}p_{2}^{k}(1-p_{2})^{n-k}$

Verify this result using the Law of Total Probability (LOTP):

$P(X=k) = P\left ( X=k | C_{1} \right )P(C_{1}) + P\left ( X=k | C_{2} \right )P(C_{2})$

Where $C_{1}$ is the event Coin 1 is chosen and similarly for Coin 2.

$P\left ( C_{1} \right )= P\left ( C_{2} \right )= \frac{1}{2}$

Is This a Valid Probability Mass Function?

The first-digit law, or Benford's Law, states, according to Wikipedia, that in "many naturally occurring collections of numbers the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit about 5% of the time."

$P(D=j)= log_{10}\left ( \frac{j+1}{j} \right ), for j\in \left \{ 1,2,3,...,9 \right \}$

with D being the first digit. Is this a valid PMF?

$ \sum_{j=1}^{9}log_{10}\left ( \frac{j+1}{j} \right )= \sum_{j=1}^{9}log_{10}\left ( j+1 \right )-log_{10}\left ( j \right )$

= 1.

Also, the sum is non-negative, so yes, it is a valid PMF.

Wednesday, August 1, 2018

Win, Lose or Draw: 7 Games of Chess

1.) Two chess players, X and Y, play a 7-game series of chess. For each game, there are three possible outcomes for a player, win, lose or draw. A win nets the victor one point and the loser zero points. In the case of a draw, each player is awarded a half-point. How many ways are there for one of the players, say, player X, to get an overall result of 3 wins, 2 draws and 2 losses?

Solution: A simple calculation here... We multiply the number of ways a player can win 3 out of the 7 games, the number of ways that player can get a draw or a loss in 4 of the seven games, and the number of ways that player can get a draw or a loss in 2 out of the 7 games. The factors are:
$\binom{7}{3}, \binom{4}{2}, \binom{2}{2}$

The answer is $\binom{7}{3}* \binom{4}{2}* \binom{2}{2}$ = 210

2. How many possible outcomes for the games are there with X ending up with 4 points and Y ending up with 3 points?

Solution: One way is to have X win 4 out of the 7 games and lose 3: $\binom{7}{4}*\binom{3}{3}$. Another way is to have X win 3, draw 2, and lose 2: $\binom{7}{3}* \binom{4}{2}* \binom{2}{2}$. Another way is to have X win 2, draw 4 and lose 1: $\binom{7}{2}* \binom{5}{4}* \binom{1}{1}$. And, finally, another way is to have X win 1 and draw 6: $\binom{7}{1}* \binom{6}{6}$

Answer = $ \binom{7}{4}*\binom{3}{3} + \binom{7}{3}* \binom{4}{2}* \binom{2}{2} + \binom{7}{2}* \binom{5}{4}* \binom{1}{1} + \binom{7}{1}* \binom{6}{6}$
= 35 + 210 + 105 + 7 = 357.

3. How many possible outcomes for the games are there where one player ends up with 4 points, the other with 3 points, and the full 7 games are played?

Solution: The 7th game must be played, which means that for one player, say, X to end up with 4 points at the end, he or she cannot lose that 7th game. Therefore, the only possibilities for X to end up with 4 points are either a full-point awarded in the seventh game, or only a half point. If the full point is awarded in the 7th game, that means X must get 3 points in the first 6 games. If only a half-point is awarded in the 7th game, then X must get 3.5 points in the first 6 games.

Let scenario 1 be the scenario where X gets a full point in the 7th game. This gives us for the first 6 games:

$\binom{6}{3}* \binom{3}{3}$ = 20 (3 wins and 3 losses)
$\binom{6}{2}* \binom{4}{2}* \binom{2}{2}$ = 90 (2 wins, 2 draws, and 2 losses)
$\binom{6}{1}* \binom{5}{4}* \binom{1}{1}$ = 30 (1 win, 4 draws, and 1 loss)
$\binom{6}{0}* \binom{6}{6}$ = 1 (0 wins and 6 draws)

Let scenario 2 be the one where X gets only a half point in the 7th game. This gives us for the first 6 games:

$\binom{6}{3}* \binom{3}{1}*\binom{2}{2}$ = 60 (3 wins, 1 draw, and 2 losses)
$\binom{6}{2}* \binom{4}{3}*\binom{1}{1}$ = 60 (2 wins, 3 draws, and 1 loss)
$\binom{6}{1}* \binom{5}{5}$ = 6 (1 win and 5 draws)

Answer = 20 + 90 + 30 + 1 + 60 + 60 + 6 = 267

Tuesday, July 31, 2018

Phone Numbers as Possible Combinations of 7 digits: Probability

1.) If the first digit isn't allowed to be a 0 or a 1, how many 7-digit phone numbers are possible?

Solution: For each digit in the phone number, there are 10 possibilities (0 - 9) if there are no restrictions. In this case the first number isn't allowed to be 0 or 1. For phone-number digits 2 - 7, there are $10^{6}$ possibilities: 10 * 10 * 10 * 10 * 10 * 10. Multiply this by the number of possibilities for the first digit (only 8 since two digits, 1 and 0, are excluded), and you get your answer:

No. of possibilities for the first digit * No. of possibilities for the remaining 6 digits

= $8*10^{6}$ = 8,000,000 possible phone numbers.

2.) If the above condition stays in place, along with the added condition that the first three digits of the phone number cannot be 911, how many possible phone numbers are there?

Solution: If the probability of an event A occurring is P(A), then the probability of event A not occurring is 1 - P(A). Consider the case where the event A is the choosing of the first three digits of the phone number to be 911:

The probability space of the first three digits is $8*10*10$ if we consider only the condition of part 1. The probability of the first three digits being 911 is $\frac{1}{8}*\frac{1}{10}*{\frac{1}{10}} = \frac{1}{800}$

So, the probability of the first three digits not being 911 is $ 1 -\frac{1}{800} = \frac{799}{800}$

Finally, the number of possible phone numbers with the first digit not being a 0 or a 1, and the first 3 digits not being 911, is
$799 * 10^{4} $ = 7,990,000.

(799 is the number of possibilities for the first 3 digits, and $10^{4}$ is the number of possibilities for the last 4 digits.)

Sunday, July 22, 2018

0-1 Knapsack Problem: Dynamic Programming

In a previous pos t, we covered a brute-force solution to the knapsack problem. This time we'll make our solution more efficient by using dynamic programming, where we break the original problem down in to smaller sub-problems. If we do this, we can keep the results of the sub-problems stored in an array and just grab them from the cells they occupy as we need them rather than recomputing them.

Consider, intuitively, some sub-problems of the whisky-thief problem we tackled last time. We said our knapsack capacity was 10 bottles (weight = 10). If we consider a different knapsack problem where our knapsack capacity is 2 bottles and we only have two bottles to consider, we can take everything, meaning the two bottles under consideration. This individual case would be like considering a knapsack filled with other bottles and having a remaining capacity of 2; our options are the same. Likewise, if we had a knapsack already filled to capacity, this would be like having a knapsack of zero capacity (like not having a knapsack at all, really); we just couldn't take any bottles in this case. Consider the following array:

V	W		0	1	2	3	4	5	6	7	8	9	10
0	0	0	0
360	2	1
360	3	2
310	1	3
320	4	4
300	2	5

The cell (0, 0) represents the case where we have a knapsack capacity of zero, and no item to be considered. The columns 0 - 10 represent knapsacks of various carrying capacities, 0 being the lowest and 10 being the highest. The rows 0 - 5 represent the items (Scotch whiskies) under consideration. 0 meaning no items to be considered, and 5 meaning the case where we consider all five whiskies. The last cell in the bottom-right corner represents the original knapsack problem that we wish to optimize here. V is the value of an item and W is the weight of the item under consideration. The array we're actually filling in here doesn't include the row with the varying knapsack capacities or the column with the various whisky-item numbers.

V	W		0	1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0	0	0
360	2	1
360	3	2
310	1	3
320	4	4
300	2	5

The cell (0, 1) highlighted above represents the case of a knapsack with a capacity of 1, and no whiskies under consideration (item = 0). You can easily guess that it will be 0.

V	W		0	1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0	0	0
360	2	1	0	0	360	360	360	360	360	360	360	360	360
360	3	2
310	1	3
320	4	4
300	2	5

When we get to cell (1, 2), we consider the case of a knapsack with a carrying capacity of 2, and item 1, which has an item weight of 2. This means we can take it. And, if we do, we have no more room available in this knapsack, which doesn't matter because we have only considered one item so far. From here, it doesn't matter how large we make the knapsack, we'll only ever consider the one item, so each cell gets the same value filled in. The values in the cells represent the total value of the items in the hypothetical knapsacks. Moving on...

V	W		1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0	0
360	2	1	0	360	360	360	360	360	360	360	360	360
360	3	2	0	360	360	360	720	720	720	720	720	720
310	1	3	310	360	670	670	720	1030	1030	1030	1030	1030
320	4	4	310	360	670	670	720	1030	1030	1030	1040	1350
300	2	5	310	360	670	670	970	1030	1030	1330	1330	1350

In the above array, consider cell (2, 2). Here, the items under consideration are item 1 (Macallan 18) and item 2 (Lagavulin 12). In this case, our knapsack capacity is 2, and item 2 weighs 3, so it can't fit. However, we know already that item 1 will fit, so we include it and enter 360 into the cell. We just dragged the value of the cell directly above it down into it. We repeat the process until we consider a knapsack of capacity 5. Now we can insert the Lagavulin (item 2) as well. Or leave it, if we so desire. But, why do that? If we do that, our total value in the knapsack will be the value of item one, which is 360. Compare that to the total value we would get if we took the Lavavulin, which would be 360 + 360 = 720, and we see we'd be foolish not to take it. So we do. We repeat the process until we get to the last cell, and there we see our maximum profit if we take the right combination of whiskies. What that combination is we'll determine later. For now, let's try to make an algorithm that gives us the above solution for the maximum value ($1350)

We'll call our array k[i][w], with i rows and w columns, representing the item numbers and the weights of the knapsacks respectively. Let the weights of the items (in terms of bottles, remember) be given in the list weights = [0, 2, 3, 1, 4, 2], and their values be given in the list values = [0, 360, 360, 310, 320, 300].

for i in range(len(item)):
for w in range(C + 1):
     1. if i == 0 or w == 0:    #case with no bottles or no knapsack (base case)
            k[i][w] = 0

     2.   elif weights[i] > w:
              k[i][w] = ks[i-1][w] #item weight exceeds capacity of knapsack (w)
     3. else:
            p1 = ks[i-1][w]    #case where we don't take ith whisky
          p2 = values[i] + k[i-1][w - weights[i])    #case where we take ith whisky
          k[i][w] = max(p1, p2)    #considers which case gives max profit
return k[i][w]

This step in the algorithm should be self-explanatory.
Look again at our array above at cell (1, 1). Here, we consider the case of a knapsack with carrying capacity 1 (w = 1), and an item with weight 2. In our algorithm, this would be weights[i] = 2, and w = 1. So, since the weight of the item is greater than the knapsack carrying capacity, k[1][1] = 0, which is k[1-1][1]. We've just dragged the value from the cell in the row above (0, 1) down into the cell we're working on (1, 1). Now, look at cell (2, 2), which corresponds to the consideration of item 2, which has a weight of 3 (weights[i] = 3). In this case, the knapsack capacity is w = 2; weights[i] > w, so we drag the value of the cell in the row above down into (2, 2). Or, as our algorithm performs the move, k[2][2] = k[1][2].
Now, take a look at cell (2, 5). Here we make a comparison. The carrying capacity of the knapsack is w = 5, and the weight of the item is item[2] = 3. We calculate p1 = k[1][5] = 360. We could just drag that value down into cell (2, 5) and be done with it, but we shouldn't. We need to calculate the value of p2, the total profit of the knapsack items if we were to take item 2 as well as item 1. This gives us p1 = values[2] + k[1][5 - weights[2]] = 360 + 360 = 720. p1 = value of item 2 + value of item 1. Notice that k[1][5 - weights[2]] = k[1][2], which is the value from the cell where we first stored the value of item 1.

Now, for the combination of items that gives us our max value of $1350. Compare the last two cells at the ends of the rows corresponding to items 4 and 5 (Now, we are considering a knapsack with a carrying capacity of w = 10 bottles).

They both have a value of 1350, so not taking item 5 has no impact on our profit. Leave it out. Now compare the last two cells at the ends of the rows corresponding to items 3 and 4. Including item 4 seems to give a greater value, so we definitely take it. Our remaining capacity is now 10 - 4 = 6. Items 5 and 4 are no longer under consideration (we've taken item 4 and left out item 5), so now we compare values for items 3 and 2. Look at cells (3, 6) and (2, 6). The items under consideration are 3 and 2, with a remaining knapsack capacity w = 6.

Here, we clearly want to take item 3 (total value = 1030). We can repeat the process until we get to

i = 0 and w = 0.

The algorithm goes: (Recall C was the chosen knapsack capacity, in this case 10)

take = []

i = len(item) - 1

w = C

while i > 0 and w > 0:

q1 = k[i][w]

q2 = k[i - 1][w]

if q1 <= q2:

i -= 1

continue

else:

take.append(items[i])

i -= 1

w -= weights[i]

The complete code (in Python):

Don't forget to import numpy.

I used the following inputs to test the function:

whiskies = ['None','Macallan 18', 'Lagavulin 12', 'Springbank 21', 'Ballantines 21', 'Johnnie Walker Blue']
weight = [0,2,3,1,4,2]
value = [0,360,360,310,320,300]

print(ks(whiskies, weight, value, 10))

V	W		0	1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0	0	0
360	2	1	0	0	360	360	360	360	360	360	360	360	360
360	3	2
310	1	3
320	4	4
300	2	5

V	W		0	1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0	0	0
360	2	1	0	0	360	360	360	360	360	360	360	360	360
360	3	2
310	1	3
320	4	4
300	2	5

Pages