<?xml version="1.0" encoding="ISO-8859-1" ?>
  <resource>
  <id>6680</id>
  <path>/www/nrich/html/content/id/6680/</path>
  <resourceTypeID>1</resourceTypeID>
  <last_published>2011-02-01T00:00:01</last_published>
  <indexXML>&lt;mdoxml version=&quot;1.0&quot;&gt;
&lt;br&gt;&lt;/br&gt;
&lt;ul id=&quot;stemLinks&quot;&gt;
&lt;li&gt;&lt;a href=&quot;http://nrich.maths.org/6315&quot;&gt;Warm-up&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://nrich.maths.org/6645&quot;&gt;Try this next&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://nrich.maths.org/2048&quot;&gt;Think higher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://plus.maths.org/content/os/issue55/features/dnacourt/index&quot;&gt;Read: mathematics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://nrich.maths.org/6639&quot;&gt;Read: science&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://nrich.maths.org/6788&quot;&gt;Explore further&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt; &lt;/div&gt;
&lt;br&gt;&lt;/br&gt;
&lt;p&gt;As you may know, DNA is made up of four different bases:&lt;br&gt;&lt;/br&gt;
-Adenine (A)&lt;br&gt;&lt;/br&gt;
-Cytosine (C)&lt;br&gt;&lt;/br&gt;
-Guanine (G)&lt;br&gt;&lt;/br&gt;
-Thymine (T)&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
Suppose that the bases are randomly distributed along a single strand of the DNA:&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;i) If my DNA single strand is 10 bases in length, what is the probability that it contains only a single adenine?&lt;/span&gt;&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;ii) If my DNA single strand is 150 bases in length, what is the probability of a 30% cytosine content?&lt;/span&gt;&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;iii) If my DNA single strand is 1000 bases in length, what is the probability of getting at least 5 thymines in a row, at least once?&lt;/span&gt;&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;iv) The human genome is approximately 6 billion bases in length. What is the probability that another individual has the same genetic composition as me?&lt;/span&gt;&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;v) The bacterial restriction enzyme BamHI cuts DNA at the site GGATCC. If I digest my genome with this enzyme, how many cuts would I expect to occur?&lt;/span&gt;&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
DNA sequencing is a very laborious task that requires expensive machinery and considerable computational power. DNA fingerprinting is a technique used by forensic scientists to match a sample of DNA to one of a number of suspects who may have been at a crime scene.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
However, since sequencing the entire human genome is so difficult, a different approach must be adopted: it has been found that most of the human genome is identical between individuals, except for single bases which vary widely across a population. These variable bases occur approximately once in every 1000 bases. By comparing these particular sites between individual
samples of DNA, it is much quicker to establish, to a high degree of accuracy, whether two DNA samples are identical.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;vi) If approximately 1 in 1000 bases is variable, what is the probability of an individual having the same genetic composition as me?&lt;/span&gt;&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;vii) How many of these variable sites should be investigated to identify a suspect to 99.99% probability?&lt;/span&gt;&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-style: italic;&quot;&gt;viii) If we remember that DNA occurs as homologous chromosomes, and that these variable sites occur in the same places across a pair of homologous chromosomes, how many of the sites should be investigated such that the probability of a misidentification is smaller than 1 in 1,000,000?&lt;/span&gt;&lt;/p&gt;

&lt;/mdoxml&gt;</indexXML>
  <solutionXML>&lt;mdoxml version=&quot;1.0&quot;&gt;&lt;br&gt;&lt;/br&gt;
This problem makes heavy use of combinatorics:&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;i)&lt;/span&gt; We are asked for the
probability of a single adenine among 10 bases. If the adenine were
in the first position in the sequence, the 9 following bases could
be any of the other three types. Writing N for any base other than
adenine, the probability of this is:&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
$$p(ANNNNNNNNN) =
\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^9 = 0.0188$$&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
However, the adenine could equally have been in
any of the other nine positions instead. Each arrangement is equally
likely, so the probability is ten times larger. We can express this
choice of position using combinations notation:
$^{10}C_1$ is the number of ways of placing 1 adenine among 10
bases.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
Thus, overall the probability we require is:&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
$$p(one\ adenine) =
^{10}C_1\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^9 =
0.188$$&lt;br&gt;&lt;/br&gt;
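&lt;br&gt;&lt;/br&gt;
As a cross-check, the counting argument can be tested by listing every possible strand. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes, as the question does, that each base is equally likely at every position. Full enumeration is only practical for short strands, so the comparison is made on a strand of 6 bases before the formula is evaluated for 10 bases.&lt;br&gt;&lt;/br&gt;
&lt;pre&gt;
```python
from itertools import product
from math import comb

BASES = "ACGT"


def exactly_one_adenine_by_enumeration(length):
    """Fraction of all equally likely strands of this length with exactly one A."""
    strands = product(BASES, repeat=length)
    hits = sum(1 for strand in strands if strand.count("A") == 1)
    return hits / 4 ** length


def exactly_one_adenine_by_formula(length):
    """C(length, 1) * (1/4) * (3/4)^(length - 1)."""
    return comb(length, 1) * (1 / 4) * (3 / 4) ** (length - 1)


# Enumeration visits 4^length strands, so compare the two methods on length 6.
print(exactly_one_adenine_by_enumeration(6), exactly_one_adenine_by_formula(6))

# Part i): a strand of 10 bases.
print(round(exactly_one_adenine_by_formula(10), 3))  # prints 0.188
```
&lt;/pre&gt;
The two values printed for the 6-base strand should agree, which supports using the same formula for the 10-base strand.&lt;br&gt;&lt;/br&gt;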
&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;ii)&lt;/span&gt; A 30% cytosine content
implies the need for 45 cytosines from among the 150 bases.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
Thus,&lt;br&gt;&lt;/br&gt;
$$p(45C) =
^{150}C_{45}\left(\frac{1}{4}\right)^{45}\left(\frac{3}{4}\right)^{105}
= 0.0272$$&lt;br&gt;&lt;/br&gt;
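&lt;br&gt;&lt;/br&gt;
Binomial coefficients as large as $^{150}C_{45}$ are awkward to evaluate by hand. The following Python sketch shows one way to evaluate the expression using exact fractions, so that no rounding occurs until the final conversion to a decimal. The helper name binomial_pmf is hypothetical, and the sketch assumes each base is cytosine with probability 1/4 independently of the others.&lt;br&gt;&lt;/br&gt;
&lt;pre&gt;
```python
from fractions import Fraction
from math import comb


def binomial_pmf(successes, trials, p):
    """P(exactly `successes` hits among `trials` independent positions)."""
    p = Fraction(p)
    return comb(trials, successes) * p ** successes * (1 - p) ** (trials - successes)


p_cytosine = Fraction(1, 4)                  # assumption: bases are equally likely
count_needed = Fraction(30, 100) * 150       # 30% of 150 bases
probability = binomial_pmf(int(count_needed), 150, p_cytosine)

print(count_needed)          # number of cytosines required
print(float(probability))    # decimal value of the binomial probability
```
&lt;/pre&gt;
The same helper reproduces part i) when called as binomial_pmf(1, 10, Fraction(1, 4)).&lt;br&gt;&lt;/br&gt;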
&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;iii)&lt;/span&gt; We are asked for the
probability that there is at least one chain of at least 5 Thymines
among 1000 bases.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
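To tackle this, we must realise that a group of 5 thymines has 996
possible starting locations within 1000 bases, and that the remaining 995
bases can be of any sort.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
Thus,&lt;br&gt;&lt;/br&gt;
$$p \approx ^{996}C_{1}\left(\frac{1}{4}\right)^5 = 0.973$$&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
Strictly, this figure is the expected number of positions at which a
run of 5 thymines begins. It is an upper bound on the required
probability, because a strand may contain several such runs and
overlapping runs are counted more than once.&lt;br&gt;&lt;/br&gt;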
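&lt;br&gt;&lt;/br&gt;
For comparison with the estimate above, the probability of at least one run can also be computed exactly by stepping along the strand and recording how many thymines it currently ends in. The Python sketch below is one possible approach rather than the method used in this solution. The function name and parameters are hypothetical, and each base is assumed to be thymine with probability 1/4 independently of its neighbours.&lt;br&gt;&lt;/br&gt;
&lt;pre&gt;
```python
def prob_run_of_thymines(length, run=5, p_t=0.25):
    """P(at least one run of `run` or more T's in a random strand)."""
    # state[k] = P(no complete run so far AND the strand currently ends in k T's)
    state = [1.0] + [0.0] * (run - 1)
    for _ in range(length):
        no_run_yet = sum(state)
        # The next base is not T: the current run length resets to zero.
        new_state = [no_run_yet * (1 - p_t)]
        # The next base is T: a run of k becomes a run of k + 1.
        # A run that would reach `run` is complete, so it leaves the table.
        new_state += [s * p_t for s in state[:-1]]
        state = new_state
    return 1 - sum(state)


# Exact value for part iii), next to the counting estimate 996 / 1024.
print(prob_run_of_thymines(1000), 996 / 1024)
```
&lt;/pre&gt;
For a strand of exactly 5 bases the function returns $\left(\frac{1}{4}\right)^5$, as expected, since the only successful strand is TTTTT.&lt;br&gt;&lt;/br&gt;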
&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;iv)&lt;/span&gt; For an
individual to have the same genetic composition as me,
every one of their bases must match mine in both type and
position.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
Therefore:&lt;br&gt;&lt;/br&gt;
$$p(same) = \left(\frac{1}{4}\right)^{6,000,000,000} =
\text{exceptionally small!}$$&lt;br&gt;&lt;/br&gt;
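&lt;br&gt;&lt;/br&gt;
To get a feel for just how small this is, we can work with logarithms, since the number itself is far too small to store as an ordinary floating-point value. The Python sketch below uses a hypothetical helper name and assumes four equally likely bases at each site.&lt;br&gt;&lt;/br&gt;
&lt;pre&gt;
```python
from math import log10


def log10_identical_probability(sites, options_per_site=4):
    """log10 of (1 / options_per_site) ** sites, computed via logarithms."""
    return -sites * log10(options_per_site)


# Whole genome, taken as 6 billion independent bases.
print(log10_identical_probability(6_000_000_000))
```
&lt;/pre&gt;
The result is about $-3.6 \times 10^9$, so the probability is roughly 1 divided by a number with 3.6 billion digits.&lt;br&gt;&lt;/br&gt;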
&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;v)&lt;/span&gt; The probability of a
random 6-base sequence of DNA forming GGATCC is
$\left(\frac{1}{4}\right)^6$. If we simplistically say that the 6
billion base-pair human genome is composed of 1 billion separate
6-base sites, then the expected number of sites with the correct
restriction sequence is:&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
$$\left(\frac{1}{4}\right)^6\times 1,000,000,000 = 2.44 \times
10^5$$&lt;br&gt;&lt;/br&gt;
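&lt;br&gt;&lt;/br&gt;
The arithmetic can be checked with a short Python sketch. The function name and its windows parameter are hypothetical. The second call is an alternative that is not part of the solution above: it lets the site start at any base, so the 6-base windows overlap, and relies on the fact that expected counts still add up even when the windows are not independent.&lt;br&gt;&lt;/br&gt;
&lt;pre&gt;
```python
def expected_sites(genome_length, motif="GGATCC", windows=None):
    """Expected number of motif matches, adding up the chance at each window."""
    p_match = (1 / 4) ** len(motif)
    if windows is None:
        # Every starting position that leaves room for the whole motif.
        windows = genome_length - len(motif) + 1
    return windows * p_match


genome = 6_000_000_000

# Simplification used in the solution: 1 billion separate 6-base sites.
print(expected_sites(genome, windows=genome // 6))  # prints 244140.625

# Alternative assumption: overlapping windows starting at every base.
print(expected_sites(genome))
```
&lt;/pre&gt;
Under the overlapping-window assumption the expected number of cuts is about 1.46 million, six times the simplified figure.&lt;br&gt;&lt;/br&gt;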
&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;vi)&lt;/span&gt; If only 1 in every 1000 bases
varies across a population, then there are only 6 million variable
sites in the genome. Thus, the probability of an individual being
identical to me is:&lt;br&gt;&lt;/br&gt;
$$ \left(\frac{1}{4}\right)^{6,000,000} = \text{very small}$$&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;vii)&lt;/span&gt; We wish to find the
number of sites needed to match an
individual to a piece of DNA with 99.99% probability. Thus, we want
the probability of the two samples of DNA being the same by chance
to be 0.01%.&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
$$p = \left(\frac{1}{4}\right)^n = \frac{0.01}{100}$$&lt;br&gt;&lt;/br&gt;
$$n = \frac{\ln(10,000)}{\ln(4)} = 6.64$$&lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
Therefore, at least 7 of the variable sites should be
investigated.&lt;br&gt;&lt;/br&gt;
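&lt;br&gt;&lt;/br&gt;
The rounding-up step can be checked directly by multiplying the per-site probability until the chance of a coincidental match drops to the target. The Python sketch below uses a hypothetical helper name and exact fractions to avoid rounding errors. It assumes that sites are independent and that each site matches by chance with the stated probability.&lt;br&gt;&lt;/br&gt;
&lt;pre&gt;
```python
from fractions import Fraction


def sites_needed(match_prob_per_site, max_chance_match):
    """Smallest n with match_prob_per_site ** n at or below max_chance_match."""
    n = 0
    chance = Fraction(1)
    while chance > max_chance_match:
        n += 1
        chance *= match_prob_per_site
    return n


# Part vii): 1/4 per site, chance match no more than 0.01%.
print(sites_needed(Fraction(1, 4), Fraction(1, 10_000)))       # prints 7

# Part viii): 1/16 per site, chance match no more than 1 in 1,000,000.
print(sites_needed(Fraction(1, 16), Fraction(1, 1_000_000)))   # prints 5
```
&lt;/pre&gt;
The second call uses the per-site probability of $\frac{1}{16}$ that applies when both homologous chromosomes are compared, as in part viii).&lt;br&gt;&lt;/br&gt;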
&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
&lt;br&gt;&lt;/br&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;viii)&lt;/span&gt; As before, a
misidentification occurs when the two DNA samples are the same
purely by chance. We want the probability of this happening to be
less than 1 in 1,000,000. However, since the same variable sites
are present in the same place on homologous chromosomes, the
probability of two individuals being identical at both these loci
is $\frac{1}{4} \times \frac{1}{4} = \frac{1}{16}$.&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
$$\therefore \left(\frac{1}{16}\right)^n =
\frac{1}{1,000,000}$$&lt;br&gt;&lt;/br&gt;
$$n = \frac{\ln(1,000,000)}{\ln(16)} = 4.98$$&lt;br&gt;&lt;/br&gt;
 &lt;br&gt;&lt;/br&gt;
Therefore, at least 5 sites should be investigated.&lt;br&gt;&lt;/br&gt;&lt;/mdoxml&gt;</solutionXML>
  <noteXML/>
  <clueXML/>
  <canonXML/>
  <end_user_role>2</end_user_role>
  <difficulty>3</difficulty>
  <keystage1>0</keystage1>
  <keystage2>0</keystage2>
  <keystage3>0</keystage3>
  <keystage4>0</keystage4>
  <keystage4plus>1</keystage4plus>
  <title>Is your DNA unique?</title>
  <description>Use combinatorial probability to work out how likely it is that you
are genetically unique!</description>
  <spec_group>Applications
    <specifier>biology</specifier>
  </spec_group>
  <spec_group>University and Careers
    <specifier>Applying to university</specifier>
  </spec_group>
  <spec_group>Probability
    <specifier>Probability</specifier>
  </spec_group>
  <spec_group>Admin
    <specifier>Individual</specifier>
  </spec_group>
</resource>