04_biol_200_lab_4.Rmd
Notes:
Pre-lab Exercise can be found HERE
Bring a calculator to lab.
This lab is adapted from: Bixler and Schee, 2005, “Application of the Hardy-Weinberg model to a mixed population of Bar and wild-type Drosophila.” Pages 175-191 in Tested Studies for Laboratory Teaching, Volume 26 (MA O’Donnell, Editor). Proceedings of the 26th Workshop/Conference of the Association for Biology Laboratory Education (ABLE), 452 pages. Your instructors have modified the original exercise for use in this class.
Objectives:
Determine whether a population is in Hardy-Weinberg proportions.
Evaluate which of the five assumptions of the Hardy-Weinberg model may have been violated when a population is not in Hardy-Weinberg equilibrium.
Perform a chi-square test.
KEY WORDS: allele frequency; chi-square test; genetic drift; genotype frequency; Hardy-Weinberg Model; mutation; migration; selection
The genetic diversity of a population can be characterized by the frequency of the alleles they carry for a given gene and the distribution of those alleles into genotypes. The frequency of alleles or genotypes can change from generation to generation. This change in allele and genotype frequency is one of the primary ways that we detect evolutionary genetic changes that occur in populations. Today, we’ll work with one method that scientists use to detect such evolutionary changes.
The Hardy-Weinberg model is a valuable mathematical tool for studying how changes occur to a population’s gene pool, and is covered in Chapter 19: The Evolution of Populations in the Biology 2e text. The model makes five assumptions about the population it describes:
1. the population is large in size and genetic drift does not affect allele frequencies for the gene of interest
2. mutations do not affect allele frequencies for the gene of interest
3. migration (gene flow) does not affect allele frequencies for the gene of interest
4. natural selection is not acting on the gene of interest
5. mating within the population is random with regard to the gene of interest
If these five assumptions are met, we infer that no forces are acting on the population to cause changes in the allele or genotype frequencies of this gene in this population. This conclusion allows us to make predictions about the allele and genotype frequencies we expect to find in the population in the present and future generations.
If we consider a single gene with two alleles (A and a), we can describe the allele frequencies in the population as \(p\) = frequency of the A allele and \(q\) = frequency of the a allele. Given that the above assumptions are met (especially number 5!), we can predict the frequency with which we expect to find each diploid genotype by combining pairs of alleles at random (i.e., according to their frequency in the gene pool):
Expected frequency of the AA genotype = (frequency of finding A as the first allele) x (frequency of finding A as the second allele) which simplifies to:
\[expected~frequency~of~genotype~AA = p \times p = p^2\]
Likewise, the expected frequency for the Aa genotype is:
\[expected~frequency~of~genotype~Aa = 2pq\]
And the expected frequency for the aa genotype is:
\[expected~frequency~of~genotype~aa = q^2\]
Because this example has only two alleles, only these three genotypes are possible. Because no other genotypes are possible, we know that the three genotype frequencies must sum to 1:
\[p^2 + 2pq + q^2 = 1\]
This equation may also be expressed as:
\[(p + q)^2 = 1\]
We predict these expected genotype frequencies if the five assumptions described earlier hold.
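As a quick numerical check of the relationships above, here is a minimal Python sketch. The function name `hw_genotype_frequencies` and the example value p = 0.7 are illustrative assumptions and are not part of the original exercise.

```python
def hw_genotype_frequencies(p):
    """Return expected (AA, Aa, aa) frequencies for a gene with two alleles.

    p is the frequency of allele A; the frequency of allele a is q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p  # only two alleles, so p + q = 1
    return p * p, 2 * p * q, q * q


# Hypothetical allele frequency chosen only for illustration.
freq_AA, freq_Aa, freq_aa = hw_genotype_frequencies(0.7)
print(round(freq_AA, 3), round(freq_Aa, 3), round(freq_aa, 3))

# The three genotype frequencies should sum to 1 (up to rounding error).
print(round(freq_AA + freq_Aa + freq_aa, 3))
```

Trying a few different values of p is a simple way to see that the three expected frequencies always sum to 1.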
In this lab, we will examine the Hardy-Weinberg model and see how it can be used to study the factors that affect a population’s evolution. The populations we will study are experimental populations of Drosophila melanogaster. We’ll examine frequencies for two alleles of the Bar gene, which you were introduced to last week. Recall that the Bar gene is sex-linked (on the X chromosome) and shows incomplete dominance, allowing us to determine the genotype of a fly by examining its phenotype. Remember that the wild-type round eye is characteristic of female flies that are wild-type homozygous (X+X+) and males that are wild-type hemizygous (X+Y; Figure 1a). In contrast, the slit-shaped Bar eye is found in females that are Bar-homozygous (XBXB) and males that are Bar-hemizygous (XBY; Figure 1c). Heterozygous females (X+XB) have a kidney- or heart-shaped eye (Figure 1b).
A couple of weeks ago, your instructors created several experimental populations of flies with the composition described in Table 1. After allowing the flies to mate for about a week, we removed the parents and allowed the offspring to continue their development. This week in lab, we’re going to see the results of setting up those identical populations when we phenotype all of the offspring from each of the replicate populations. What do we expect the offspring allele frequencies to be? Their genotype frequencies?
Table 1: Composition of Parent generation crosses set up by instructors.
Genotype | X+ X+ | X+ Y | XB XB | XB Y |
---|---|---|---|---|
Number of Flies | 5 | 5 | 5 | 5 |
Count the number of flies of each genotype in your experimental population. To do so, you will need to distinguish males from females. You will also need to distinguish heterozygous females with kidney-shaped eyes (XBX+) from wild type females with round eyes (X+X+). (Remember that last week in lab you learned how to distinguish between the sexes.) Record the data in Table 2 below. After you have counted the flies, return them to their original vials.
Once you have the counts for each genotype, calculate the frequency of each sex in the population (we cannot assume it will be 50:50 male:female) and the frequency of each genotype. Note that you will calculate genotype frequency both within each sex (i.e., what percentage of females has each female genotype?) and overall (i.e., what percentage of the total population has each genotype?). Record your results in the table. (Reminder: make sure your genotype frequencies sum to 1!)
Note: Record frequencies in decimal form, with 3 decimal places.
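The frequency calculations described above can be sketched in Python. This is only an illustration: the counts are those reported for Experiment B in the post-lab table of this handout, and the variable names are assumptions made for the example.

```python
# Counts for Experiment B, taken from the post-lab table in this handout.
female_counts = {"XB XB": 18, "XB X+": 68, "X+ X+": 111}
male_counts = {"XB Y": 39, "X+ Y": 83}

n_females = sum(female_counts.values())
n_males = sum(male_counts.values())
n_total = n_females + n_males

# Frequency of each sex in the whole population.
freq_female = n_females / n_total
freq_male = n_males / n_total

within_sex = {}  # genotype count / number of flies of that sex
overall = {}     # genotype count / total number of flies
for counts, n_sex in ((female_counts, n_females), (male_counts, n_males)):
    for genotype, n in counts.items():
        within_sex[genotype] = n / n_sex
        overall[genotype] = n / n_total

print(f"females: {freq_female:.3f}, males: {freq_male:.3f}")
for genotype in within_sex:
    print(f"{genotype}: within-sex {within_sex[genotype]:.3f}, "
          f"overall {overall[genotype]:.3f}")
```

Within-sex frequencies sum to 1 separately for females and for males, whereas the overall frequencies sum to 1 across all five genotypes.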
Table 2: Observed genotype numbers and frequencies for the first offspring generation (O1)
Phenotype | Genotype | Number of Flies | Observed genotype frequency (within-sex) | Observed genotype frequency (overall) |
---|---|---|---|---|
Bar female | XB XB | | | |
Kidney-eyed female | XB X+ | | | |
Wild-type female | X+ X+ | | | |
Bar male | XB Y | | | |
Wild-type male | X+ Y | | | |
 | | | |
---|---|---|---|
Population ID # = | ____________ | | |
Total number of female flies = | ____________ | Frequency of female flies = | ____________ |
Total number of male flies = | ____________ | Frequency of male flies = | ____________ |
Total number of flies = | ____________ | | |
Work space:
IMPORTANT:
Post your observed genotype numbers and within-sex frequencies on the board when you are finished and be sure to copy down the values for the other populations in the table in the Post-Lab Assignment.
From the information above in Table 2, you can calculate the allele frequencies in your population. As in the pre-lab assignment, we have to account for the fact that males have only one allele for this gene. Use the equations below to calculate the allele frequencies and enter them in Table 3. Express each proportion as a decimal, with 3 decimal places. Please neatly show your work.
Note: The calculations are done this way because females have two alleles, whereas males only have one (because the gene is on the X chromosome). So for q, the numerator represents all the Bar alleles that occur in homozygous females (2 per female) + the Bar alleles that occur in Bar-eyed males (1 per male), plus the Bar alleles that occur in heterozygous females (1 per female).
As in the pre-lab assignment, let q be the frequency of the Bar allele and p be the frequency of the wild-type allele. Then:
\[q = \frac{\left(\#~Bar~females \times 2~alleles\right) + \left(\#~Bar~males \times 1~allele\right) + \left(\#~heterozygous~females \times 1~allele\right)}{\left(\#~females \times 2\right) + \left(\#~males\right)}\]
\[p = \frac{\left(\#~Wild \text{-} type~females \times 2\right) + \left(\#~Wild \text{-} type~males \times 1\right) + \left(\#~heterozygous~females \times 1\right)}{\left(\#~females \times 2\right) + \left(\#~males\right)}\]
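The two equations above translate directly into a short calculation. The Python sketch below is illustrative only: the function and argument names are assumptions, and the example call uses the parent-generation numbers from Table 1 (5 flies of each of the four parental genotypes, no heterozygous females).

```python
def x_linked_allele_frequencies(bar_females, het_females, wt_females,
                                bar_males, wt_males):
    """Return (p, q) for an X-linked gene from genotype counts.

    q is the frequency of the Bar allele, p the frequency of the
    wild-type allele. Females carry two alleles, males carry one.
    """
    n_females = bar_females + het_females + wt_females
    n_males = bar_males + wt_males
    total_alleles = 2 * n_females + n_males
    if total_alleles == 0:
        raise ValueError("no flies counted")
    q = (2 * bar_females + bar_males + het_females) / total_alleles
    p = (2 * wt_females + wt_males + het_females) / total_alleles
    return p, q


# Parent generation from Table 1: 5 flies of each parental genotype.
p, q = x_linked_allele_frequencies(bar_females=5, het_females=0,
                                   wt_females=5, bar_males=5, wt_males=5)
print(f"p = {p:.3f}, q = {q:.3f}")
```

Because every allele is either Bar or wild-type, p + q should equal 1, which is a useful check on hand calculations.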
Table 3: Observed allele frequencies in the offspring generation.
Allele | Frequency in O1 |
---|---|
B | |
+ | |
Now, let’s compare the P generation and the O1 generation.
If these populations are in Hardy-Weinberg equilibrium, what would we predict to be true about the allele frequencies in the O1 generation compared to the allele frequencies in the P generation?
Remember that in the P generation, allele frequencies were \(p = q = 0.5\) for all populations. How do the allele frequencies in your O1 population compare to those of the P generation? Does it look like the allele frequencies have changed over this one generation? Do you think your population is close to what you’d predict if the population is in Hardy-Weinberg equilibrium?
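Lack of change in allele frequencies is only one part of the Hardy-Weinberg model. The other expectation of the HW model is that genotype frequencies occur in the proportions predicted from the allele frequencies (Hardy-Weinberg proportions).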
Complete the table below with the data that you collected in lab.
Table 4: Expected genotype frequencies and numbers for O1 flies
Genotype | Expected genotype frequency (within-sex) | Expected genotype frequency (overall) | Expected Number of Flies |
---|---|---|---|
XB XB | | | |
XB X+ | | | |
X+ X+ | | | |
XB Y | | | |
X+ Y | | | |
Work space:
Now we can do a statistical test to determine whether our observed numbers for the offspring generation differ significantly from the numbers we expected if the population was in Hardy-Weinberg proportions. If the observed numbers of genotypes ARE statistically different from those expected then evolution has occurred! We will use a chi-square (\(\chi^2\)) goodness-of-fit test. Briefly, the test calculates how much difference there is between the observed and expected numbers (NOT the frequencies), accounting for variation in sample size, and we use the \(\chi^2\) test statistic to determine the probability of getting a difference this size just by chance (i.e., if in truth the population is actually in HW proportions). The chi-square test statistic is calculated as follows:
\[\chi^2 = \sum_{}^{} \frac{\left(O - E\right)^2}{E}\]
where,
\(\chi^2 =\) chi-square, the name of the test statistic
\(\sum_{}^{} =\) sum of; in our case, the sum over all genotypes of the squared difference between observed and expected numbers, divided by the expected number
\(O =\) the observed number of individuals of a given genotype
\(E =\) the expected number of individuals of that genotype
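The formula above can be sketched in a few lines of Python. The function name and the observed and expected numbers below are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories (numbers, not frequencies)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected numbers must be positive")
        total += (o - e) ** 2 / e
    return total


# Hypothetical numbers for three categories (not real fly data).
observed = [20, 50, 30]
expected = [25.0, 50.0, 25.0]
chi_square = chi_square_statistic(observed, expected)
print(chi_square)  # (25/25) + (0/50) + (25/25) = 2.0
```

Note that the observed and expected lists must hold numbers of individuals with the same total, not frequencies.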
The value that you calculate for the chi-square statistic is interpreted with the help of the chi-square distribution table, which tells us how often we would expect to randomly observe a given chi-square statistic value. To use the table to determine whether our deviation is statistically significant, we first need a little more information about our data: we need to know the degrees of freedom (df) and choose our alpha-level (i.e., significance cutoff).
Degrees of freedom are directly related to the number of variables that you measure in the system you are working on. Each independent variable (meaning each variable whose value does NOT depend on the other variables in the system we are measuring) is equivalent to one degree of freedom. In most systems, however, the values of one or more variables are dependent on the values of others. For example, when measuring genotype frequencies at a given locus (as we are doing today), all genotype frequencies must add up to 1.0. Thus, if we measure the frequencies of Bar-eyed flies and kidney-eyed flies, we can determine the frequency of wild-type flies without actually observing it (since the sum of all three frequencies must be 1.0); given 2 values, we automatically know the third. Because of this constraint, and because one allele frequency must also be estimated from the data, there is only one degree of freedom in this three-genotype example. In general, the degrees of freedom for chi-square analysis are equal to the number of measured categories minus 1 (because the observed and expected totals must be equal) minus the number of parameters estimated from the data. In our case, the number of categories = 5 (the number of genotypes) and the number of parameters = 2 (the frequency of one allele and the frequency of one sex). Thus, we have 5 – 1 – 2 = 2 degrees of freedom.
The alpha-level sets the P-value cutoff that we use to decide whether our observed deviation is improbable enough to reject the null hypothesis and therefore conclude that the population is not in HW proportions. The cutoff that we choose is arbitrary, at some level, but it is standard in biology to use P < 0.05. Essentially, this means that if no evolution has occurred, there is less than a 5% chance of seeing a deviation at least as large as the one in our data.
Now that we know our df = 2 and our alpha-level (P < 0.05), we can use the chi-square distribution table to see how likely it would be that we observed our data, if the null hypothesis is true (i.e., the population is in HW proportions). To do this, we find the row that corresponds to our df and then identify where our chi-square statistic falls relative to the columns for the P-value. If our value is larger than the one in the P = 0.05 column, then we conclude that if no evolution has occurred (the population is in HW proportions) there is less than a 5% chance of observing these data. As a result, we would suspect that evolution HAS in fact occurred!
Let’s consider a hypothetical example to understand how we interpret the values in the chi-square distribution table below. If our calculated chi-square statistic was 9.2, and we had df = 2, we would check the table to see that the value of 9.2 falls between the values for P = 0.05 and P = 0.01. So, in a population not experiencing any evolution, we would expect to see this hypothetical data, with a chi-square statistic of 9.2 or greater, approximately 1 out of every 100 times we sampled the population (i.e., 1% of the time). In other words, if we had 100 Drosophila populations that started with identical genotype frequencies and population sizes, we’d expect to observe a chi-square of 9.2 or greater in one of the 100 replicate populations, just by chance.
Chi-square distribution table
df | P = 0.05 | P = 0.01 | P = 0.001 |
---|---|---|---|
1 | 3.84 | 6.64 | 10.83 |
2 | 5.99 | 9.21 | 13.82 |
3 | 7.82 | 11.35 | 16.27 |
4 | 9.49 | 13.28 | 18.47 |
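The table lookup described above can also be expressed as a small Python sketch. The dictionary holds only the P = 0.05 and P = 0.01 columns of the table, and the names `CRITICAL_VALUES` and `exceeds_critical_value` are assumptions made for this illustration.

```python
# Critical values copied from the P = 0.05 and P = 0.01 columns above.
CRITICAL_VALUES = {
    1: {0.05: 3.84, 0.01: 6.64},
    2: {0.05: 5.99, 0.01: 9.21},
    3: {0.05: 7.82, 0.01: 11.35},
    4: {0.05: 9.49, 0.01: 13.28},
}


def exceeds_critical_value(chi_square, df, alpha=0.05):
    """Return True if the statistic is larger than the tabled critical value."""
    return chi_square > CRITICAL_VALUES[df][alpha]


# Hypothetical example from the text: chi-square = 9.2 with df = 2.
print(exceeds_critical_value(9.2, df=2, alpha=0.05))  # True: 9.2 > 5.99
print(exceeds_critical_value(9.2, df=2, alpha=0.01))  # False: 9.2 < 9.21
```

A result of True at alpha = 0.05 corresponds to rejecting the null hypothesis that the population is in HW proportions.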
For your population, perform the chi-square calculations.
Show your chi-square results to your instructor or TA before leaving lab for the day.
Be sure to read sections 1 Overview of the Hardy-Weinberg Model and 2 The study organism and mutation before completing the exercise below.
In lab, we will phenotype (and therefore genotype) the offspring of the parent population just described and then we’ll test whether the populations are in Hardy-Weinberg proportions. As preparation, you will calculate the genotype and allele frequencies for the Parent generation (P)—these values will be the same for all of the replicate populations. Below are the parental generation numbers of adults of each genotype that were used to start each of the replicate populations.
Table 5: Observed phenotype/genotype numbers and frequencies for the P generation
Phenotype | Genotype | Number of Flies | Genotype frequency (within-sex) | Genotype frequency (overall) |
---|---|---|---|---|
Wild-type female | | | | |
Heterozygous female (kidney eyes) | | | | |
Bar female | | | | |
Wild-type male | | | | |
Bar male | | | | |
Note: For q, the numerator represents all the Bar alleles that occur in homozygous females (2 per female) + the Bar alleles that occur in Bar-eyed males (1 per male), plus the Bar alleles that occur in heterozygous females (1 per female).
Let q be the frequency of the Bar allele and p be the frequency of the wild-type allele. Then,
\[q = \frac{\left(\#~Bar~females \times 2~alleles\right) + \left(\#~Bar~males \times 1~allele\right) + \left(\#~heterozygous~females \times 1~allele\right)}{\left(\#~females \times 2\right) + \left(\#~males\right)}\]
\[p = \frac{\left(\#~Wild \text{-} type~females \times 2\right) + \left(\#~Wild \text{-} type~males \times 1\right) + \left(\#~heterozygous~females \times 1\right)}{\left(\#~females \times 2\right) + \left(\#~males\right)}\]
Table 6: Allele frequencies of the P generation. Use 3 decimal places!
Allele | Frequency |
---|---|
XB | q = |
X+ | p = |
Next, we can ask: given the frequency of each allele in the P generation, what genotype frequencies would we expect to have seen, if the parent population was meeting the Hardy-Weinberg assumptions? To do this, we will calculate the expected genotype frequencies.
Remember, for a non-sex-linked gene with 2 alleles (A+ and A−), we’d expect:
Genotype | Expected Frequency |
---|---|
A+ A+ | \(p^2\) |
A+ A- | \(2pq\) |
A- A- | \(q^2\) |
Of course, in our case the Bar gene is sex-linked, so we’ll have to calculate our expected genotype frequencies a little bit differently than this (males have only 1 allele). After we calculate the expected frequencies using the HW relationships, then we also need to multiply those values by the proportion of the population that is male or female (depending on the genotype).
(a) Genotype | (b) Equation for HW expected frequency (within sex) | (c) Observed frequency of males or females | (d) Expected genotype frequency (overall) |
---|---|---|---|
X+ X+ | | | |
X+ XB | | | |
XB XB | | | |
X+ Y | | | |
XB Y | | | |
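The two-step calculation described before the table (within-sex expectation, then multiplication by the proportion of that sex) might be sketched as follows. This is an illustration under stated assumptions, not the answer key: it assumes both sexes share the same allele frequencies, so that females follow the usual \(p^2\), \(2pq\), \(q^2\) proportions and hemizygous males carry each allele at frequency \(p\) or \(q\). The function name and all input values are hypothetical.

```python
def expected_x_linked_genotypes(p, q, freq_female, freq_male, n_total):
    """Return {genotype: (within-sex freq, overall freq, expected number)}.

    Assumes the same allele frequencies in both sexes (an assumption of
    this sketch): females follow p^2, 2pq, q^2 and males follow p, q.
    """
    within_sex = {
        "X+ X+": p * p,
        "X+ XB": 2 * p * q,
        "XB XB": q * q,
        "X+ Y": p,
        "XB Y": q,
    }
    rows = {}
    for genotype, within in within_sex.items():
        # Male genotypes carry a Y; everything else is female.
        sex_freq = freq_male if genotype.endswith("Y") else freq_female
        overall = within * sex_freq
        rows[genotype] = (within, overall, overall * n_total)
    return rows


# Hypothetical inputs chosen only for illustration.
rows = expected_x_linked_genotypes(p=0.6, q=0.4, freq_female=0.5,
                                   freq_male=0.5, n_total=200)
for genotype, (within, overall, number) in rows.items():
    print(f"{genotype}: within-sex {within:.3f}, overall {overall:.3f}, "
          f"expected number {number:.1f}")
```

The overall expected frequencies should sum to 1, and the expected numbers should sum to the total number of flies.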
End of Pre-lab assignment!
Determining which assumptions of HW equilibrium are violated: The table below shows the results of three experiments (A, B, C) that were done exactly the same way as our populations (numbers in parentheses indicate the frequency of genotype by sex).
Observed numbers by genotype (frequency within each sex)
Population | X+ X+ | X+ XB | XB XB | Total # of females | X+ Y | XB Y | Total # of males |
---|---|---|---|---|---|---|---|
Experiment A | 86 (0.54) | 74 (0.46) | 0 (0) | 160 | 150 (0.88) | 21 (0.12) | 171 |
Experiment B | 111 (0.56) | 68 (0.35) | 18 (0.09) | 197 | 83 (0.68) | 39 (0.32) | 122 |
Experiment C | 55 (0.47) | 61 (0.53) | 0 (0) | 116 | 62 (0.74) | 22 (0.26) | 84 |
Pop 1 | | | | | | | |
Pop 2 | | | | | | | |
Pop 3 | | | | | | | |
Pop 4 | | | | | | | |
Pop 5 | | | | | | | |
Pop 6 | | | | | | | |
Pop 7 | | | | | | | |
Pop 8 | | | | | | | |
Pop 9 | | | | | | | |
Pop 10 | | | | | | | |
Pop 11 | | | | | | | |
Pop 12 | | | | | | | |
Table 7: Courtship behavior of Bar and wild-type flies.
Male phenotype | Female phenotype | Percent time male spent courting | Pairs copulated | Pairs tested | % copulated |
---|---|---|---|---|---|
Bar | Bar | 24 | 5 | 54 | 9 |
Bar | Wild-type | 22 | 5 | 48 | 10 |
Wild-type | Bar | 59 | 17 | 52 | 33 |
Wild-type | Wild-type | 68 | 23 | 50 | 46 |
Based on these data, do you think that the mating ability of flies is the same for all genotypes? Briefly explain.
Table 8: Genotype frequencies and numbers from offspring of an F2 cross (X+XB × XBY).
Genotype | Frequency (within each sex) | Number |
---|---|---|
XB XB | 0.47 | 43 |
XB X+ | 0.53 | 49 |
XB Y | 0.50 | 38 |
X+ Y | 0.50 | 38 |