11_biol_200_lab_11.Rmd
Notes:
You may want to bring materials from Labs 7 and 9 to use as a reference for RStudio.
Make sure to download the file titled “crayfishdata-spring2024.csv” from the link in Section 2A.
Objectives:
Practice basic graphing and statistical analysis in R.
Articulate hypotheses and testable predictions of those hypotheses.
Evaluate the fit between your data and your predictions.
Identify alternative explanations for the predicted patterns.
KEY WORDS: null hypothesis; P-value; statistical significance; continuous variable; categorical variable; linear regression; correlation; dependent variable; independent variable
Last time, we learned about studying crayfish burrows in their natural habitats. Before that, you were introduced to using RStudio to graph and analyze data. Today, we will investigate relationships between active crayfish burrows and their environment using the data we collected last week. Our primary hypothesis was that vegetation type, its maximum height, and soil properties will impact crayfish habitat selection and, thus, the number of crayfish burrows observed.
We have several potential mechanisms that could explain the expected trends, and we made the following predictions:
Soil moisture, pH, and temperature influence burrow density. Soil moisture will be positively correlated with number of burrows. For soil pH, burrow density will be highest at neutral pH and decrease with increasing acidity. We would expect that too high a temperature would be related to reduced burrow density as crayfish are ectotherms and need to stay moist to survive.
Vegetation type influences burrow density. Based on previously published studies, we might expect crayfish to prefer more disturbed, early successional vegetation such as the mowed areas of the field.
Soil nutrient availability could influence vegetation type and thus also serves as a factor shaping plant community composition. We predict that soil nutrients (N, P, K) will vary with vegetation type to influence active burrow density.
Finally, as crayfish excavate burrows, the texture of the soil may be particularly important to them. Soil texture varies from loose sandy soils to heavily clay soils. We predict crayfish to prefer soils with substantial clay as those soils may be best for maintaining burrow structure as well as retaining water.
In addition, we have data collected last spring from the same sites, so we’ll add a comparison of time of year.
Now that we’ve collected data we can evaluate our predictions. We will conduct several different statistical analyses and graph our data. Having been introduced to R previously, this week we’ll use R to visualize and statistically analyze a new dataset. You may wish to consult your handouts and notes from the last few R labs as you complete this exercise.
As a reminder, we are testing for evidence supporting any of the above mechanisms by measuring the correlations between soil and vegetation parameters and the number of active burrows. As water and water-associated vegetation are each known habitats for crayfish, we would expect positive correlations between each and the number of active burrows. In terms of vegetation type, we can compare the average number of burrows across the vegetation types using box plots. We will use graphical techniques to visualize our results and statistical analyses to test the significance of these relationships.
In the previous R-related lab work, we encountered both categorical and continuous explanatory variables. You may recall that we use different types of graphs and analyses depending on the types of variables. Most of our variables today are continuous, meaning we will mostly be using linear regression and scatterplots for graphing. Remember that for categorical explanatory variables, we will still use ANOVA to analyze the data and boxplots for graphing.
To test a prediction, we will perform a simple linear regression analysis to create a predictive model for the number of burrows as a function of soil moisture. We can then determine whether the slope of the relationship is significantly different from zero and quantitatively describe how the number of burrows changes with soil moisture in our dataset.
Simple linear regression analysis allows us to quantify the relationship between active burrow numbers and relative soil moisture using the following:
Regression equation: This is an equation for the line describing burrow number (y) as a function of relative soil moisture (x), of the form \(y = b + mx\) (where b is the y-intercept and m is the slope of the line).
P-value: As in the chi-square analysis we used in an earlier lab exercise, a probability of 5% or lower will be considered statistically significant. Here, our null hypothesis will be ‘there is no association between active burrows and relative soil moisture.’ Thus, the P-value gives us the probability of obtaining a slope at least as far from zero as the one we observed if the null hypothesis were true (that is, if the true slope were zero, implying no relationship between the variables).
\(R^2\) value: The R-squared value measures how much of the variation in active burrow numbers is explained by variation in relative soil moisture. This value ranges between zero (no variation explained) and one (100% of the variation is explained).
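To make these three quantities concrete, here is a minimal sketch on a small made-up dataset (the values in toy are invented for illustration and are not the class data). It fits the line, reads off b and m, recomputes R-squared by hand as one minus the unexplained share of the variation, and pulls the P-value for the slope from the coefficient table.

```r
# Made-up illustration data (NOT the class dataset): 8 hypothetical quadrats.
toy <- data.frame(
  soil.moisture         = c(1, 2, 3, 4, 5, 6, 7, 8),
  number.active.burrows = c(1, 3, 2, 4, 4, 6, 5, 7)
)

# Fit the line y = b + m*x by least squares.
fit <- lm(number.active.burrows ~ soil.moisture, data = toy)
b <- unname(coef(fit)[1])  # y-intercept
m <- unname(coef(fit)[2])  # slope

# R-squared by hand: 1 - (unexplained variation / total variation).
y       <- toy$number.active.burrows
ss_res  <- sum(residuals(fit)^2)
ss_tot  <- sum((y - mean(y))^2)
r2_hand <- 1 - ss_res / ss_tot

# P-value for the slope, taken from the coefficient table.
p_slope <- summary(fit)$coefficients["soil.moisture", "Pr(>|t|)"]

cat("Equation: y =", round(b, 2), "+", round(m, 2), "* x\n")
cat("R-squared:", round(r2_hand, 2), " P-value:", signif(p_slope, 2), "\n")
```

The hand-computed R-squared should agree with the Multiple R-squared that summary(fit) reports.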
Keep in mind that for each of our analyses, we’re only going to be working with a subset of all of the data that we collected; we’re only going to be comparing one type of interaction in each individual test. So, we will be conducting a statistical test (regression) and creating a graph for each comparison. Remember, these are the comparisons we can make with our dataset:
Number of active burrows as a function of vegetation type.
Number of active burrows as a function of the relative soil moisture level.
Number of active burrows as a function of the soil pH.
Number of active burrows as a function of soil temperature.
Number of active burrows as a function of soil nutrients (N, P, and K).
Now let’s import that dataset and get analyzing!
First, download the data sheet to your computer using the link below. Make sure you remember where you saved the file on your computer!
Next, you need to upload the data file into RStudio before you can do anything with it. Just like last time, go to the Files pane and click Upload. Navigate to the file on your computer, select it, and click OK to import the file into RStudio. Now, we want to tell R to read the file and save it to memory as an object we’ll call craydata. Type the following into your new script and then execute it in the Console:
library(tidyverse) # provides read_csv(), ggplot(), and %>%
craydata <- read_csv("crayfishdata-spring2024.csv") # assign the file to the object 'craydata'
Next, check how R has interpreted each column by typing str(craydata) into your script and then executing it, like you did with the previous command. Once executed, in the Console window you can see that categorical variables are called chr, or character, by R. You also have some numeric variables. R is able to treat different types of variables differently when you are using functions, so it’s important to make sure your variables are interpreted correctly. Tidyverse also assigns columns as col_double() when they are numeric.
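As a minimal sketch of what checking variable types looks like, the example below uses a tiny made-up data frame (the values and the vegetation categories are hypothetical, not taken from the class file). It shows the column types and one way to turn a character column into an explicit categorical variable.

```r
# Made-up stand-in for a few rows of the class data (values are hypothetical).
toy <- data.frame(
  vegetation.type       = c("mowed", "unmowed", "mowed", "wetland"),
  soil.moisture         = c(3.5, 6.0, 4.0, 9.0),
  number.active.burrows = c(2, 0, 3, 1),
  stringsAsFactors      = FALSE
)

str(toy)                         # one line per column: name, type, first values
col_types <- sapply(toy, class)  # the same type information as a named vector
col_types

# If a categorical column was read as character, it can be made an explicit
# factor so that its categories (levels) are easy to inspect.
toy$vegetation.type <- factor(toy$vegetation.type)
levels(toy$vegetation.type)
```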
To list the names of all the columns in the dataset, execute names(craydata).
You will work with the soil moisture data first, examining how soil moisture in a quadrat relates to the number of active crayfish burrows in that quadrat. Recall that our quadrats were 1 m², so the number of active burrows is per square meter (which is burrow density). If you analyze soil temperature later, note that because of the way we collected those data (some teams measured every quadrat while others measured one per transect), we’ll want to average the soil temperature readings for each transect (across the 5 quadrats) before proceeding.
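Averaging within each transect could be done along these lines. This is only a sketch on made-up readings: the column names transect and soil.temp are assumptions and may not match the class file, and avg.soiltemp is simply the name this handout uses later for an averaged temperature column.

```r
# Made-up readings (NOT class data): 2 transects x 5 quadrats.
# The column names transect and soil.temp are assumed for illustration.
toy <- data.frame(
  transect  = rep(c("A", "B"), each = 5),
  soil.temp = c(18, 19, 20, 19, 19,   # transect A: measured in every quadrat
                21, NA, NA, NA, NA)   # transect B: measured once
)

# Mean soil temperature per transect; rows with no reading are dropped
# by the formula interface before averaging.
transect_means <- aggregate(soil.temp ~ transect, data = toy, FUN = mean)
names(transect_means)[2] <- "avg.soiltemp"

# Attach the transect average to every quadrat row.
toy <- merge(toy, transect_means, by = "transect")
toy
```

After this step every quadrat in a transect carries the same averaged temperature, whichever way its team recorded the readings.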
Graphing
Before we proceed with any analysis, let’s make a graph to see what the data set looks like. We’ll work with soil moisture first. Before you type in any code, do you remember what relationship you predicted for soil moisture and number of burrows?
Once you recall your prediction, enter the following at the prompt to make the graph. Does it look like the data show the pattern you expected?
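A minimal sketch of this first graph, assuming it matches the faceted version of the plot that follows but without the vegetation panels. The data frame toy holds made-up values so the sketch runs on its own; in lab, use craydata in its place.

```r
library(ggplot2)

# Made-up stand-in values (NOT class data); in lab, use craydata instead of toy.
toy <- data.frame(
  soil.moisture         = c(2, 4, 5, 7, 9, 3, 6, 8),
  number.active.burrows = c(3, 2, 4, 1, 0, 2, 1, 3)
)

# Scatterplot; geom_jitter nudges points slightly so overlapping ones stay visible.
p <- ggplot(toy) +
  geom_jitter(mapping = aes(x = soil.moisture, y = number.active.burrows)) +
  xlab("Soil moisture (0-10)") +
  ylab("Burrows per square meter") +
  ggtitle("Active burrows as a function of soil moisture")
p
```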
Now, that was just considering one variable at a time. We might wonder whether the different vegetation types differed in soil moisture. We can easily adjust our code to add that information to the graph:
craydata %>%
  ggplot() +
  geom_jitter(mapping = aes(x = soil.moisture, y = number.active.burrows, size = 1)) +
  guides(size = "none") +
  facet_wrap(~ vegetation.type) +
  xlab("Soil moisture (0-10)") +
  ylab("Burrows per square meter") +
  ggtitle("Active burrows as a function of soil moisture")
Regression Analysis
Time to run a simple linear regression and view a summary of the results. Execute the following code to create the regression, summarize the results, and save the results as an object (soil.moisture_mod):
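A minimal sketch of that step, inferred from the Call line in the output printed later in this section (lm(formula = number.active.burrows ~ soil.moisture, data = craydata)) and from the fact that printing soil.moisture_mod shows a summary. The data frame toy holds made-up values so the sketch runs on its own; in lab, use data = craydata.

```r
# Made-up stand-in values (NOT class data); in lab, use craydata instead of toy.
toy <- data.frame(
  soil.moisture         = c(2, 4, 5, 7, 9, 3, 6, 8),
  number.active.burrows = c(3, 2, 4, 1, 0, 2, 1, 3)
)

# lm() fits the regression, summary() computes the P-values and R-squared,
# and <- saves the summarized results under the name soil.moisture_mod.
soil.moisture_mod <- summary(lm(number.active.burrows ~ soil.moisture, data = toy))
```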
To view the results summary, including P-values for the intercept and slope (the Estimate for ‘soil.moisture’) and the R-squared value, enter at the prompt:
soil.moisture_mod
##
## Call:
## lm(formula = number.active.burrows ~ soil.moisture, data = craydata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.2042 -1.5179 -0.5608  1.3534  6.4821
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    2.37578    0.56218   4.226 8.52e-05 ***
## soil.moisture -0.08579    0.07163  -1.198    0.236
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.856 on 58 degrees of freedom
## Multiple R-squared: 0.02414, Adjusted R-squared: 0.007311
## F-statistic: 1.435 on 1 and 58 DF, p-value: 0.2359
Your results should look something like above. From the output, we should notice a few things.
The Estimate column contains the value of the y-intercept, (Intercept) = 2.3758 burrows per square meter.
The slope is the Estimate value for soil.moisture: -0.0858 burrows for every unit increase in soil moisture.
For the R-squared (\(R^2\)) value, we use the Adjusted R-squared: 0.0073, or approx. 0.7 percent.
The P-value for our slope is listed both in the Coefficients table under Pr(>|t|) AND on the last line of the output, after the F-statistic: P = 0.2359.
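These values do not have to be copied by hand. As a minimal sketch, the same numbers can be pulled out of a summarized model and rounded in code. The data frame toy holds made-up values, and the object names mod and label are chosen here for illustration only.

```r
# Made-up stand-in values (NOT class data); in lab, use craydata instead of toy.
toy <- data.frame(
  soil.moisture         = c(2, 4, 5, 7, 9, 3, 6, 8),
  number.active.burrows = c(3, 2, 4, 1, 0, 2, 1, 3)
)
mod <- summary(lm(number.active.burrows ~ soil.moisture, data = toy))

coefs     <- coef(mod)                           # the Coefficients table as a matrix
intercept <- coefs["(Intercept)", "Estimate"]
slope     <- coefs["soil.moisture", "Estimate"]
p_value   <- coefs["soil.moisture", "Pr(>|t|)"]
adj_r2    <- mod$adj.r.squared

# Round everything to 2 decimal places and paste it into one text label.
label <- sprintf("Y = %.2f %s %.2fX, R-squared = %.2f, P = %.2f",
                 intercept, ifelse(slope < 0, "-", "+"), abs(slope),
                 adj_r2, p_value)
label
```

A label built this way can be passed to annotate() in place of a hand-typed string, which avoids transcription and rounding slips.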
With that information, we can write the regression equation! For the above output that looks like (rounding to 2 decimal places):
\[Y = 2.38 - 0.09X\]
Is the slope significantly different from zero in this example?
Updating your graph with analysis results
We can add the regression line, equation, P-value, and R-squared to a plot by executing the following several lines of code:
craydata %>%
  ggplot(mapping = aes(x = soil.moisture, y = number.active.burrows)) +
  geom_jitter() +
  xlab("Soil moisture (0-10)") +
  ylab("Burrows per square meter") +
  ggtitle("Active burrows as a function of soil moisture") +
  # regression line, using the unrounded estimates from the model output
  geom_abline(slope = -0.08579, intercept = 2.37578) +
  # equation, adjusted R-squared, and P-value, each rounded to 2 decimal places
  annotate("text",
           label = "Y = 2.38 - 0.09X, R-squared = 0.01, P = 0.24",
           x = 5.5, y = 7, size = 3, color = "black")
REMEMBER TO CHANGE THE INFO FOR THE SPECIFIC DATA THAT YOU ARE GRAPHING!
And round all your values to 2 decimal places.
Other Analyses
Using the above instructions as an example, write the code to perform two more analyses, each using a different explanatory variable. You may choose which explanatory variables to use but of course your response variable will be number of burrows for both analyses.
For each of the two analyses you will need to:
Make the plots to visualize the data.
Create the models, using a new name for each analysis to differentiate which variables are used in each.
Remake the plots with the addition of the statistical information from the models.
Export the finalized plot, so you can include it in the assignment you will submit.
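For the export step, one option is ggsave() from ggplot2. The sketch below is only an illustration: the data in toy are made up, and the file name and dimensions are arbitrary choices rather than requirements of the assignment.

```r
library(ggplot2)

# Made-up stand-in values (NOT class data); in lab, use craydata instead of toy.
toy <- data.frame(
  soil.moisture         = c(2, 4, 5, 7, 9, 3, 6, 8),
  number.active.burrows = c(3, 2, 4, 1, 0, 2, 1, 3)
)

p <- ggplot(toy, mapping = aes(x = soil.moisture, y = number.active.burrows)) +
  geom_jitter() +
  xlab("Soil moisture (0-10)") +
  ylab("Burrows per square meter")

# tempdir() keeps this sketch from cluttering your folder; in lab, a plain file
# name such as "burrows_vs_moisture.png" saves into your working directory.
out_file <- file.path(tempdir(), "burrows_vs_moisture.png")
ggsave(filename = out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)
```

You can also export from the Plots pane in RStudio if you prefer a point-and-click route.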
Refer to the predictions listed earlier in the handout for a reminder of your options for explanatory variables.
Remember that for a categorical explanatory variable you will use geom_boxplot instead of geom_jitter. On the graph, you can report the P-value alone for the ANOVA (you do not need to run the Kruskal-Wallis test); you do not need to add additional text. And of course you do not need the geom_abline for a boxplot either. See your earlier R labs for examples of boxplot code.
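As a reminder of what the ANOVA step can look like, here is a minimal sketch using base R's aov(). The data are made up and the vegetation categories are hypothetical, and the object names veg_mod and veg_summary are chosen here for illustration; your earlier labs may have used a different approach.

```r
# Made-up stand-in values (NOT class data); vegetation categories are hypothetical.
toy <- data.frame(
  vegetation.type       = rep(c("mowed", "shrub", "forest"), each = 4),
  number.active.burrows = c(4, 5, 3, 4,  2, 1, 2, 3,  0, 1, 0, 1)
)

# One-way ANOVA: does the mean number of burrows differ among vegetation types?
veg_mod     <- aov(number.active.burrows ~ vegetation.type, data = toy)
veg_summary <- summary(veg_mod)
veg_summary

# Pull the P-value for vegetation.type out of the ANOVA table.
p_value <- veg_summary[[1]][["Pr(>F)"]][1]
round(p_value, 2)
```

The rounded P-value is the only statistic that needs to go on the boxplot.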
Write up your responses to the following questions.
Did you find data bearing on how aspects of the environment might relate to crayfish habitat preferences? Respond to this question by composing a “Results” section discussing your findings, similar to this section in a published research paper. Begin by restating your hypothesis for this study and the mechanism(s) you focused on.
Then clearly state the predictions tested in each of the three analyses you ran. For each prediction tested: (a) briefly discuss the results including the relevant statistical data and (b) insert pictures of the appropriate graphs with informative captions.
One by one, address whether each test supported the hypothesis and mechanisms, and if so, how strong the evidence was.
Make your graphs as professional as you can: no typos, labeled axes with units, a caption, the regression equation and line, the R-squared value, and the P-value. Values should be to two decimal places.
Make sure all group members sign the Honor Code (or type their names in). This will indicate your affirmation that everyone individually completed the R analysis and that everyone contributed meaningfully to the group discussions today and to completion of the assignment.
If you will not be able to submit the assignment before leaving lab today, speak with your instructor.