Learning Objectives
Following this assignment students should be able to:
- install and load an R package
- understand the data manipulation functions of dplyr
- execute a simple import and analyze data scenario
 
Reading
Topics
- dplyr
Readings
Optional Resources:
 
Lecture Notes
- Working with Tabular Data (in dplyr)
- dplyr Aggregation
- Combining Data Manipulations
- dplyr Joins
- Advanced Filtering
 
Exercises
Shrub Volume Data Basics (10 pts)
Dr. Granger is interested in studying the factors controlling the size and carbon storage of shrubs. She has conducted an experiment looking at the effect of three different treatments on shrub volume at four different locations. She has placed the data file on the web for you to download:
Download this into your data folder and get familiar with the data by importing it using read.csv() and then:
- Check the column names in the data using the function names().
- Use str() to show the structure of the data frame and its individual columns.
- Print out the first few rows of the data using the function head().
Use dplyr to complete the remaining tasks.
- Select the data from the length column and print it out.
- Select the data from the site and experiment columns and print it out.
- Filter the data for all of the plants with heights greater than 5 and print out the result.
- Create a new data frame called shrub_data_w_vols that includes all of the original data and a new column containing the volumes, and display it.
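The dplyr verbs named in this exercise can be sketched on a small made-up table. This is only an illustration under stated assumptions, not a solution: the trees data frame and its columns (species, diameter_cm, height_m) are hypothetical and stand in for any tabular data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
trees <- data.frame(
  species = c("oak", "oak", "pine", "pine"),
  diameter_cm = c(30, 45, 20, 60),
  height_m = c(12, 18, 9, 21)
)

# select() keeps only the named columns, in the order given
dims <- select(trees, species, height_m)

# filter() keeps only the rows where the condition is TRUE
tall <- filter(trees, height_m > 10)

# mutate() returns all original columns plus a newly computed one
with_m <- mutate(trees, diameter_m = diameter_cm / 100)

print(dims)
print(tall)
print(with_m)
```

Each verb takes a data frame as its first argument and returns a new data frame, so the original table is left unchanged unless the result is assigned back to it.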
 Shrub Volume Aggregation (10 pts)
This is a follow-up to Shrub Volume Data Basics.
Dr. Granger wants some summary data of the plants at her sites and for her experiments. Make sure you have her shrub dimensions data.
This code calculates the average height of a plant at each site:
    # group_by() and summarize() come from the dplyr package
    library(dplyr)
    shrub_dims <- read.csv('data/shrub-volume-data.csv')
    by_site <- group_by(shrub_dims, site)
    avg_height <- summarize(by_site, avg_height = mean(height))
- Modify the code to calculate and print the average height of a plant in each experiment.
- Use max() to determine the maximum height of a plant at each site.
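The group-then-summarize pattern is not limited to one summary at a time. The sketch below, on a hypothetical trees table (the data frame and its column names are assumptions made up for illustration), computes two summaries per group in a single summarize() call.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
trees <- data.frame(
  species = c("oak", "oak", "pine", "pine", "pine"),
  height_m = c(12, 18, 9, 15, 21)
)

# group_by() marks the groups; summarize() then returns one row per group
by_species <- group_by(trees, species)
height_summary <- summarize(
  by_species,
  min_height = min(height_m),  # smallest height within each species
  n_trees = n()                # n() counts the rows in each group
)

print(height_summary)
```

Any function that reduces a vector to a single value can be used inside summarize() in the same way.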
Shrub Volume Join (15 pts)
This is a follow-up to Shrub Volume Aggregation.
Dr. Granger has kept a separate table that describes the manipulation for each experiment. Add the experiments data to your data folder.
Import the experiments data and then use inner_join to combine it with the shrub dimensions data to add a manipulation column to the shrub data.
Portal Data Manipulation (15 pts)
Download a copy of the Portal Teaching Database surveys table and load it into R using read.csv().
Do not use pipes for this exercise.
- Use select() to create a new data frame with just the year, month, day, and species_id columns in that order.
- Use mutate(), select(), and na.omit() to create a new data frame with the year, species_id, and weight in kilograms of each individual, with no null weights. The weight in the table is given in grams so you will need to divide it by 1000.
- Use the filter() function to get all of the rows in the data frame for the species ID SH.
- Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID.
- Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID in each year.
- Use the filter(), group_by(), and summarize() functions to get the mean mass of species DO in each year.
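Without pipes, several manipulations can be chained either through intermediate variables or through nested calls. The sketch below shows both on a hypothetical trees table (all names are assumptions invented for illustration), and also shows why the position of na.omit() matters: it drops every row that has an NA in any column still present.

```r
library(dplyr)

# Hypothetical toy data with missing values in two different columns
trees <- data.frame(
  year = c(2001, 2001, 2002, 2002),
  species = c("oak", "pine", "oak", "pine"),
  diameter_cm = c(30, NA, 45, 20),
  crown_m = c(4, 5, NA, 3)
)

# Intermediate variables: one manipulation per line
with_m <- mutate(trees, diameter_m = diameter_cm / 100)
picked <- select(with_m, year, species, diameter_m)
complete_rows <- na.omit(picked)

# Nested calls: the same steps, read from the inside out
nested <- na.omit(
  select(mutate(trees, diameter_m = diameter_cm / 100), year, species, diameter_m)
)

# na.omit() on the full table also drops the row where only crown_m is NA
all_complete <- na.omit(trees)

print(complete_rows)
print(nrow(all_complete))
```

Selecting the columns of interest before calling na.omit() keeps rows whose only missing values are in columns that are not needed.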
 Portal Data Manipulation Pipes (15 pts)
Download a copy of the Portal Teaching Database surveys table and load it into R using read.csv().
Use pipes (%>%) to combine the following operations to manipulate the data.
- Use mutate(), select(), and na.omit() to create a new data frame with the year, species_id, and weight in kilograms of each individual, with no null weights.
- Use the filter() and select() functions to get the year, month, day, and species_id columns for all of the rows in the data frame where species_id is SH.
- Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID.
- Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID in each year.
- Use the filter(), group_by(), and summarize() functions to get the mean mass of species DO in each year.
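A pipe passes the value on its left as the first argument of the function on its right, so x %>% f(y) is evaluated as f(x, y). The sketch below writes the same calculation with and without pipes on a hypothetical trees table (the data frame and column names are assumptions invented for illustration).

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
trees <- data.frame(
  species = c("oak", "pine", "oak", "pine"),
  diameter_cm = c(30, NA, 45, 20)
)

# Without pipes: each step is stored in an intermediate variable
complete_rows <- na.omit(trees)
by_species <- group_by(complete_rows, species)
unpiped <- summarize(by_species, total_diameter = sum(diameter_cm))

# With pipes: the same steps, read top to bottom with no intermediates
piped <- trees %>%
  na.omit() %>%
  group_by(species) %>%
  summarize(total_diameter = sum(diameter_cm))

print(piped)
```

The piped version produces the same table while avoiding variable names that exist only to carry a value to the next step.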
 Fix the Code (15 pts)
This is a follow-up to Shrub Volume Aggregation. If you haven’t already downloaded the shrub volume data do so now and store it in your data directory.
The following code is supposed to import the shrub volume data and calculate the average shrub volume for each site and, separately, for each experiment:
    read.csv("data/shrub-volume-data.csv")
    shrub_data %>%
      mutate(volume = length * width * height) %>%
      group_by(site) %>%
      summarize(mean_volume = max(volume))
    shrub_data %>%
      mutate(volume = length * width * height)
      group_by(experiment) %>%
      summarize(mean_volume = mean(volume))
- Fix the errors in the code so that it does what it’s supposed to.
- Add a comment to the top of the code explaining what it does.
 
Portal Data Joins (20 pts)
Download copies of the following Portal Teaching Database tables:
- surveys
- species
- plots
Load them into R using read.csv().
- Use inner_join() to create a table that contains the information from both the surveys table and the species table.
- Use inner_join() twice to create a table that contains the information from all three tables.
- Use inner_join() and filter() to get a data frame with the information from the surveys and plots tables where the plot_type is Control.
- We want to do an analysis comparing the size of individuals on the Control plots to the Long-term Krat Exclosures. Create a data frame with the year, genus, species, weight and plot_type for all cases where the plot type is either Control or Long-term Krat Exclosure. Only include cases where Taxa is Rodent. Remove any records where the weight is missing.
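An inner join keeps only the rows whose key value appears in both tables. The sketch below joins two hypothetical tables (measurements and species_names, with made-up codes and columns, none of them from the Portal database) and then filters the result, including a filter on either of two values using %in%.

```r
library(dplyr)

# Hypothetical toy tables sharing a species_code key column
measurements <- data.frame(
  tree_id = c(1, 2, 3, 4, 5),
  species_code = c("QA", "PS", "QA", "ZZ", "AR"),
  height_m = c(12, 9, 18, 7, 14)
)
species_names <- data.frame(
  species_code = c("QA", "PS", "AR"),
  common_name = c("white oak", "white pine", "red maple")
)

# Rows with code "ZZ" have no match in species_names, so they are dropped
joined <- inner_join(measurements, species_names, by = "species_code")

# Filter on a column that came from the second table
white_oaks <- filter(joined, common_name == "white oak")

# %in% matches any value in a set, which covers "either A or B" conditions
oak_or_pine <- filter(joined, common_name %in% c("white oak", "white pine"))

print(joined)
print(oak_or_pine)
```

Naming the key explicitly with by makes the join column visible in the code. If by is omitted, inner_join() joins on all columns that share a name and prints a message saying which ones it used.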
 
