Module 1: RStudio Basics
R is both a programming language and a computing environment. R is open source and free for anyone to download and use.
Panels
There are four visible panels at a time, each with a different purpose:
(1) The R console accepts commands and prints the output of those commands.
(2) A text editor window to create and record code.
(3) An environment window that shows the data you have stored in R.
(4) A plot/help/files window that can alternate to show your plots, help requests, and file directory.
For example, use the text editing window to write the following code:
plot(2,2)
In RStudio, notice that the code is stored in the text editor (script), is passed to the console, and then is plotted in the plot window. Also, panels can be resized by dragging the border between them.
Getting Help
Stuck on a line of code? Not sure what a function does exactly?
Want an example of someone else using that function?
Use ? before any function name to pull up its help page in the Help window.
?plot
## starting httpd help server ... done
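Beyond ?, base R has a few other ways to look things up. The sketch below shows one possible workflow; the search pattern and the choice of read.csv() and mean() are arbitrary illustrations, not part of the original walkthrough.

```r
# List every visible object whose name starts with "read."
# (handy when you only remember part of a function name)
read_functions <- apropos("^read\\.")
read_functions

# Show the arguments a function accepts without opening its help page
args(read.csv)

# Run the examples from the bottom of the help page for mean()
example("mean")
```

example() is the quickest way to see a function being used, since it runs the code written by the function's own authors.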
Executing Commands
Send your code from the text editor/script to the console in one of three ways:
(1) Code > Run Selected Line(s)
(2) Select the Run icon at the top right of the script editor box
(3) Keyboard shortcut: Command + Return (Mac) or Ctrl + Enter (Windows)
Module 2: More R basics
Scripts
You can have multiple scripts open in RStudio. Scripts appear as tabs in the text editor panel. Save your scripts OFTEN: RStudio will sometimes crash unexpectedly, and you could lose unsaved changes.
Annotating
Have you wondered why some of the text in your script is colored green? That text is an annotation (a comment)! Annotations are saved with your script in the text editor window. You can use them to section your script and to write notes about what your code is doing. You SHOULD annotate your script, so you can go back potentially YEARS later and still understand what you did (why did I write that code?).
Anything after the hash symbol (#) on a line becomes an annotation, for example:
2+2 #An easy math problem
## [1] 4
2 #Math + 2
## [1] 2
Notice that in the second math line above, the “+ 2” is not added because it occurs after the #, so R treats it as part of the annotation.
Types of Objects
R data is stored as objects.
Common types of R objects:
Vectors: a single dimension (a variable or column)
Matrices: two dimensions (rows and columns), with every value of the same type
Data frames: basically tables
Data frames are COMMON and are organized like an Excel worksheet:
-Variables are columns
-Observations are rows
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
?mtcars #mtcars is a pre-loaded dataset always available to anyone in R.
class(mtcars) #Class function is useful to show the class of the object (what type of object?)
## [1] "data.frame"
Other (LESS COMMON) object types:
Multiple dimensions (you can think of these as the tabs in an Excel file):
-Lists
Other 2-D or 3-D object classes include:
-Arrays
-Tables
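To make the object types above concrete, here is a small sketch that builds one of each and checks it with class(). The object names and values (v, m, df, l, arr) are made up for illustration.

```r
# A vector: one dimension, all values of the same type
v <- c(2, 4, 6)

# A matrix: rows and columns, all values of the same type
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: a table whose columns can have different types
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# A list: a container for objects of any type or size
l <- list(numbers = v, grid = m, table = df)

# An array: like a matrix, but with any number of dimensions
arr <- array(1:24, dim = c(2, 3, 4))

class(v)    # "numeric"
class(m)    # "matrix" (plus "array" in R 4.0 and later)
class(df)   # "data.frame"
class(l)    # "list"
class(arr)  # "array"
```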
Creating your own objects
Assign a name to your object using <-, ->, or =.
Most people just use <-.
a <- 1 + 2 + 3
a
## [1] 6
5 + 5 -> b
b
## [1] 10
c = log(5)
c
## [1] 1.609438
You can set an R object to equal another object, or as a function of another object
x = a
x
## [1] 6
y = b * 10
y
## [1] 100
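The function c() stands for concatenate and is used to create a vector.
Notice that you can name your object almost anything (names should start with a letter and may contain letters, numbers, periods, and underscores).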
vec <- c(1,2,3,4,5,6)
vec
## [1] 1 2 3 4 5 6
Naming objects IS case sensitive
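A quick way to see case sensitivity in action is to create two objects whose names differ only in capitalization. The names below are hypothetical and chosen just for this sketch.

```r
# Two different objects: R treats lower- and upper-case letters as distinct
vec_total <- 10
Vec_Total <- 20

vec_total            # 10
Vec_Total            # 20
exists("VEC_TOTAL")  # FALSE: no object has this exact spelling
```

If R reports "object not found" for a name you are sure you created, a capitalization mismatch is one of the first things to check.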
Useful arithmetic functions in Base R, which we can use for vec
mean(vec)
## [1] 3.5
Also, consider using min(), max(), and summary().
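As a short sketch, these summary functions can be tried on the same vec object created earlier (it is recreated here so the block runs on its own). range() is an extra base R function added for illustration.

```r
vec <- c(1, 2, 3, 4, 5, 6)

min(vec)      # 1
max(vec)      # 6
range(vec)    # 1 6 (minimum and maximum together)
summary(vec)  # minimum, quartiles, median, mean, and maximum in one call
```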
Working with a Data Set
Let’s return to the mtcars dataset.
The object holding the dataset is called mtcars.
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
class(mtcars) # Class of dataset -- data frame
## [1] "data.frame"
str(mtcars) # Structure of the data - useful to examine the data set
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
head(mtcars) ## Return first 6 lines of data frame
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
tail(mtcars) ## Return last 6 lines of data frame
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
length(mtcars) #Essentially produces the number of columns
## [1] 11
length(mtcars$mpg) #Essentially produces the number of rows
## [1] 32
dim(mtcars) #The order used in R is always: Rows, Columns (Left to right!)
## [1] 32 11
names(mtcars) ## Return names of all variables
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
In the above code, we ran multiple lines of code to help us understand the structure of the dataset, each with a unique function.
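A few more base R functions answer the same structural questions more directly. This is a small supplementary sketch using the same mtcars data.

```r
nrow(mtcars)               # 32 rows (observations)
ncol(mtcars)               # 11 columns (variables)
head(rownames(mtcars), 3)  # row labels: "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
summary(mtcars$mpg)        # min, quartiles, mean, and max of one variable
```

nrow() and ncol() are usually easier to read than length() when the goal is simply to count rows or columns.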
Vectors
We’ve already discussed using the concatenate function c() to make a vector.
new_vec <- c(1,3,5,7)
length(new_vec)
## [1] 4
Now let’s make a vector of actual data, which we can pull from the mtcars dataset.
miles_per_gallon <- mtcars$mpg #The '$' refers to variables (columns) within the dataset
miles_per_gallon
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
Basic Indexing
Ever wonder what the square brackets [] do in R? Wonder no more.
The square brackets are used to index, which means to pull out subsections from objects.
practice_vec <- c(0,10,20,30,40,50,60,70)
practice_vec
## [1] 0 10 20 30 40 50 60 70
indexed_vec <- practice_vec[1:4] #Observations 1 thru 4
indexed_vec
## [1] 0 10 20 30
indexed_vec <- practice_vec[5] #Only the 5th observation
indexed_vec
## [1] 40
indexed_vec <- practice_vec[-5] #Everything but the 5th observation
indexed_vec
## [1] 0 10 20 30 50 60 70
Notice that the object indexed_vec is overwritten every time we assign a new value to it.
Indexing a vector is straightforward; we will return to indexing for 2-D data frames.
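Square brackets also accept several positions at once or a logical condition. The sketch below extends the practice_vec example (recreated here so the block runs on its own); the particular positions and cutoff are arbitrary.

```r
practice_vec <- c(0, 10, 20, 30, 40, 50, 60, 70)

practice_vec[c(1, 3, 8)]         # 1st, 3rd, and 8th observations: 0 20 70
practice_vec[-(1:2)]             # everything but the first two: 20 30 40 50 60 70
practice_vec[practice_vec > 40]  # only the values above 40: 50 60 70
```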
Data Frames
mpg is another built-in dataset, but it ships with the ggplot2 package rather than base R, so ggplot2 must be loaded first.
#install.packages("ggplot2")
library(ggplot2) #We will return to installing and loading packages later in this document.
mpg #A tibble: a data frame variant that prints differently from a plain data frame
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto~ f 18 29 p comp~
## 2 audi a4 1.8 1999 4 manu~ f 21 29 p comp~
## 3 audi a4 2 2008 4 manu~ f 20 31 p comp~
## 4 audi a4 2 2008 4 auto~ f 21 30 p comp~
## 5 audi a4 2.8 1999 6 auto~ f 16 26 p comp~
## 6 audi a4 2.8 1999 6 manu~ f 18 26 p comp~
## 7 audi a4 3.1 2008 6 auto~ f 18 27 p comp~
## 8 audi a4 quattro 1.8 1999 4 manu~ 4 18 26 p comp~
## 9 audi a4 quattro 1.8 1999 4 auto~ 4 16 25 p comp~
## 10 audi a4 quattro 2 2008 4 manu~ 4 20 28 p comp~
## # ... with 224 more rows
?mpg #What is this data set?
mpg <- data.frame(mpg) #Coerce to a data frame
When you index a data frame, remember to use a comma! When indexing, the first value is for ROWS, and the second is for COLUMNS.
Return the 10th row of the 9th variable, ‘hwy’
mpg[10,9] #Rows, Columns
## [1] 28
mpg$hwy[10] #Same as above!
## [1] 28
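Columns and rows can also be indexed by name instead of by position, which keeps working even if the column order changes. This sketch uses the base R mtcars data so that it runs without loading ggplot2.

```r
mtcars[3, "mpg"]              # row 3 of the column named "mpg": 22.8
mtcars$mpg[3]                 # same value using $
mtcars[["mpg"]][3]            # same value using double brackets
mtcars["Datsun 710", "mpg"]   # same value, selecting the row by its name
mtcars[1:3, c("mpg", "cyl")]  # first 3 rows of two named columns
```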
Return all rows of the 2nd variable.
mpg[,2]
## [1] "a4" "a4" "a4"
## [4] "a4" "a4" "a4"
## [7] "a4" "a4 quattro" "a4 quattro"
## [10] "a4 quattro" "a4 quattro" "a4 quattro"
## [13] "a4 quattro" "a4 quattro" "a4 quattro"
## [16] "a6 quattro" "a6 quattro" "a6 quattro"
## [19] "c1500 suburban 2wd" "c1500 suburban 2wd" "c1500 suburban 2wd"
## [22] "c1500 suburban 2wd" "c1500 suburban 2wd" "corvette"
## [25] "corvette" "corvette" "corvette"
## [28] "corvette" "k1500 tahoe 4wd" "k1500 tahoe 4wd"
## [31] "k1500 tahoe 4wd" "k1500 tahoe 4wd" "malibu"
## [34] "malibu" "malibu" "malibu"
## [37] "malibu" "caravan 2wd" "caravan 2wd"
## [40] "caravan 2wd" "caravan 2wd" "caravan 2wd"
## [43] "caravan 2wd" "caravan 2wd" "caravan 2wd"
## [46] "caravan 2wd" "caravan 2wd" "caravan 2wd"
## [49] "dakota pickup 4wd" "dakota pickup 4wd" "dakota pickup 4wd"
## [52] "dakota pickup 4wd" "dakota pickup 4wd" "dakota pickup 4wd"
## [55] "dakota pickup 4wd" "dakota pickup 4wd" "dakota pickup 4wd"
## [58] "durango 4wd" "durango 4wd" "durango 4wd"
## [61] "durango 4wd" "durango 4wd" "durango 4wd"
## [64] "durango 4wd" "ram 1500 pickup 4wd" "ram 1500 pickup 4wd"
## [67] "ram 1500 pickup 4wd" "ram 1500 pickup 4wd" "ram 1500 pickup 4wd"
## [70] "ram 1500 pickup 4wd" "ram 1500 pickup 4wd" "ram 1500 pickup 4wd"
## [73] "ram 1500 pickup 4wd" "ram 1500 pickup 4wd" "expedition 2wd"
## [76] "expedition 2wd" "expedition 2wd" "explorer 4wd"
## [79] "explorer 4wd" "explorer 4wd" "explorer 4wd"
## [82] "explorer 4wd" "explorer 4wd" "f150 pickup 4wd"
## [85] "f150 pickup 4wd" "f150 pickup 4wd" "f150 pickup 4wd"
## [88] "f150 pickup 4wd" "f150 pickup 4wd" "f150 pickup 4wd"
## [91] "mustang" "mustang" "mustang"
## [94] "mustang" "mustang" "mustang"
## [97] "mustang" "mustang" "mustang"
## [100] "civic" "civic" "civic"
## [103] "civic" "civic" "civic"
## [106] "civic" "civic" "civic"
## [109] "sonata" "sonata" "sonata"
## [112] "sonata" "sonata" "sonata"
## [115] "sonata" "tiburon" "tiburon"
## [118] "tiburon" "tiburon" "tiburon"
## [121] "tiburon" "tiburon" "grand cherokee 4wd"
## [124] "grand cherokee 4wd" "grand cherokee 4wd" "grand cherokee 4wd"
## [127] "grand cherokee 4wd" "grand cherokee 4wd" "grand cherokee 4wd"
## [130] "grand cherokee 4wd" "range rover" "range rover"
## [133] "range rover" "range rover" "navigator 2wd"
## [136] "navigator 2wd" "navigator 2wd" "mountaineer 4wd"
## [139] "mountaineer 4wd" "mountaineer 4wd" "mountaineer 4wd"
## [142] "altima" "altima" "altima"
## [145] "altima" "altima" "altima"
## [148] "maxima" "maxima" "maxima"
## [151] "pathfinder 4wd" "pathfinder 4wd" "pathfinder 4wd"
## [154] "pathfinder 4wd" "grand prix" "grand prix"
## [157] "grand prix" "grand prix" "grand prix"
## [160] "forester awd" "forester awd" "forester awd"
## [163] "forester awd" "forester awd" "forester awd"
## [166] "impreza awd" "impreza awd" "impreza awd"
## [169] "impreza awd" "impreza awd" "impreza awd"
## [172] "impreza awd" "impreza awd" "4runner 4wd"
## [175] "4runner 4wd" "4runner 4wd" "4runner 4wd"
## [178] "4runner 4wd" "4runner 4wd" "camry"
## [181] "camry" "camry" "camry"
## [184] "camry" "camry" "camry"
## [187] "camry solara" "camry solara" "camry solara"
## [190] "camry solara" "camry solara" "camry solara"
## [193] "camry solara" "corolla" "corolla"
## [196] "corolla" "corolla" "corolla"
## [199] "land cruiser wagon 4wd" "land cruiser wagon 4wd" "toyota tacoma 4wd"
## [202] "toyota tacoma 4wd" "toyota tacoma 4wd" "toyota tacoma 4wd"
## [205] "toyota tacoma 4wd" "toyota tacoma 4wd" "toyota tacoma 4wd"
## [208] "gti" "gti" "gti"
## [211] "gti" "gti" "jetta"
## [214] "jetta" "jetta" "jetta"
## [217] "jetta" "jetta" "jetta"
## [220] "jetta" "jetta" "new beetle"
## [223] "new beetle" "new beetle" "new beetle"
## [226] "new beetle" "new beetle" "passat"
## [229] "passat" "passat" "passat"
## [232] "passat" "passat" "passat"
Return all columns of the 2nd row
mpg[2,]
## manufacturer model displ year cyl trans drv cty hwy fl class
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
Return all columns in the first 4 rows
mpg[1:4,]
## manufacturer model displ year cyl trans drv cty hwy fl class
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
## 3 audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
## 4 audi a4 2.0 2008 4 auto(av) f 21 30 p compact
Return all observations in the 1st, 5th, and 9th variables
mpg[, c(1,5,9)]
## manufacturer cyl hwy
## 1 audi 4 29
## 2 audi 4 29
## 3 audi 4 31
## 4 audi 4 30
## 5 audi 6 26
## 6 audi 6 26
## 7 audi 6 27
## 8 audi 4 26
## 9 audi 4 25
## 10 audi 4 28
## 11 audi 4 27
## 12 audi 6 25
## 13 audi 6 25
## 14 audi 6 25
## 15 audi 6 25
## 16 audi 6 24
## 17 audi 6 25
## 18 audi 8 23
## 19 chevrolet 8 20
## 20 chevrolet 8 15
## 21 chevrolet 8 20
## 22 chevrolet 8 17
## 23 chevrolet 8 17
## 24 chevrolet 8 26
## 25 chevrolet 8 23
## 26 chevrolet 8 26
## 27 chevrolet 8 25
## 28 chevrolet 8 24
## 29 chevrolet 8 19
## 30 chevrolet 8 14
## 31 chevrolet 8 15
## 32 chevrolet 8 17
## 33 chevrolet 4 27
## 34 chevrolet 4 30
## 35 chevrolet 6 26
## 36 chevrolet 6 29
## 37 chevrolet 6 26
## 38 dodge 4 24
## 39 dodge 6 24
## 40 dodge 6 22
## 41 dodge 6 22
## 42 dodge 6 24
## 43 dodge 6 24
## 44 dodge 6 17
## 45 dodge 6 22
## 46 dodge 6 21
## 47 dodge 6 23
## 48 dodge 6 23
## 49 dodge 6 19
## 50 dodge 6 18
## 51 dodge 6 17
## 52 dodge 6 17
## 53 dodge 8 19
## 54 dodge 8 19
## 55 dodge 8 12
## 56 dodge 8 17
## 57 dodge 8 15
## 58 dodge 6 17
## 59 dodge 8 17
## 60 dodge 8 12
## 61 dodge 8 17
## 62 dodge 8 16
## 63 dodge 8 18
## 64 dodge 8 15
## 65 dodge 8 16
## 66 dodge 8 12
## 67 dodge 8 17
## 68 dodge 8 17
## 69 dodge 8 16
## 70 dodge 8 12
## 71 dodge 8 15
## 72 dodge 8 16
## 73 dodge 8 17
## 74 dodge 8 15
## 75 ford 8 17
## 76 ford 8 17
## 77 ford 8 18
## 78 ford 6 17
## 79 ford 6 19
## 80 ford 6 17
## 81 ford 6 19
## 82 ford 8 19
## 83 ford 8 17
## 84 ford 6 17
## 85 ford 6 17
## 86 ford 8 16
## 87 ford 8 16
## 88 ford 8 17
## 89 ford 8 15
## 90 ford 8 17
## 91 ford 6 26
## 92 ford 6 25
## 93 ford 6 26
## 94 ford 6 24
## 95 ford 8 21
## 96 ford 8 22
## 97 ford 8 23
## 98 ford 8 22
## 99 ford 8 20
## 100 honda 4 33
## 101 honda 4 32
## 102 honda 4 32
## 103 honda 4 29
## 104 honda 4 32
## 105 honda 4 34
## 106 honda 4 36
## 107 honda 4 36
## 108 honda 4 29
## 109 hyundai 4 26
## 110 hyundai 4 27
## 111 hyundai 4 30
## 112 hyundai 4 31
## 113 hyundai 6 26
## 114 hyundai 6 26
## 115 hyundai 6 28
## 116 hyundai 4 26
## 117 hyundai 4 29
## 118 hyundai 4 28
## 119 hyundai 4 27
## 120 hyundai 6 24
## 121 hyundai 6 24
## 122 hyundai 6 24
## 123 jeep 6 22
## 124 jeep 6 19
## 125 jeep 6 20
## 126 jeep 8 17
## 127 jeep 8 12
## 128 jeep 8 19
## 129 jeep 8 18
## 130 jeep 8 14
## 131 land rover 8 15
## 132 land rover 8 18
## 133 land rover 8 18
## 134 land rover 8 15
## 135 lincoln 8 17
## 136 lincoln 8 16
## 137 lincoln 8 18
## 138 mercury 6 17
## 139 mercury 6 19
## 140 mercury 8 19
## 141 mercury 8 17
## 142 nissan 4 29
## 143 nissan 4 27
## 144 nissan 4 31
## 145 nissan 4 32
## 146 nissan 6 27
## 147 nissan 6 26
## 148 nissan 6 26
## 149 nissan 6 25
## 150 nissan 6 25
## 151 nissan 6 17
## 152 nissan 6 17
## 153 nissan 6 20
## 154 nissan 8 18
## 155 pontiac 6 26
## 156 pontiac 6 26
## 157 pontiac 6 27
## 158 pontiac 6 28
## 159 pontiac 8 25
## 160 subaru 4 25
## 161 subaru 4 24
## 162 subaru 4 27
## 163 subaru 4 25
## 164 subaru 4 26
## 165 subaru 4 23
## 166 subaru 4 26
## 167 subaru 4 26
## 168 subaru 4 26
## 169 subaru 4 26
## 170 subaru 4 25
## 171 subaru 4 27
## 172 subaru 4 25
## 173 subaru 4 27
## 174 toyota 4 20
## 175 toyota 4 20
## 176 toyota 6 19
## 177 toyota 6 17
## 178 toyota 6 20
## 179 toyota 8 17
## 180 toyota 4 29
## 181 toyota 4 27
## 182 toyota 4 31
## 183 toyota 4 31
## 184 toyota 6 26
## 185 toyota 6 26
## 186 toyota 6 28
## 187 toyota 4 27
## 188 toyota 4 29
## 189 toyota 4 31
## 190 toyota 4 31
## 191 toyota 6 26
## 192 toyota 6 26
## 193 toyota 6 27
## 194 toyota 4 30
## 195 toyota 4 33
## 196 toyota 4 35
## 197 toyota 4 37
## 198 toyota 4 35
## 199 toyota 8 15
## 200 toyota 8 18
## 201 toyota 4 20
## 202 toyota 4 20
## 203 toyota 4 22
## 204 toyota 6 17
## 205 toyota 6 19
## 206 toyota 6 18
## 207 toyota 6 20
## 208 volkswagen 4 29
## 209 volkswagen 4 26
## 210 volkswagen 4 29
## 211 volkswagen 4 29
## 212 volkswagen 6 24
## 213 volkswagen 4 44
## 214 volkswagen 4 29
## 215 volkswagen 4 26
## 216 volkswagen 4 29
## 217 volkswagen 4 29
## 218 volkswagen 5 29
## 219 volkswagen 5 29
## 220 volkswagen 6 23
## 221 volkswagen 6 24
## 222 volkswagen 4 44
## 223 volkswagen 4 41
## 224 volkswagen 4 29
## 225 volkswagen 4 26
## 226 volkswagen 5 28
## 227 volkswagen 5 29
## 228 volkswagen 4 29
## 229 volkswagen 4 29
## 230 volkswagen 4 28
## 231 volkswagen 4 29
## 232 volkswagen 6 26
## 233 volkswagen 6 26
## 234 volkswagen 6 26
Return all columns when model equals camry
mpg[mpg$model=="camry",]
## manufacturer model displ year cyl trans drv cty hwy fl class
## 180 toyota camry 2.2 1999 4 manual(m5) f 21 29 r midsize
## 181 toyota camry 2.2 1999 4 auto(l4) f 21 27 r midsize
## 182 toyota camry 2.4 2008 4 manual(m5) f 21 31 r midsize
## 183 toyota camry 2.4 2008 4 auto(l5) f 21 31 r midsize
## 184 toyota camry 3.0 1999 6 auto(l4) f 18 26 r midsize
## 185 toyota camry 3.0 1999 6 manual(m5) f 18 26 r midsize
## 186 toyota camry 3.5 2008 6 auto(s6) f 19 28 r midsize
The above line reads: “in the data frame mpg, return all columns for the rows in which the model is camry.”
There are two important lessons from the above line.
First, since model is a character variable, the value camry needs quotes.
Second, we need to use a double equals sign, ‘==’.
R reads = as assigning a name to an object. R reads == as a logical test, essentially saying, “if it satisfies this condition.”
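It can help to look at what a comparison returns on its own: a vector of TRUE and FALSE values, one per row. The sketch below uses the base R mtcars data; the & (and) and | (or) operators for combining conditions are additions not covered in the text above.

```r
# A comparison returns one TRUE/FALSE value per row
is_four_cyl <- mtcars$cyl == 4
head(is_four_cyl)  # FALSE FALSE TRUE FALSE FALSE FALSE
sum(is_four_cyl)   # 11 (TRUE counts as 1, so this counts matching rows)

# Combine conditions with & (both must be true)
efficient_small <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]
rownames(efficient_small)

# ... or with | (at least one must be true)
extremes <- mtcars[mtcars$mpg < 11 | mtcars$mpg > 33, ]
nrow(extremes)     # 3
```

Placing the logical vector before the comma keeps only the rows where it is TRUE.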
Return all variables for rows with city mpg above 30
mpg[mpg$cty>30, ]
## manufacturer model displ year cyl trans drv cty hwy fl class
## 213 volkswagen jetta 1.9 1999 4 manual(m5) f 33 44 d compact
## 222 volkswagen new beetle 1.9 1999 4 manual(m5) f 35 44 d subcompact
Return all variables for rows with city mpg greater than or equal to 28
mpg[mpg$cty >=28, ]
## manufacturer model displ year cyl trans drv cty hwy fl class
## 100 honda civic 1.6 1999 4 manual(m5) f 28 33 r subcompact
## 197 toyota corolla 1.8 2008 4 manual(m5) f 28 37 r compact
## 213 volkswagen jetta 1.9 1999 4 manual(m5) f 33 44 d compact
## 222 volkswagen new beetle 1.9 1999 4 manual(m5) f 35 44 d subcompact
## 223 volkswagen new beetle 1.9 1999 4 auto(l4) f 29 41 d subcompact
Save the result of your indexing as a new data frame object!
x <- mpg[mpg$cty>30, ]
x
## manufacturer model displ year cyl trans drv cty hwy fl class
## 213 volkswagen jetta 1.9 1999 4 manual(m5) f 33 44 d compact
## 222 volkswagen new beetle 1.9 1999 4 manual(m5) f 35 44 d subcompact
Load data (Set working directory)
Every time you start a new RStudio session, you need to set the working directory. This tells R where on your computer to import data from and export data to.
There are two ways to set your R working directory:
(1) The GUI menu
(2) Use the command setwd()
My working directory: setwd("/Users/harrisonfried/Box Sync/Midwest CCA coupled networks/ActorForumAnalysis/Data")
To see your current working directory, use:
getwd()
## [1] "C:/Users/scagg/Documents/Shane's Projects/Networks/SENG/SENG Website/seng/content/meetings/2021-04-05-a-brief-introduction-to-r"
Now R knows where to get my data from and save what I produce!
Pull a data set in from your working directory using read.csv()
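Since no data file accompanies this section, the sketch below first writes mtcars to a temporary CSV file and then reads it back with read.csv(). The file name mtcars_example.csv and the object names are hypothetical; with your own data you would pass the name of a file in your working directory instead.

```r
# Build a path to a scratch file (a stand-in for a file in your working directory)
csv_path <- file.path(tempdir(), "mtcars_example.csv")

# Export a data frame to CSV, leaving out the row names
write.csv(mtcars, csv_path, row.names = FALSE)

# Import the CSV file as a new data frame
cars_from_file <- read.csv(csv_path)

dim(cars_from_file)      # 32 11
head(cars_from_file, 3)  # check that the import looks right
```

Running head() or str() right after an import is a quick check that the columns were read as expected.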
Install and load packages
Packages are a vital feature of R. Many are created by people in your research community! Packages are modules with one or more functions (that were custom made by the package developers). Packages are what make R powerful.
To use a package, you need to first INSTALL it, then LOAD it. You only have to install a package ONCE (it is saved to the R library on your computer). You must load it every time you restart R (so R knows what package(s) you will be drawing from today).
Simply install packages with the command:
#install.packages("statnet")
A common error is forgetting to put the name of the package in quotation marks.
statnet is a macro-package for network analysis: it includes many smaller packages within it that each have a purpose in network analysis.
Now that the package is installed, we load it using the library() function.
library(statnet)
## Loading required package: tergm
## Loading required package: ergm
## Loading required package: network
##
## 'network' 1.17.1 (2021-06-12), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
##
## 'ergm' 4.1.2 (2021-07-26), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.
## Loading required package: networkDynamic
##
## 'networkDynamic' 0.11.0 (2021-06-12), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Registered S3 method overwritten by 'tergm':
## method from
## simulate_formula.network ergm
##
## 'tergm' 4.0.2 (2021-07-28), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
##
## Attaching package: 'tergm'
## The following object is masked from 'package:ergm':
##
## snctrl
## Loading required package: ergm.count
##
## 'ergm.count' 4.0.2 (2021-06-18), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Loading required package: sna
## Loading required package: statnet.common
##
## Attaching package: 'statnet.common'
## The following object is masked from 'package:ergm':
##
## snctrl
## The following objects are masked from 'package:base':
##
## attr, order
## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
## For citation information, type citation("sna").
## Type help(package="sna") to get started.
## Loading required package: tsna
##
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information
## unable to reach CRAN
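One common pattern, sketched here under the assumption that you want a script to run on a computer where the package may not be installed yet, is to install only when needed and then load. The package name "stats" is a stand-in chosen so the block runs offline (it ships with R); replace it with "statnet" or any other package name.

```r
# Name of the package to use ("stats" is a stand-in for illustration)
pkg <- "stats"

# Install the package only if it is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package; character.only = TRUE lets library() accept a name stored in an object
library(pkg, character.only = TRUE)
```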
Module 3: A Primer to Advanced Techniques
Creating Functions
You can make your own functions to carry out specific tasks.
function(x) {
#do something
}
## function(x) {
## #do something
## }
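As a minimal sketch of the template above, the function below converts miles per gallon to kilometers per liter. The function name and argument names are made up for illustration; the default values are the standard conversion factors for miles and US gallons.

```r
# Convert fuel economy from miles per gallon to kilometers per liter
mpg_to_kpl <- function(mpg_value, km_per_mile = 1.609344, liters_per_gallon = 3.785412) {
  # The value of the last expression is what the function returns
  mpg_value * km_per_mile / liters_per_gallon
}

mpg_to_kpl(21)               # about 8.93
mpg_to_kpl(mtcars$mpg[1:3])  # works on a whole vector at once
```

Arguments with default values, such as km_per_mile, can be left out when calling the function.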
Here is an example. Suppose you have an equation and you want to give it different values of X to learn the output Y. This happens when you want to make predictions from a statistical model.
Load the iris dataset, available to anyone with base R.
data("iris")
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
unique(iris$Species)
## [1] setosa versicolor virginica
## Levels: setosa versicolor virginica
Let’s plot the relationship between petal length and width
plot(iris$Petal.Width, iris$Petal.Length, col=iris$Species)
Next, let’s fit a linear model predicting petal length from width and species using lm()
fit1 <- lm(Petal.Length ~ Petal.Width + Species, data = iris)
summary(fit1)
##
## Call:
## lm(formula = Petal.Length ~ Petal.Width + Species, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02977 -0.22241 -0.01514 0.18180 1.17449
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.21140 0.06524 18.568 < 2e-16 ***
## Petal.Width 1.01871 0.15224 6.691 4.41e-10 ***
## Speciesversicolor 1.69779 0.18095 9.383 < 2e-16 ***
## Speciesvirginica 2.27669 0.28132 8.093 2.08e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3777 on 146 degrees of freedom
## Multiple R-squared: 0.9551, Adjusted R-squared: 0.9542
## F-statistic: 1036 on 3 and 146 DF, p-value: < 2.2e-16
So what is the equation for this model?
# PL = a + b1*Petal.Width + b2*versicolor + b3*virginica
Let’s see the intercept and slopes
coefs <- coef(fit1)
coefs
## (Intercept) Petal.Width Speciesversicolor Speciesvirginica
## 1.211397 1.018712 1.697791 2.276693
Now, let’s create a function that embeds the coefficients into a linear equation
fit1_predict <- function(PW, versicolor, virginica) {
Y <- coefs[1] + coefs[2]*PW + coefs[3]*versicolor + coefs[4]*virginica
print(Y)
}
Test the function with random values for each variable!
fit1_predict(PW=4, versicolor = 0, virginica = 1)
## (Intercept)
## 7.562937
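A hand-written prediction function is a useful exercise, and base R's predict() gives a way to check it. The sketch below refits the same model so that it runs on its own, then compares the two approaches; the object names new_flower, by_predict, and by_hand are made up for illustration.

```r
# Refit the same linear model
fit1 <- lm(Petal.Length ~ Petal.Width + Species, data = iris)

# Describe the new observation using the same variable names as the model
new_flower <- data.frame(
  Petal.Width = 4,
  Species = factor("virginica", levels = levels(iris$Species))
)

# Prediction from R's built-in method
by_predict <- predict(fit1, newdata = new_flower)

# Prediction from the coefficients, as in the hand-written function
cf <- coef(fit1)
by_hand <- cf["(Intercept)"] + cf["Petal.Width"] * 4 + cf["Speciesvirginica"]

all.equal(unname(by_predict), unname(by_hand))  # TRUE
```

With predict(), the species is given by name and R builds the 0/1 indicator variables itself.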
Compute predictions for different species
preds <- fit1_predict(PW=seq(0,3,by=0.2), versicolor = 0, virginica = 0) # setosa
## [1] 1.211397 1.415139 1.618882 1.822624 2.026366 2.230109 2.433851 2.637593
## [9] 2.841336 3.045078 3.248820 3.452563 3.656305 3.860047 4.063789 4.267532
preds1 <- fit1_predict(PW=seq(0,3,by=0.2), versicolor = 1, virginica = 0) # versicolor
## [1] 2.909188 3.112931 3.316673 3.520415 3.724158 3.927900 4.131642 4.335385
## [9] 4.539127 4.742869 4.946612 5.150354 5.354096 5.557839 5.761581 5.965323
preds2 <- fit1_predict(PW=seq(0,3,by=0.2), versicolor = 0, virginica = 1) # virginica
## [1] 3.488090 3.691833 3.895575 4.099317 4.303060 4.506802 4.710544 4.914287
## [9] 5.118029 5.321771 5.525513 5.729256 5.932998 6.136740 6.340483 6.544225
# Start with a blank canvas
plot(NULL, xlim=c(0,3), ylim=c(1,7),
xlab="Petal.Width", ylab="Petal.Length")
# Add observed values
points(Petal.Length ~ Petal.Width, data = iris, col=Species)
# Add predictions
points(seq(0,3,by=0.2), preds, col=1, pch=19)
points(seq(0,3,by=0.2), preds1, col=2, pch=19)
points(seq(0,3,by=0.2), preds2, col=3, pch=19)
for Loops and Apply Statements
A for loop will perform actions following a sequence or index, for example:
x <- 1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
for(i in x) {
x2 <- i^2
print(x2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36
## [1] 49
## [1] 64
## [1] 81
## [1] 100
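The loop above only prints each result. To keep the results, one option (a sketch, with the hypothetical object name squares) is to create an empty container first and fill it in one position at a time.

```r
x <- 1:10

# Create a container with one slot per element of x
squares <- numeric(length(x))

# Fill the container one position at a time
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}

squares                    # 1 4 9 16 25 36 49 64 81 100
identical(squares, x^2)    # TRUE: many R functions already work on whole vectors
```

For simple arithmetic like this, the vectorized form x^2 is shorter; loops earn their keep when each step is more involved.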
For example, for loops can be helpful when working with multiple networks
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:sna':
##
## betweenness, bonpow, closeness, components, degree, dyad.census,
## evcent, hierarchy, is.connected, neighborhood, triad.census
## The following objects are masked from 'package:network':
##
## %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
## get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
## is.directed, list.edge.attributes, list.vertex.attributes,
## set.edge.attribute, set.vertex.attribute
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
# simulate network
par(mfrow=c(1,1)) #Set the plotting dimensions
g <- barabasi.game(n=40, power = 1, directed = F)
plot(g, vertex.label=NA)
# First, let's use a for loop to create 4 networks with different powers
pow <- seq(0, 1, length.out=4)
pow
## [1] 0.0000000 0.3333333 0.6666667 1.0000000
# create a container for the networks
L <- list()
# loop through different values of pow and plot them
set.seed(1)
par(mfrow=c(2,2), mar=c(1,1,1,1))
for(i in seq_along(pow)) {
# simulate preferential attachment
L[[i]] <- barabasi.game(n=40, power = pow[i], directed = F) # use the i-th value of pow
# plot each network
plot(L[[i]],
edge.arrow.size=0.2,
vertex.label=NA)
}
Now we can run some descriptive statistics on the list L.
One way to do it is to use a loop.
for(i in seq_along(L)) {
dens <- graph.density(L[[i]])
apl <- average.path.length(L[[i]])
print(c(dens, apl))
}
## [1] 0.050000 4.515385
## [1] 0.050000 2.671795
## [1] 0.050000 2.179487
## [1] 0.050000 1.997436
Sometimes, the same goal can be accomplished more easily by using an apply statement (lapply).
Apply statements take the following form: lapply(list, function)
lapply(L, graph.density)
## [[1]]
## [1] 0.05
##
## [[2]]
## [1] 0.05
##
## [[3]]
## [1] 0.05
##
## [[4]]
## [1] 0.05
# returns a list, so unlist() and put in data frame
DF <- data.frame(
Density = unlist(lapply(L, graph.density)),
APL = unlist(lapply(L, average.path.length))
)
DF
## Density APL
## 1 0.05 4.515385
## 2 0.05 2.671795
## 3 0.05 2.179487
## 4 0.05 1.997436
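The unlist(lapply(...)) pattern can often be shortened with sapply(), which simplifies the result to a vector when it can. The sketch below uses a small made-up list of numeric vectors in place of the networks so that it runs without igraph.

```r
# A small list standing in for the list of networks
samples <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# lapply() always returns a list
means_list <- lapply(samples, mean)

# sapply() returns a named vector here, with no unlist() needed
means_vec <- sapply(samples, mean)
means_vec  # a = 2, b = 15, c = 5

# The simplified results drop straight into a data frame
data.frame(Mean = means_vec, N = sapply(samples, length))
```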
lapply() is used for lists.
There are also matrix and data.frame versions.
For example, apply(array, margin, ...)
Or, apply(DF, 2, mean) # this will apply the function mean() to every column
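To see what the margin argument does, the sketch below applies a function to the columns and then to the rows of three mtcars variables. The choice of columns and the object names are arbitrary.

```r
# Keep three numeric columns for illustration
cars_subset <- mtcars[, c("mpg", "hp", "wt")]

# margin = 2: apply mean() to every column
col_means <- apply(cars_subset, 2, mean)
col_means

# margin = 1: apply max() to every row
row_max <- apply(cars_subset, 1, max)
head(row_max, 3)

# For column means, base R also has a dedicated shortcut
all.equal(col_means, colMeans(cars_subset))  # TRUE
```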