Announcement

R basics

styles

(reading assignment)

Check out the style guide in Advanced R and the tidyverse style guide.

Arithmetic

R can perform all basic mathematical computations.

symbol      use
+           addition
-           subtraction
*           multiplication
/           division
^           power
%%          modulus
exp()       exponential function
log()       natural logarithm
sqrt()      square root
round()     rounding
floor()     flooring
ceiling()   ceiling
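
As a quick illustration of the operators and functions in the table, the lines below can be typed at the R console. The numbers are arbitrary and chosen only for this sketch.

```r
# Arithmetic operators (the numbers are arbitrary illustrations)
7 + 3
## [1] 10
7 / 2
## [1] 3.5
7 ^ 2
## [1] 49
7 %% 3            # remainder after dividing 7 by 3
## [1] 1

# Mathematical functions
exp(0)
## [1] 1
log(1)            # natural logarithm
## [1] 0
sqrt(49)
## [1] 7
round(3.567, 1)   # round to one decimal place
## [1] 3.6
floor(3.7)
## [1] 3
ceiling(3.2)
## [1] 4
```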

Objects

You can create an R object to save results of a computation or other command.

Example 1

x <- 3 + 5
x
## [1] 8
  • In most languages, assignment passes the value from right to left (e.g. with “=”). R, however, allows both directions: “<-” and “=” assign from right to left, while “->” assigns from left to right (which is actually bad!). In this course, we encourage the use of “<-” or “=”. Some people prefer “=” over “<-” because a stray space turns “<-” into two operators, “< -”, which compares instead of assigning.

Example 2

x < - 3 + 5
## [1] FALSE
x
## [1] 8
  • For naming conventions, stick with either “.” or “_” as the word separator and use it consistently (refer to the style guide).

Example 3

sum.result <- x + 5
sum.result
## [1] 13
  • Important: many names are already taken by built-in R functions. Make sure that you don’t overwrite them.

Example 4

sum(2:5)
## [1] 14
sum
## function (..., na.rm = FALSE)  .Primitive("sum")
sum <- 3 + 4 + 5
sum(5:8)
## [1] 26
sum
## [1] 12
  • R is case-sensitive. “Math.7260” is different from “math.7260”.
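
The two points above (masking a built-in name and case sensitivity) can be checked with a short sketch. The object names are hypothetical and used only for illustration.

```r
# Hypothetical object names that differ only in case
math.7260 <- 1
Math.7260 <- 2
math.7260 == Math.7260   # two distinct objects
## [1] FALSE

# A variable named like a built-in function masks that name
sum <- 12
sum(1:3)                 # in a call, R still finds the function
## [1] 6
rm(sum)                  # delete our variable to remove the mask
is.function(sum)         # the name refers to the built-in again
## [1] TRUE
```

Removing the variable with rm() is enough here because the built-in function itself was never changed, only hidden by an object of the same name.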

Locating and deleting objects:

The commands “objects()” and “ls()” will provide a list of every object that you’ve created in a session.

objects()
## [1] "sum"        "sum.result" "x"
ls()
## [1] "sum"        "sum.result" "x"

The “rm()” and “remove()” commands let you delete objects (tip: always clean up your workspace as the first command).

rm(list=ls())  # clean up workspace
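
To delete a single object rather than the whole workspace, pass its name to “rm()”. A minimal sketch, with hypothetical object names:

```r
# Hypothetical object names, used only for this illustration
tmp.keep <- 1
tmp.drop <- 2
rm(tmp.drop)          # delete one object by name
exists("tmp.drop")
## [1] FALSE
exists("tmp.keep")
## [1] TRUE
```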

Vectors

Many commands in R generate a vector of output, rather than a single number.

The “c()” command creates a vector from the elements you supply.

Example 1

c(7, 3, 6, 0)
## [1] 7 3 6 0
c(73:60)
##  [1] 73 72 71 70 69 68 67 66 65 64 63 62 61 60
c(7:3, 6:0)
##  [1] 7 6 5 4 3 6 5 4 3 2 1 0
c(rep(7:3, 6), 0)
##  [1] 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 0

Example 2 The command “seq()” creates a sequence of numbers.

seq(7)
## [1] 1 2 3 4 5 6 7
seq(3, 70, by = 6)
##  [1]  3  9 15 21 27 33 39 45 51 57 63 69
seq(3, 70, length.out = 6)
## [1]  3.0 16.4 29.8 43.2 56.6 70.0

Operations on vectors

Use square brackets to select elements of a vector.

x <- 73:60
x[2]
## [1] 72
x[2:5]
## [1] 72 71 70 69
x[-(2:5)]
##  [1] 73 68 67 66 65 64 63 62 61 60

Elements can also be accessed by name (safe when the column/row order changes).

y <- 1:3
names(y) <- c("do", "re", "mi")
y[3]
## mi 
##  3
y["mi"]
## mi 
##  3
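
Why access by name is safer can be seen by reordering the vector. The sketch below rebuilds the same named vector and reverses it; “y.rev” is a name introduced only for this illustration.

```r
y <- c(do = 1, re = 2, mi = 3)
y.rev <- rev(y)      # reverse the order; names travel with the values
y.rev[3]             # position 3 now holds "do", not "mi"
## do 
##  1
y.rev["mi"]          # access by name still returns the same element
## mi 
##  3
```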

R commands on vectors

command                                         usage
sum()                                           sum over elements in vector
mean()                                          compute average value
sort()                                          sort elements in a vector
min(), max()                                    min and max values of a vector
length()                                        length of a vector
summary()                                       returns the min, Q1, median, mean, Q3, and max values of a vector
sample(x, size, replace = FALSE, prob = NULL)   takes a random sample from a vector with or without replacement
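
A brief sketch of these commands on a small vector. The vector “v” and the seed are arbitrary choices for this example; no output is shown for “sample()” because the draw depends on the random number generator.

```r
v <- c(7, 3, 6, 0)
sum(v)
## [1] 16
mean(v)
## [1] 4
sort(v)
## [1] 0 3 6 7
max(v)
## [1] 7
length(v)
## [1] 4

set.seed(7360)                # arbitrary seed, only to make the draw repeatable
draw <- sample(v, size = 2)   # two elements of v, drawn without replacement
```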

Exercise Write a command to generate a random permutation of the numbers between 1 and 5 and save it to an object.

Matrix

The “matrix()” command creates a matrix from the given set of values.

matrix.example <- matrix(rnorm(100), nrow = 10, ncol = 10, byrow = TRUE)
matrix.example
##             [,1]       [,2]       [,3]        [,4]        [,5]        [,6]
##  [1,]  1.6609522 -3.2239324  0.3780545  0.02224778  0.41802598  0.04315235
##  [2,] -0.1556653 -2.2072676 -1.0056647  2.12559032 -0.23741994 -0.30185899
##  [3,]  0.6007446 -1.2261235 -0.6435033  0.76563321 -0.74859040 -2.12464086
##  [4,] -1.7989506  0.5701902  1.5050495 -0.01187399  1.77322757 -0.83791182
##  [5,] -1.8399588 -0.1131396 -0.5405625 -0.98972522  1.20879255 -1.36467309
##  [6,] -0.3536617  1.0709056 -0.3375632 -0.79290930  0.32520094 -0.19650941
##  [7,] -1.2783317 -0.2032028 -1.1860343  1.58787365 -0.45398664  0.36405477
##  [8,] -0.7298772 -0.5298363 -0.5035697  0.66552758 -0.24691264 -0.36750550
##  [9,] -0.1844880 -0.4658350 -0.2418788 -2.73817798  0.04685862 -0.21934404
## [10,] -1.9268563  0.7099139 -1.1834621  0.52066625  0.41048728 -0.02298710
##               [,7]        [,8]       [,9]       [,10]
##  [1,] -0.427646900  1.43358204  0.4650274 -0.58651869
##  [2,] -0.156628903 -2.28286290 -1.1970822  0.27048413
##  [3,]  0.009215484 -0.37625399 -0.1539379  1.78747269
##  [4,] -0.696794551  1.26632391 -1.6119336 -0.71047222
##  [5,]  1.345252962 -0.90800699  0.7071567 -0.12457697
##  [6,]  0.583183290  0.10639700 -0.9456968 -0.06218972
##  [7,] -0.555968237  0.37774834 -0.9896697 -0.16485577
##  [8,]  0.976964258 -0.09326077  1.0992434 -0.65471893
##  [9,]  0.267991249 -1.26801952  1.0006473  0.44921698
## [10,] -0.427189492  0.86624497 -0.7293580 -0.01644784

R commands on vector/matrix

command        usage
sum()          sum over elements in vector/matrix
mean()         compute average value
sort()         sort all elements in a vector/matrix
min(), max()   min and max values of a vector/matrix
length()       length of a vector/matrix
summary()      returns the min, Q1, median, mean, Q3, and max values of a vector (for a matrix, of each column)
dim()          dimension of a matrix
cbind()        combine a sequence of vector, matrix or data-frame arguments by columns
rbind()        combine a sequence of vector, matrix or data-frame arguments by rows
names()        get or set names of an object
colnames()     get or set column names of a matrix-like object
rownames()     get or set row names of a matrix-like object

sum(matrix.example)
## [1] -18.88918
mean(matrix.example)
## [1] -0.1888918
sort(matrix.example)
##   [1] -3.223932370 -2.738177980 -2.282862904 -2.207267580 -2.124640864
##   [6] -1.926856258 -1.839958796 -1.798950586 -1.611933554 -1.364673094
##  [11] -1.278331669 -1.268019520 -1.226123454 -1.197082195 -1.186034347
##  [16] -1.183462071 -1.005664685 -0.989725222 -0.989669687 -0.945696839
##  [21] -0.908006993 -0.837911822 -0.792909305 -0.748590397 -0.729877171
##  [26] -0.729358010 -0.710472219 -0.696794551 -0.654718929 -0.643503286
##  [31] -0.586518693 -0.555968237 -0.540562488 -0.529836252 -0.503569698
##  [36] -0.465834973 -0.453986642 -0.427646900 -0.427189492 -0.376253993
##  [41] -0.367505497 -0.353661693 -0.337563228 -0.301858987 -0.246912641
##  [46] -0.241878798 -0.237419944 -0.219344037 -0.203202756 -0.196509406
##  [51] -0.184488031 -0.164855765 -0.156628903 -0.155665257 -0.153937938
##  [56] -0.124576974 -0.113139570 -0.093260772 -0.062189724 -0.022987105
##  [61] -0.016447841 -0.011873986  0.009215484  0.022247780  0.043152346
##  [66]  0.046858624  0.106396996  0.267991249  0.270484135  0.325200943
##  [71]  0.364054769  0.377748338  0.378054522  0.410487276  0.418025980
##  [76]  0.449216978  0.465027399  0.520666252  0.570190211  0.583183290
##  [81]  0.600744588  0.665527576  0.707156692  0.709913943  0.765633212
##  [86]  0.866244968  0.976964258  1.000647275  1.070905631  1.099243429
##  [91]  1.208792549  1.266323912  1.345252962  1.433582040  1.505049470
##  [96]  1.587873655  1.660952179  1.773227569  1.787472693  2.125590319
summary(matrix.example)
##        V1                V2                V3                V4         
##  Min.   :-1.9269   Min.   :-3.2239   Min.   :-1.1860   Min.   :-2.7382  
##  1st Qu.:-1.6688   1st Qu.:-1.0521   1st Qu.:-0.9151   1st Qu.:-0.5977  
##  Median :-0.5418   Median :-0.3345   Median :-0.5221   Median : 0.2715  
##  Mean   :-0.6006   Mean   :-0.5618   Mean   :-0.3759   Mean   : 0.1155  
##  3rd Qu.:-0.1629   3rd Qu.: 0.3994   3rd Qu.:-0.2658   3rd Qu.: 0.7406  
##  Max.   : 1.6610   Max.   : 1.0709   Max.   : 1.5050   Max.   : 2.1256  
##        V5                V6                 V7                 V8           
##  Min.   :-0.7486   Min.   :-2.12464   Min.   :-0.69679   Min.   :-2.282863  
##  1st Qu.:-0.2445   1st Qu.:-0.72031   1st Qu.:-0.42753   1st Qu.:-0.775069  
##  Median : 0.1860   Median :-0.26060   Median :-0.07371   Median : 0.006568  
##  Mean   : 0.2496   Mean   :-0.50282   Mean   : 0.09184   Mean   :-0.087811  
##  3rd Qu.: 0.4161   3rd Qu.:-0.06637   3rd Qu.: 0.50439   3rd Qu.: 0.744121  
##  Max.   : 1.7732   Max.   : 0.36405   Max.   : 1.34525   Max.   : 1.433582  
##        V9               V10          
##  Min.   :-1.6119   Min.   :-0.71047  
##  1st Qu.:-0.9787   1st Qu.:-0.48110  
##  Median :-0.4416   Median :-0.09338  
##  Mean   :-0.2356   Mean   : 0.01874  
##  3rd Qu.: 0.6466   3rd Qu.: 0.19875  
##  Max.   : 1.0992   Max.   : 1.78747
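
The remaining commands in the table (“dim()”, “cbind()”, “rbind()”, “colnames()”) are easier to follow on a small matrix. A minimal sketch, where “m”, “wide” and “tall” are names introduced only for this illustration:

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
dim(m)
## [1] 2 3
colnames(m) <- c("a", "b", "c")
m[, "b"]                          # select a column by name
## [1] 2 5
wide <- cbind(m, d = c(7, 8))     # append a column
dim(wide)
## [1] 2 4
tall <- rbind(m, 7:9)             # append a row
dim(tall)
## [1] 3 3
```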

Comparison (logical operators)

symbol             use
!=                 not equal
==                 equal
>                  greater than
>=                 greater than or equal
<                  less than
<=                 less than or equal
is.na()            is the value “Not Available”/missing?
complete.cases()   returns a logical vector specifying which observations/rows have no missing values
is.finite()        is the value finite?
all()              are all values in a logical vector true?
any()              is any value in a logical vector true?

test.vec <- 73:68
test.vec
## [1] 73 72 71 70 69 68
test.vec < 70
## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE
test.vec > 70
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
test.vec[3] <- NA
test.vec
## [1] 73 72 NA 70 69 68
is.na(test.vec)
## [1] FALSE FALSE  TRUE FALSE FALSE FALSE
complete.cases(test.vec)
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
all(is.na(test.vec))
## [1] FALSE
any(is.na(test.vec))
## [1] TRUE
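
Missing values also propagate through comparisons and summaries, which is why checks such as “is.na()” matter. A short sketch, using a vector “na.vec” introduced only for this example:

```r
na.vec <- c(73, 72, NA, 70)
na.vec > 71                    # comparing with NA yields NA
## [1]  TRUE  TRUE    NA FALSE
mean(na.vec)                   # one NA makes the whole mean NA
## [1] NA
mean(na.vec, na.rm = TRUE)     # drop missing values first
## [1] 71.66667
is.finite(c(1, Inf, NA))       # neither Inf nor NA counts as finite
## [1]  TRUE FALSE FALSE
```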

Now let’s test the accuracy of doubles in R. Recall that double precision gives approximately \(\log_{10}(2^{52}) \approx 16\) significant decimal digits of precision.

test.exponent <- -(7:18)
10^test.exponent == 0
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1 - 10^test.exponent == 1
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
7360 - 10^test.exponent == 7360
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
73600 - 10^test.exponent == 73600
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
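
The results above show that the smallest difference a double can resolve grows with the magnitude of the number, which is why 7360 and 73600 switch to TRUE earlier than 1 does. The sketch below shows the machine epsilon that R reports and one common way to compare doubles with a tolerance instead of “==”.

```r
.Machine$double.eps                  # gap between 1 and the next larger double
## [1] 2.220446e-16
0.1 + 0.2 == 0.3                     # exact comparison fails due to rounding
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # comparison with a small tolerance
## [1] TRUE
```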

Other operators

%in%, match

test.vec
## [1] 73 72 NA 70 69 68
66 %in% test.vec
## [1] FALSE
match(66, test.vec, nomatch = 0)
## [1] 0
70 %in% test.vec
## [1] TRUE
match(70, test.vec, nomatch = 0)
## [1] 4
match(70, test.vec, nomatch = 0) > 0 # the implementation of %in%
## [1] TRUE
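
Both “%in%” and “match()” are vectorized over their first argument, so several values can be looked up at once. A small sketch, where “lookup” is a name introduced for this example and holds the same values as the vector above:

```r
lookup <- c(73, 72, NA, 70, 69, 68)
c(66, 70, 73) %in% lookup        # one TRUE/FALSE per value
## [1] FALSE  TRUE  TRUE
match(c(66, 70, 73), lookup)     # positions; NA when there is no match
## [1] NA  4  1
```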

Control flow

These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like (Algol short for “Algorithmic Language”) language. They are all reserved words.

keyword   usage
if        if(cond) expr
if-else   if(cond) cons.expr else alt.expr
for       for(var in seq) expr
while     while(cond) expr
break     breaks out of a for, while or repeat loop
next      halts the processing of the current iteration and advances the looping index
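
The constructs in the table can be combined as in the following sketch. The variable names and the numbers are arbitrary and serve only as an illustration.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    next                 # skip even numbers
  } else if (i > 7) {
    break                # stop at the first odd number above 7
  }
  total <- total + i     # adds 1, 3, 5 and 7
}
total
## [1] 16

countdown <- 3
while (countdown > 0) {  # the condition eventually becomes FALSE
  countdown <- countdown - 1
}
countdown
## [1] 0
```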

Define a function

Read the Functions chapter of Advanced R by Hadley Wickham. We will revisit functions in more detail later.

DoNothing <- function() {
  return(invisible(NULL))
}
DoNothing()
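
A function usually takes arguments and returns a value. The sketch below uses a hypothetical function “AddNumbers” with one required argument and one argument that has a default value.

```r
# Hypothetical function: adds two numbers, b defaults to 1
AddNumbers <- function(a, b = 1) {
  return(a + b)
}
AddNumbers(2)           # uses the default b = 1
## [1] 3
AddNumbers(2, b = 5)    # override the default by name
## [1] 7
```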

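In general, try to avoid using loops in R (vectorize your code instead). If you have to loop, try a for loop first, since it runs over a fixed sequence. A while loop can be dangerous: if its condition never becomes false, it runs forever, and R will not detect this for you. The function below, for example, would keep growing a vector until memory runs out.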

DoBadThing <- function() {
  result <- NULL
  while(TRUE) {
    result <- c(result, rnorm(100))
  }
  return(result)
}
# DoBadThing()
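
For contrast, here is a minimal sketch of a bounded loop and its vectorized equivalent. The names “squares.loop” and “squares.vec” are introduced only for this example.

```r
n <- 5
squares.loop <- numeric(n)          # pre-allocate instead of growing with c()
for (i in seq_len(n)) {
  squares.loop[i] <- i^2
}
squares.vec <- (1:n)^2              # vectorized: no explicit loop
squares.vec
## [1]  1  4  9 16 25
identical(squares.loop, squares.vec)
## [1] TRUE
```

The vectorized form is shorter and leaves the looping to R's internal code, which is usually faster than an explicit loop.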

Install packages

You can install R packages from several places (reference):

  • Comprehensive R Archive Network (CRAN)

    • Official R packages repository

    • Some level of code checks (cross-platform support, version control, etc.)

    • Most common place you will install packages

    • Pick a mirror location near you

    • install.packages("package_name")

  • GitHub

    • May get development version of a package

    • Almost no code checks

    • Common place to develop a package before submitting to CRAN

      install.packages("devtools")
      library(devtools)
      install_github("tidyverse/ggplot2")

Load packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
require(tidyverse)
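
The two commands differ when a package is missing: “library()” stops with an error, whereas “require()” returns FALSE with a warning. To check for a package without attaching it, “requireNamespace()” can be used, as in this sketch. The second package name is hypothetical and assumed not to be installed.

```r
# "stats" ships with every R installation, so this check is expected to succeed
has.stats <- requireNamespace("stats", quietly = TRUE)
has.stats
## [1] TRUE

# Hypothetical package name, assumed not to be installed anywhere
has.fake <- requireNamespace("notARealPackage7360", quietly = TRUE)
has.fake
## [1] FALSE

# Attach a package only if it is available
if (has.stats) {
  library(stats)
}
```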