name: title class: center, middle, inverse # Introduction to data visualisation with `ggplot2` Ben Matthews<sup>1</sup> and Eilidh Jack<sup>2</sup> 2020-06-29 (Last updated 2020-07-06) <sup>1</sup>University of Edinburgh <sup>2</sup>University of Glasgow <!-- --> (Jump to [Session Two](https://benmatthewsed.github.io/ui-data-visualization-course/#session-two)) --- # Welcome! - Who we are - [Course outline](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/course_outline.md) - [Code of conduct](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/code_of_conduct.md) - [Reference texts](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/course_outline.md#course-texts) - [`ggplot2` cheat sheet](https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) - this is __very__ important! - [Course questionnaire](https://edin.ac/3ecKThN) (thanks to everyone who has filled this in already!) --- # Learning outcomes By the end of the course you will: - Understand (some) basic principles of __data visualization__ - Be able to visualize data with `ggplot2`; scatterplots, bar charts, fitted lines - Understand (some) basic principles of __mapping__ - Be able to draw maps of spatial data with `ggplot2` --- name: outline # Course outline - [What is `ggplot2`?](https://benmatthewsed.github.io/ui-data-visualization-course/#what-is-ggplot) - [Concepts in the grammar of graphics](https://benmatthewsed.github.io/ui-data-visualization-course/#concepts-grammar-graphics) - [Types of geom](https://benmatthewsed.github.io/ui-data-visualization-course/#geom-types) - [Principles of data visualization](https://benmatthewsed.github.io/ui-data-visualization-course/#principles-data-viz) - [gg-gotchas](https://benmatthewsed.github.io/ui-data-visualization-course/#gggotchas) - [Mapping with `ggplot2`](https://benmatthewsed.github.io/ui-data-visualization-course/#ggplot-maps) - [`ggplot2` colour 
scales](https://benmatthewsed.github.io/ui-data-visualization-course/#ggplot-colour-scales) - [Titles](https://benmatthewsed.github.io/ui-data-visualization-course/#titles) - [Publication-ready plots](https://benmatthewsed.github.io/ui-data-visualization-course/#publication-plots) --- # Points of order - __Ask questions whenever__ 👍 You can do this through the Blackboard chat facility __{demo this now}__ - We've structured the sessions with regular breaks, but __if you need to leave just leave__! - Materials will live online ([slides](https://benmatthewsed.github.io/ui-data-visualization-course/), [other materials](https://github.com/benmatthewsed/ui-data-visualization-course)) so you can access them any time - __"You're not working from home, you're at home during a crisis trying to work"__ - We want this time to be as useful for you guys as possible, so __please let us know anything we can improve on__ between now and next week's session --- # Small-group programming - For the practical exercises we've divided you guys up into __groups of three__ - When we come to the practical sessions we'll split you out into Collaborate's __break-out rooms__ - Work on the exercises independently - If you have a question __please ask your group chat first__ - If your group can't resolve the issue then __please raise your hand__ and a moderator will join your group __{demo this now}__ - We'll __circulate through groups__ to see if everyone is getting on okay even if you don't have your hand up --- # What happens if I lose connectivity? 
- In our testing it seems like there can be __connectivity issues__ switching between breakout groups and the main session
- __We'll check__ at the end of the breakout sessions to see if everyone has made it back in okay
- Sometimes it helps to __close your browser__ and rejoin the session
- If it all goes wrong and you can't rejoin __it's okay!__
- You can __access the materials any time__ (see links on previous page)
- I think the 'lectures' and live-coding are being recorded, but there's already an __excellent__ [series of videos on YouTube](https://youtu.be/t6IIJEoqPyk) covering how to use `ggplot2`

---

# `R` set-up

- We __very much recommend__ that you use a local installation of `R` and `RStudio` on your own computer
- If you haven't got a working `R` and `RStudio` set-up you can send Eilidh a message now in the group chat and she can help get you set up
- If this still doesn't work you can access a __cloud version__ of `R` with the course dataset and materials [here](https://rstudio.cloud/project/27200)
- You can access the exercises [here](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R) and the answers [here](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R)

---

# Who you are

- Your __research interests__
- __Why__ this course?
- What do you want to __achieve__?

---
class: center, middle, inverse

# Before we begin...

---

# Why plot in the first place?

- It's a key part of __any data analysis__
- Relying on __summary statistics can be misleading__:

<img src="https://d2f99xq7vri1nk.cloudfront.net/DinoSequentialSmaller.gif" style="display: block; margin: auto;" />

The __datasaurus__ -- [Matejka and Fitzmaurice](https://www.autodeskresearch.com/publications/samestats)

---
name: what-is-ggplot
class: center, middle, inverse

# What is `ggplot2`?

---

# What is `ggplot2`?
- It's an `R` package that draws graphs, developed by [Hadley Wickham](http://hadley.nz/)
- From the [tidyverse website](https://ggplot2.tidyverse.org/): "`ggplot2` is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the __data__, tell `ggplot2` how to __map variables to aesthetics__, what [__visual marks to use__ to represent data points], and it takes care of the details."
- ... and what's the Grammar of Graphics? Our `ggplot2` [cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf) says this is "the idea that you can __build every graph from the same components__: a data set, a coordinate system, and `geom`s—visual marks that represent data points."

---
class: center, middle, inverse

# Great, so why plot with `ggplot2`?

---
class: center, middle, inverse

# 1. Aids perception 💡

---

## Perception

- Learning `ggplot2` will help you __understand how data visualization works__, how to figure out what you want your plots to look like and describe this to the computer.
- This is because it's __declarative__. You describe to the computer exactly what you want the plot to look like and it draws it for you (rather than, say, picking a type of plot from a menu). You can read a whole [blog post about](http://varianceexplained.org/r/why-I-use-ggplot2/) why this is a good thing!
- So by using `ggplot2` you learn how to make __better__ graphs, not just how to make graphs.

---
class: center, middle, inverse

# 2. It's powerful 💪

---

## It's Powerful

- Because it's declarative it's also __flexible__
- You can make pretty much __any type__ of (static) graph you want! (No fancy interactive web graphics though, sorry)
- This is very powerful! You can make __publication__ (or Twitter) quality graphs pretty easily

---
class: center, middle, inverse

# 3.
It's popular 😎

---

## It's popular

- There is a large community of users who use `ggplot2` to communicate with very different audiences
- `ggplot2` is used by data journalists at major outlets such as the [BBC](https://bbc.github.io/rcookbook/) and the [Financial Times](https://johnburnmurdoch.github.io/slides/r-ggplot/#/)
- As well as technical users who have [extended](https://exts.ggplot2.tidyverse.org/) `ggplot2`'s core functions for specific use cases
- You can use the same tool to visualize data effectively for academic audiences as well as the general public (and for ourselves), and because users often __post their code online__ you can learn from these many users (for example, the online community around the [#TidyTuesday](https://github.com/rfordatascience/tidytuesday) weekly challenges)
- Take advantage of [Ctrl + C & Ctrl + V](https://speakerdeck.com/hadley/should-all-statistics-students-be-programmers)

---
class: center, middle, inverse

# One more thing before we begin...

---

## `ggplot2` is powerful and fallible

- `ggplot2` is a world unto itself (technically, it's a [domain specific language](http://adv-r.had.co.nz/dsl.html))
- It's also a world with [quirks](https://www.youtube.com/watch?v=vYwXMnC03I4)
- ... so when learning how to use `ggplot2`, mistakes are gonna happen - even for existing `R` users - and that's __fine!__ 👍
- __Every error is an opportunity to learn something__ - we're all building up better mental models of how `ggplot2` works, and finding things that don't work as expected (or at all!)
helps us do this too 🎉
- We'll talk about __common 'gotchas'__ later on

---
class: center, middle, inverse

# Let's dive in:

# `Live coding`

---
class: center, middle, inverse

# `Your turn!`

# Lines __1-237__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R)

# Lines __1-179__ in the [answers](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R)

---
class: center, middle, inverse

# Five-minute break

---
name: concepts-grammar-graphics
class: center, middle, inverse

# Concepts in the grammar of graphics

---
class: center, middle, inverse

# __Dataset__ +
# __Mapping__ variables in the dataset to `aes`thetics +
# __Layers__ of `geom`s (symbols to represent data) =
# __Plot__

---
class: center, middle, inverse

# __Dataset__ +
# Mapping variables in the dataset to `aes`thetics +
# Layers of `geom`s (symbols to represent data) =
# Plot

---

# Data

- `ggplot2` expects your dataset to be [tidy](https://r4ds.had.co.nz/tidy-data.html):
  1. Each __variable__ must have its own __column__.
  2. Each __observation__ must have its own __row__.
  3. Each __value__ must have its own __cell__.
- From __[R for Data Science](https://r4ds.had.co.nz/tidy-data.html#fig:tidy-structure)__
- (Don't worry about this during the course - the datasets we're using have been tidied already)

---
class: center, middle, inverse

# Dataset +
# __Mapping__ variables in the dataset to `aes`thetics +
# Layers of `geom`s (symbols to represent data) =
# Plot

---

# Mapping variables to `aes()`thetics

- __Which variables__ in your dataset you want to represent in the plot and __how__
- Most of the time you'll want at least `x` and `y` (these are __position__ `aes`thetics)
- Includes __other visual encodings__ like: colour/fill, shape, line type, alpha (transparency), size...
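
As a rough sketch of what mapping several of these encodings at once looks like in code (assuming `ggplot2` is installed; `mtcars` ships with `R`, and the choice of `cyl` for colour and `hp` for size is ours, purely for illustration):

```r
library(ggplot2)

# Position aesthetics: disp on x, drat on y
# Other encodings: number of cylinders to colour, horsepower to point size
p <- ggplot(
  data = mtcars,
  mapping = aes(
    x = disp,
    y = drat,
    colour = factor(cyl), # treat cyl as categorical
    size = hp
  )
) +
  geom_point(alpha = 0.7) # alpha is a fixed value here, not a mapping

p
```

The points are drawn with `geom_point()`, one of the `geom`s covered later in this session; the part to focus on for now is the call to `aes()`.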
---

# An example call

.pull-left[

```r
# mtcars is a dataset
# that comes with R
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
)
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-1-1.png" width="504" />

]

- The call to `aes()` __lets `ggplot` ['see' into your dataset](https://ggplot2.tidyverse.org/reference/aes.html#quasiquotation)__
- If you ask `ggplot` to find a variable in your dataset but don't use `aes()`, `ggplot` will __look in the wrong place__
- __Key point__ - if you want `ggplot` to know about something __inside your dataset you have to reference it within__ `aes`!

---

# Mapping helps thinking

- This is part of what makes `ggplot` so helpful for __thinking__ about data visualization - you make the plot by choosing how to map variables to aesthetics and then telling `R`
- This helps me decide __what to emphasise__ in my plots based on what I'm interested in; the same data mapped to different `aes()`thetics can have different emphasis (we'll pick up on this later in our [Principles of data visualization](https://benmatthewsed.github.io/ui-data-visualization-course/#principles-data-viz) session)
- (Obviously you can do this with other plotting programs too, but I think because `ggplot` is __declarative__ the process is more explicit)

---
class: center, middle, inverse

# Dataset +
# Mapping variables in the dataset to `aes`thetics +
# __Layers__ of `geom`s (symbols to represent data) =
# Plot

---

# `geom`etric symbols to represent data

- __What__ symbols you want `ggplot` to draw to represent the data; lines, points, errorbars, ribbons, contours...
- You add these to the plot in __layers__
- You can have exactly the __same mapping__ of variables to `aes`thetics, but __different `geom`s__

---

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-2-1.png" width="504" />

]

---

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_line()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-3-1.png" width="504" />

]

---

# Layers and `geom`s

- `ggplot2` builds plots out of __layers__
- This makes it suuuuuuper flexible - you can just __stack layers on top of each other__!
- `geom`s only add __one layer at a time__
- __Add__ layers together with the `+` __operator__

---

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  geom_line()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-4-1.png" width="504" />

]

---

# `geom`s and `aes`thetics

- __Every__ `geom` __needs__ a dataset and a mapping of variables to `aes`thetics
- Layers (silently!) __inherit__ data and `aes` mappings that you set __for the whole plot__...
- ... but you can set data and aesthetics for the whole plot and for each layer __separately__, so you can override this default behaviour
- This is often very helpful but also a big source of pain - __THERE BE DRAGONS__ 🐉
- Again, if you want a layer to reference something in your dataset you need to tell it with `aes`

---

.pull-left[

```r
# cyl is another variable in mtcars
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_line() +
  geom_point(
    aes(colour = cyl)
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-5-1.png" width="504" />

]

---

# Set `geom` parameters with `aes` or with a fixed value?
- You use `geom`s to visually represent values of the __variables in your dataset__
- But sometimes you also want to set how a `geom` looks to a __fixed value__ (i.e. one that __doesn't__ come from a variable in your dataset) - like making all the points in a scatterplot red
- If it's __within the call to `aes`__, `ggplot2` will look for it __in your dataset__
- If it's __outside the call to `aes`__, `ggplot2` will apply the __value you supply directly__

---

# Some notes on `aes()`

- For technical reasons we won't go into now __quotation marks matter__ here (if you want you can read all about why [here](https://adv-r.hadley.nz/metaprogramming.html))
- This was a __common source of errors__ for me when learning `ggplot2` (and still is!) and my own mental model of when and why to use quote marks is pretty crude even now. So it's natural that this feels weird or arbitrary right now 👍
- Most of the time, if it's __inside__ the call to `aes` __don't__ use quotes

---

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_line(
    aes(colour = red)
  ) +
  geom_point()
```

```
## Error in FUN(X[[i]], ...): object 'red' not found
```

]

.pull-right[

```
## Error in FUN(X[[i]], ...): object 'red' not found
```

<img src="00-combined_slides_files/figure-html/unnamed-chunk-6-1.png" width="504" />

]

---

.pull-left[

```r
# note the quotation marks around "red"
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_line(
    colour = "red"
  ) +
  geom_point()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-7-1.png" width="504" />

]

---
class: center, middle, inverse

# In review

---
class: center, middle, inverse

# __Dataset__ +
# __Mapping__ variables in the dataset to `aes`thetics +
# __Layers__ of `geom`s (symbols to represent data) =
# __Plot__

---
class: center, middle, inverse

# If you want `ggplot` to know about something in your dataset, use `aes()`

---
class: center, middle, inverse

# Any questions?
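
---

# `aes()` or a fixed value: a quick sketch

To put the inside/outside `aes()` rule in one place, here is a minimal sketch (assuming `ggplot2` is installed; the object names `p_base`, `p_mapped`, `p_fixed` and `p_gotcha` are ours and are not part of the course materials). It shows a mapped colour, a fixed colour, and what happens when a quoted colour name ends up inside `aes()`:

```r
library(ggplot2)

p_base <- ggplot(data = mtcars, mapping = aes(x = disp, y = drat))

# Inside aes(): colour is looked up in the dataset (cyl is a column of mtcars)
p_mapped <- p_base + geom_point(aes(colour = factor(cyl)))

# Outside aes(): colour is a fixed value applied to every point
p_fixed <- p_base + geom_point(colour = "red")

# A quoted value inside aes() is treated as data, not as a colour name:
# ggplot2 builds a one-level variable with the value "red", picks a colour
# from its default palette and adds a legend for it
p_gotcha <- p_base + geom_point(aes(colour = "red"))
```

Printing `p_gotcha` does not raise an error, which is why this one is easy to miss: the points are simply not red.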
---
name: geom-types
class: center, middle, inverse

# The three types of `geom` layer

---

# Three types of `geom`

- I like to think of `geom`s as being one of three types:
  - `geom`s that plot __each datum__ for you
  - `geom`s that plot __summaries__ of your dataset for you
  - `geom`s that plot __things that aren't in your dataset__ (like reference lines and such); often metadata
- (NB The official `ggplot` book uses a [slightly different classification system](https://ggplot2-book.org/toolbox.html) for `geom`s, but it's a pretty similar idea)
- Different `geom`s require __different mappings of variables__

---

# Plot each datum please

- Make a mark on the plot for __each row__ in the dataset (we saw this kind already)
  - `geom_point`
  - `geom_line`
  - `geom_area`
  - `geom_bar(stat = "identity")`
  - `geom_sf()` (we'll come back to this in the next session)
  - `geom_text()`
  - `geom_errorbar` (sort of)
- These `geom`s usually require __mapping to `x` and `y` axes__

---

# Plot a summary of my data please

- Plot something that __summarises__ all/groups of the data in my dataset
- Univariate/distributions:
  - `geom_histogram`
  - `geom_density`
  - `geom_violin`
  - `geom_bar` (sort of)
- Bivariate:
  - `geom_smooth`
  - `geom_contour`
  - `geom_boxplot` (with one continuous and one discrete variable)
- You need __`x` and `y`__ mappings for __bivariate__ summaries, but __univariate__ summaries __only need `x`__

---

# A univariate example

.pull-left[

```r
# note we only map x here -
# because we're only drawing one variable
ggplot(
  data = mtcars,
  mapping = aes(x = disp)
) +
  geom_density()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-8-1.png" width="504" />

]

---

# Bivariate example

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_smooth()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-9-1.png" width="504" />

]

---

# And we can add layers to this

.pull-left[ ```r ggplot( data = mtcars, mapping = 
aes(x = disp, y = drat) ) + geom_point() + geom_smooth() ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-10-1.png" width="504" /> ]

---

# Why stop there?

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  geom_smooth() +
  geom_density_2d()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-11-1.png" width="504" />

]

---
name: reference-lines

# Plot something else on my graph

- Reference lines
  - `geom_vline(xintercept = )`
  - `geom_hline(yintercept = )`
  - `geom_abline(slope = , intercept = )`
- Annotations
  - `geom_text(aes(label = ))` (... sort of)
- Reference lines need intercepts __specified outside of `aes()`__, but text labels need the specific `label` mapped to a variable in a dataset using `aes()`

---

# Adding a reference line

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  geom_smooth() +
  geom_density_2d() +
  geom_vline(xintercept = 200)
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-12-1.png" width="504" />

]

---

# Adding text

.pull-left[

- It's kind of a lie to have `geom_text` here, because you use it to add values from your dataset using `aes()` (in this case, the values from the `y` axis):

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  geom_smooth() +
  geom_density_2d() +
  geom_vline(xintercept = 200) +
  geom_text(
    aes(label = drat)
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-13-1.png" width="504" />

]

---

# Adding text

.pull-left[

- But mostly I use it with a separate __dataframe of labels__

```r
label_df <- data.frame(
  x = 300,
  y = 4,
  text = "A great label"
)

ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  geom_smooth() +
  geom_text(
    data = label_df,
    # x, y and text here are columns of label_df
    mapping = aes(x = x, y = y, label = text)
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-14-1.png"
width="504" /> ] --- class: center, middle, inverse # Recap: # There are three types of `geom` # Different types need different `aes`thetic mappings --- class: center, middle, inverse # `Your turn!` # Lines __238-705__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R) # Lines __180-603__ in the [answers](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R) --- name: facets class: center, middle, inverse # Facets --- # ... facets? - From the [`ggplot2` book](https://ggplot2-book.org/getting-started.html#qplot-facetting): "Facetting creates __tables of graphics__ by splitting the data into subsets and displaying the __same graph for each subset__" - Just add a `facet_wrap()` layer to a `ggplot` call - Tell `ggplot` __which variable(s)__ to facet using a call to `vars()` (this acts like `aes()` but for `facet`s) - `ggplot` then makes a long string of plots - one for each level of the facetting variable - which it __wraps__ into a table - Why can't we just use `aes()` here too? Apparently `vars()` can [handle more complicated expressions than `aes()`](https://ggplot2.tidyverse.org/reference/vars.html), which lets you tweak your facets in some helpful ways. I (Ben) only learned this writing the course! 
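
As a small sketch of what "more complicated expressions" can mean in practice (assuming `ggplot2` is installed; the facet name `drat_group` and the choice of three bins are ours, for illustration only), `vars()` accepts a named expression as well as a bare variable name. Here `cut_number()`, a helper that ships with `ggplot2`, splits `drat` into three groups with roughly equal numbers of cars:

```r
library(ggplot2)

p <- ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = mpg)
) +
  geom_point() +
  facet_wrap(
    # a named expression rather than a bare variable
    facets = vars(drat_group = cut_number(drat, 3)),
    # label_both shows the facet name alongside each value
    labeller = label_both
  )

p
```

Naming the expression is optional, but it tends to give more readable facet labels than the raw code.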
--- # An example of facetting .pull-left[ ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_line() + geom_point( aes(colour = cyl) ) + facet_wrap( facets = vars(cyl) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-15-1.png" width="504" /> ] --- # An example of facetting .pull-left[ You can facet by multiple variables ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_line() + geom_point( aes(colour = cyl) ) + facet_wrap( facets = vars(am, cyl) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-16-1.png" width="504" /> ] --- # Adjusting the scales .pull-left[ You can change how the __axes__ are displayed by setting `scales = `. This focuses on the trends __within__ the plot and makes it harder to see trends __between__ plots. ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_line() + geom_point( aes(colour = cyl) ) + facet_wrap( facets = vars(am, cyl), scales = "free_y" ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-17-1.png" width="504" /> ] --- # Going further with `facets` - The `ggplot` book has a [whole section](https://ggplot2-book.org/facet.html#facet-wrap) on facetting - You can allow facets to have __different scales__ using the `scales = ` argument. 
By default all the facets have the same scale, but you can allow each axis to vary by setting `scales = "free_y"`, `scales = "free_x"` or `scales = "free"`

---
class: center, middle, inverse

# `Your turn!`

# Lines __706-796__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R)

# Lines __604-687__ in the [answers](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R)

---
class: center, middle, inverse

# Five-minute break

---
name: principles-data-viz
class: center, middle, inverse

# Principles of data visualization

---

# Principles of data visualization

- How we read graphs
- Principles of chart design
- An example

---

# How we read graphs

.pull-left[

- __Some visual comparisons are easier__ to make than others

__Easiest__

1. Position along a common scale
2. Position along identical, non-aligned scales
3. Length
4. Angle or slope
5. Area
6. Volume
7. Colour hue / saturation

__Hardest__

(Source: Cleveland and McGill, 1984)

]

.pull-right[

<img src="https://www.researchgate.net/profile/Chia_Shen/publication/221513994/figure/fig2/AS:305590816526337@1449869936222/Cleveland-and-McGills-elementary-perceptual-tasks-All-visual-representations-of.png" width="80%" style="display: block; margin: auto;" />

]

---

# How we read graphs (2)

- __Proximity__: "objects or shapes that are close to one another appear to form groups... __place items to compare close together__, and less important comparisons further apart" (this is from a very interesting [review of research into how we understand graphs by Vanderplas, Cook and Hofmann](https://www.annualreviews.org/doi/full/10.1146/annurev-statistics-031219-041252))
- ...
so think about how your __mapping of variables__ to aesthetics and facets __highlights some comparisons__ and not others <img src="https://www.annualreviews.org/na101/home/literatum/publisher/ar/journals/content/statistics/2020/statistics.2020.7.issue-1/annurev-statistics-031219-041252/20200226/images/medium/st070061.f11.gif" width="50%" style="display: block; margin: auto;" /> --- # Principles of chart design - __[Match perceptual and data topology](http://stat405.had.co.nz/lectures/20-effective-vis.pdf)__: Some `aes`thetic mappings have implicit __orders__ (like position and size), others are __unordered__ (like shape) - If you’re using a graphical element to show relative size of quantities __make larger size = larger quantity__ --- # Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-0.gif" width="60%" style="display: block; margin: auto;" /> --- # Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-1.gif" width="60%" style="display: block; margin: auto;" /> --- # Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-2.gif" width="60%" style="display: block; margin: auto;" /> --- # Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-3.gif" width="60%" style="display: block; margin: auto;" /> --- # 
Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-4.gif" width="60%" style="display: block; margin: auto;" /> --- # Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-5.gif" width="60%" style="display: block; margin: auto;" /> --- # Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-6.gif" width="60%" style="display: block; margin: auto;" /> --- # Principles of chart design - From Hadley Wickham's [excellent slides on data-viz](http://stat405.had.co.nz/lectures/20-effective-vis.pdf): <img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/page-7.gif" width="60%" style="display: block; margin: auto;" /> --- # Principles of chart design (2) - __Dual coding__: You can encode the same variables to multiple aesthetic parameters (easily in `ggplot2`) to aid comprehension - This can also aid __accessibility__ by reducing reliance on e.g. colour to distinguish variables .pull-left[ ```r # code am to colour ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = am) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-18-1.png" width="504" /> ] --- # Principles of chart design (2) - __Dual coding__: You can encode the same variables to multiple aesthetic parameters (easily in `ggplot2`) to aid comprehension - This can also aid __accessibility__ by reducing reliance on e.g. 
colour to distinguish variables

.pull-left[

```r
# code am to shape as well
# note - I had to cheat here and
# ask ggplot2 to treat
# am as a factor variable
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    mapping = aes(colour = am, shape = factor(am))
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-19-1.png" width="504" />

]

---

# Principles of chart design (3)

- You can also use colour scales designed to have __good perceptual properties__ for [different kinds of colour-blindness](https://serialmentor.com/dataviz/color-pitfalls.html#not-designing-for-color-vision-deficiency) (`ggplot2` defaults aren't so good for this) (we'll cover some of these later in the course)

---

# Principles of chart design (4)

- Plot the __raw data__ and the __summary__
- Remember the [datasaurus!](https://www.autodeskresearch.com/publications/samestats)

<img src="https://d2f99xq7vri1nk.cloudfront.net/DinoSequentialSmaller.gif" style="display: block; margin: auto;" />

---

# Telling stories with charts

- Think about how each plot fits into the [story you're telling](https://serialmentor.com/dataviz/telling-a-story.html)
- It's totally fine to graph data __two or more times__ if it helps to tell the story (Cleveland 1985)
- I think `ggplot` helps with this because it allows you to __iterate__ towards your final chart by building up layers

---

# An example

.left-column[

- Use the __grammar of graphics__ to describe this plot
- In your groups, take __three minutes__ to discuss:
  - Which __attributes__ are emphasised most?
  - How could this figure be plotted differently or __improved__?
]

.right-column[

<img src="https://raw.githubusercontent.com/benmatthewsed/ui-data-visualization-course/master/figures/thesis_figure.png" width="78%" style="display: block; margin: auto;" />

Age-crime curves for different crime types in Scottish Offenders Index 1989-2011, men (Source: [Ben's PhD thesis](https://era.ed.ac.uk/handle/1842/25810))

]

---

# How I'd describe this figure with the grammar of graphics

Aspect of Grammar | Variable
------------------|-------
`Data`set | Scottish Offenders Index
`Aes`thetic: |
X | Age
Y | Conviction rate
Line type | Year
Colour | Crime type
`Geom` | Line
`Facet` | Crime type

---
name: gggotchas
class: center, middle, inverse

# `gg`-gotchas

---

# 1. "Sawtooth" lines

.pull-left[

- [From the `ggplot2` book](https://ggplot2-book.org/collective-geoms.html):

<img src="https://ggplot2-book.org/collective-geoms_files/figure-html/oxboys-line-bad-1.png" width="85%" />

]

.pull-right[

- __Diagnosis__: ["This vis problem often means there is some grouping characteristic missing from the graphic"](https://www.njtierney.com/post/2020/06/14/jq-sawtooth/)
- __Prescription__: Check the data for any more grouping variables you might've missed (e.g. if your dataset is values for multiple years and you haven't included year in the plot)

]

---

## `Error: Each group consists of only one observation. Do you need to adjust the group aesthetic?`

- __Diagnosis__: This happens when you try to plot a __line__ using a factor as one of the position scales. `ggplot` then 'guesses' that you want the `x` variable to be the grouping variable
- Data types, and particularly [factors](https://r4ds.had.co.nz/factors.html), matter to `ggplot`!
- __Prescription__: Adjust the group aesthetic!
Explicitly tell `ggplot` what variable you want to group by in the call to `aes()` using the `group = ` parameter --- ## `Error: stat_count() can only have an x or y aesthetic` - __Diagnosis__: This happens with summary `geom`s (usually `geom_bar`) which want to calculate a summary for you but you've already summarised the data outside of ggplot2 - __Prescription__: You can tell `geom_bar` that you've summarised the data already by [specifying `geom_bar(stat = "identity")`](https://stackoverflow.com/a/39679104/10791377) (here `identity` just means "use the numbers that I gave you please and don't calculate anything with them") --- ## `Error: Cannot use +.gg() with a single argument. Did you accidentally put + on a new line?` - __Diagnosis__: For [technical reasons I don't understand](https://community.rstudio.com/t/why-put-and-at-the-end-of-lines/1831) this isn't valid R code. So you can't do it! - __Prescription__: Move the `+` to the end of the previous line from the start of the new line --- ## `Error in FUN(X[[i]], ...) 
: object 'blah' not found` - __Diagnosis__: `ggplot` is looking for something in your dataset but there are __no variables with that name__ - See our discussion on concepts in the grammar of graphics about what goes inside the call to `aes` and what goes outside - __Prescription__: Make sure there aren't any __typos__, and that what you've specified __exists in your dataset__ - If you're trying to set a colour you might have to move this outside the call to `aes()` --- class: center, middle, inverse # `Your turn!` # Lines __797-980__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R) # Lines __689-852__ in the [answers](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R) --- class: center, middle, inverse # Recap --- # What we've learned: - How to draw graphs with `ggplot2` - Elements of the __grammar of graphics__ - Principles of __data visualization__ --- class: center, middle, inverse # Dataset + # Mapping variables in the dataset to `aes`thetics + # Layers of `geom`s (symbols to represent data) = # Plot --- class: center, middle, inverse # Three types of `geom`: plot each datum, plot a summary, plot metadata --- class: center, middle, inverse # Errors happen and that's fine! --- # In the next session - __Maps__ - Beautifying your ggplots with __themes__ and __titles__ - Answering any __questions__ you have between now and then - Please feel free to bring __your own data__! 
- Optional __homework__: try out this week's [#TidyTuesday](https://github.com/rfordatascience/tidytuesday) challenge

---
name: session-two
class: center, middle, inverse

# Introduction to data visualisation with `ggplot2`: Session Two

Ben Matthews<sup>1</sup> and Eilidh Jack<sup>2</sup>

2020-07-06 (Last updated 2020-07-06)

<sup>1</sup>University of Edinburgh

<sup>2</sup>University of Glasgow

<!-- -->

---
name: ggplot-maps
class: center, middle, inverse

# Maps in `ggplot2`

---

# Maps

1. Maps are great (we take this as axiomatic!)

2. In `R`, maps are "different, and weird" -- [Danielle Navarro](https://djnavarro.net/post/maps/)

3. Maps can be misleading: use __with caution__!

---

# Maps are 💯

- People love maps

- "Maps are amongst the most compelling graphics, because the space they map is the space we live in, and maps may __show things we cannot see otherwise__" -- [Bivand, Pebesma and Gómez-Rubio, 2013, p.59](https://www.springer.com/gp/book/9781461476177)

- "Maps are awesome." -- [Danielle Navarro](https://djnavarro.net/post/maps/)

- But mapping raises lots of __tricky questions__...

---

# What are we mapping?

- If you're on this course we assume that you are:
  - Analysing data about __people__
  - These people are __on Earth__, and may well be in the UK

- So you'll [probably have __vector__ data](https://geocompr.robinlovelace.net/spatial-class.html#vector-data)
  - i.e. __points in space__ (and not [raster data](https://geocompr.robinlovelace.net/spatial-class.html#raster-data), which breaks up space into evenly sized grids, like in ecology or something)

- And mostly you'll be plotting __areal__ data - which relates to some shape bounded in space (like a Local Authority, as in our dataset, or a neighbourhood), and data about these areal entities will likely be __aggregates__ describing the population within that area
  - e.g. votes, or recorded crimes, or percent unemployed... etc.

---

# Okay, but points in what "space"?
- Turns out that this is a really [__complicated__ question!](https://xkcd.com/977/) - We turn representations of the __3d surface__ of the Earth into a __2d map__ using a [coordinate reference system (CRS)](https://geocompr.robinlovelace.net/spatial-class.html#crs-intro) - This won't come up here 🤞 because we've set the CRS for you in our example dataset, but `ggplot2` needs to know the CRS of your data in order to draw the plot - If your map looks funny it may well be that you have the __wrong CRS__ --- class: center, middle, inverse # Before we begin, some potential pitfalls... --- # Maps can be misleading: Area or people? - __Choropleth__ maps - coloured in maps - are a good idea when you have high spatial resolution, but can be [misleading when you have low spatial resolution](https://serialmentor.com/dataviz/geospatial-data.html#choropleth-mapping) - This is because these maps can be [dominated by geographic units with __large areas__](https://serialmentor.com/dataviz/geospatial-data.html#choropleth-mapping) if the areas of your geographies are very different (this is very much the case for Scottish Local Authorities!) - ... and such areas often have low populations - This is not so bad when you're [mapping __densities__](https://serialmentor.com/dataviz/geospatial-data.html#choropleth-mapping) - where the denominator is the area of the spatial unit --- # Maps can be misleading: Who is at risk? 
- Mapping counts of things can be confounded by areas with the [highest population](https://socviz.co/maps.html#maps)

- And also by the [Modifiable Areal Unit Problem](https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem) - "totals, rates, proportions, densities are influenced by both the __shape__ and scale of the __aggregation unit__"

---

# Maps can be misleading: parameter estimates

- This is __a bit involved__, but something to bear in mind if you're mapping spatial data with variable sample sizes in different regions:

- "[P]lotting observed rates can have __serious drawbacks when sample sizes vary by area__, since very high (and low) observed rates are found disproportionately in poorly-sampled areas. Unfortunately, adjusting the observed rates to account for the effects of small-sample noise can introduce an opposite effect, in which the highest adjusted rates tend to be found disproportionately in well-sampled areas. __In either case, the maps can be difficult to interpret__ because the display of spatial variation in the underlying parameters of interest is confounded with spatial variation in sample sizes" -- [Gelman and Price](http://www.stat.columbia.edu/~gelman/research/published/allmaps.pdf)

- 😱

---

# Maps can be misleading: What shapes to draw?

- We're going to focus on the simple case: __polygons__ that represent Local Authority boundaries

- But you might want to change the shapes to adjust for e.g. population size as in a __cartogram__

- Or maybe you want to arrange facets (as in __facet_wrap__) in the shape of your spatial data as in a [__geofacet__](https://hafen.github.io/geofacet/) (see also Kieran Healy's thoughts on ['Is your data really spatial?'](https://socviz.co/maps.html#is-your-data-really-spatial))

---
class: center, middle, inverse

# Maps are awesome...

# but they can be misleading!

---
class: center, middle, inverse

# Intro to `sf` and what's in my __spatial__ dataframe?
---

# Maps in `R`: simple features

- Working with __spatial data__ in `R` is a whole sub-field in itself (see [here](https://geocompr.robinlovelace.net/) for an introduction from a GIS/geoscience perspective, and [here](https://socviz.co/maps.html#maps) for a social sciences introduction)

- We use the [`sf` package](https://r-spatial.github.io/sf/articles/sf1.html) which implements __simple features__

- __Simple features__ can be thought of as "things" or objects that have a __spatial__ location or extent

- We will introduce some of its functionality by producing maps

- "It’s a truly wonderful package, and it’s nicely documented too… it’s just that everything about geospatial data turns out to be more complicated than it looks so even though `{sf}` is really, really good, it’s still a huge pain to work with." -- [Danielle Navarro](https://djnavarro.net/post/maps/)

---

# What's in my spatial dataframe?

- To illustrate some simple mapping we will use the [ozmaps package](https://github.com/mdsumner/ozmaps/) which provides maps for Australian state boundaries etc.

- The nice thing about working with `sf` data is that __each row__ of data corresponds to a __distinct geographical unit__

- The most important column is the __geometry__ variable

- It contains spatial information on the __boundaries__ for each Australian state

- Or for our data, the boundaries for each Local Authority in Scotland

```r
options(width = 70)

# Import data set for Australian states
oz_states <- ozmaps::ozmap_states

# glimpse() comes from dplyr (loaded as part of the tidyverse)
dplyr::glimpse(oz_states)
```

```
## Observations: 9
## Variables: 2
## $ NAME <chr> "New South Wales", "Victoria", "Queensland", "So...
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((150.7016 -3..., MUL...
```

---
name: geom-types
class: center, middle, inverse

# Maps and coordinate systems

---

# Drawing a simple map

- Given data in this format we can add layers to a `ggplot2` object to draw a map

- To do this we use `geom_sf()`

.pull-left[

```r
ggplot(
  data = oz_states
) +
  geom_sf()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-22-1.png" width="504" />

]

---

# Drawing a simple map

- Notice we didn't have to map our variables to any aesthetics (like we did last week)

- This is unusual and specific to mapping with `ggplot2`

- When we don't specify anything in `geom_sf()` it tries to map to a column called __`geometry`__

- The column we have which specifies our polygon information is called __`geometry`__ and so our map was created

- However you can specify the mapping manually in the usual way

.pull-left[

```r
ggplot(
  data = oz_states,
  mapping = aes(geometry = geometry)
) +
  geom_sf()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-23-1.png" width="504" />

]

---

# Choropleth maps

- We can colour our map in using the fill aesthetic

- Not very useful here!

.pull-left[

```r
ggplot(
  data = oz_states
) +
  geom_sf(mapping = aes(fill = NAME))
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-24-1.png" width="504" />

]

---

# Layered maps

- Just like we discussed last week we can add multiple `geom_sf()` layers to a plot

- You will have the opportunity to add different types of layers to your maps when you work through the exercises (some are more useful than others!)
.pull-left[

```r
oz_votes <- ozmaps::abs_ced

ggplot() +
  geom_sf(
    data = oz_states,
    mapping = aes(fill = NAME)
  ) +
  geom_sf(data = oz_votes, fill = NA) +
  coord_sf()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-25-1.png" width="504" />

]

---

# Coordinate systems

- Setting coordinate systems is one of the most complicated aspects of `ggplot2`

- The Cartesian coordinate system is the most common type of coordinate system (and the default in `ggplot2`)

- Setting limits on the coordinate system will zoom in on the plot, rather than changing the underlying data as setting limits on a scale does

- There is a nice blog post about coordinate systems [here](https://www.r-bloggers.com/coordinate-systems-in-ggplot2-easily-overlooked-and-rather-underrated/)

---
name: geom-types
class: center, middle, inverse

# Setting coordinate limits

---

# Setting coordinate limits with `{sf}`

- For `sf` data we set coordinate limits with `coord_sf()` (a `ggplot2` function, but it needs the `{sf}` package installed)

- Specifying `xlim` and `ylim` in `coord_sf()` will __zoom__ in on the specified region of our plot

.pull-left[

```r
ggplot(
  data = oz_votes
) +
  geom_sf() +
  coord_sf(xlim = c(150.97, 151.3), ylim = c(-33.98, -33.79))
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-26-1.png" width="504" />

]

---

# Setting coordinate limits in general

- We can set coordinate limits for normal dataframes (not `sf` objects) using `coord_cartesian()`

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-27-1.png" width="504" />

]

---

# Setting coordinate limits in general

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  coord_cartesian(xlim = c(300, 500), ylim = c(3, 4))
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-28-1.png" width="504" />

]

---

# Some other useful options to customise coordinate systems

- `coord_fixed()` to
specify a 1:1 mapping of x and y - `coord_flip()` swaps the x and y axis --- class: center, middle, inverse # `Your turn!` # Lines __985-1176__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R) # Lines __858-1049__ in the [answers](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R) --- name: ggplot-colour-scales class: center, middle, inverse # Colour scales --- # `ggplot`'s default colour scales - __Data types__ really matter for colour scales - Specifically, whether data are __continuous__ or __categorical__ --- # `ggplot`'s default continuous scale .pull-left[ ```r # continuous variable colour ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = qsec) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-29-1.png" width="504" /> ] --- # `ggplot`'s default discrete scale .pull-left[ ```r # discrete variable colour ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = factor(cyl)) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-30-1.png" width="504" /> ] --- # Other colour options: viridis - From their [website](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html#introduction): - __Colo[u]rful__, spanning as wide a palette as possible so as to make differences easy to see - __Perceptually uniform__, meaning that values close to each other have similar-appearing colours and values far away from each other have more different-appearing colours, consistently across the range of values, - __Robust to colourblindness__, so that the above properties hold true for people with common forms of colourblindness, as well as in grey scale printing - Has both __discrete__ and __continuous__ options: `scale_colour_viridis_d` and `scale_colour_viridis_c` respectively - See also the 
[`colorblindr` package](https://github.com/clauswilke/colorblindr) for more resources on making figures work better for people with different kinds of colourblindness --- # Continuous scale .pull-left[ ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = qsec) ) + scale_colour_viridis_c() ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-31-1.png" width="504" /> ] --- # Discrete scale .pull-left[ ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = factor(cyl)) ) + scale_colour_viridis_d() ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-32-1.png" width="504" /> ] --- # Different viridis palettes .pull-left[ - There are __four__ scales in viridis you can set with `option = `: `"plasma"`, `"magma"`, `"inferno"` and `"viridis"` (the default) ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = factor(cyl)) ) + scale_colour_viridis_d(option = "plasma") ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-33-1.png" width="504" /> ] --- # A common error .pull-left[ - If you use `scale_colour_viridis_d` with a continuous variable mapped to colour you'll get an error! 
```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    mapping = aes(colour = qsec)
  ) +
  scale_colour_viridis_d(option = "plasma")
```

```
## Error: Continuous value supplied to discrete scale
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-34-1.png" width="504" />

]

---

# Other options: greyscale

.pull-left[

- Might be sensible if you want to print in __greyscale__ (although `viridis` palettes also have [good perceptual properties in greyscale](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html#introduction))

- Works well for __discrete__ variables...

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    mapping = aes(colour = factor(cyl))
  ) +
  scale_colour_grey()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-35-1.png" width="504" />

]

---

# Other options: greyscale

.pull-left[

- ... but doesn't for __continuous__ variables 😭

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    mapping = aes(colour = cyl)
  ) +
  scale_colour_grey()
```

```
## Error: Continuous value supplied to discrete scale
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-36-1.png" width="504" />

]

---

# Other options: greyscale

.pull-left[

- But we can __make our own__ with `scale_colour_gradient` if we really want to use greyscale with a continuous variable

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    mapping = aes(colour = cyl)
  ) +
  scale_colour_gradient(
    low = "white",
    high = "black"
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-37-1.png" width="504" />

]

---

# Advanced - custom colours

.pull-left[

- If you want to set colour to a single value, you can use `colour = "#123456"` using the colour
[hex-code](https://www.color-hex.com/) or the [name of one of the colours `R` recognises](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf)

- For example, the main colour in the [Understanding Inequalities](https://www.understanding-inequalities.ac.uk/) logo is `#009dc6`, so I often make plots like

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    colour = "#009dc6"
  ) +
  geom_smooth(
    colour = "#009dc6"
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-38-1.png" width="504" />

]

---

# Advanced - custom colours

- You can [__make your own__ discrete colour scale](https://ggplot2.tidyverse.org/reference/scale_manual.html#examples) with `scale_colour_manual()`

- If you want to be really fancy and come up with __your own colour scale__ see this [blog post](https://www.garrickadenbuie.com/blog/custom-discrete-color-scales-for-ggplot2/) about doing just that

.pull-left[

- To use some other colours [complementary](https://www.sessions.edu/color-calculator-results/?colors=009dc6,c69e00,c6004f) to Understanding Inequalities blue

```r
# hex codes have to be in a call to c()
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    mapping = aes(colour = factor(cyl))
  ) +
  scale_colour_manual(
    values = c("#009dc6", "#c69e00", "#c6004f")
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-39-1.png" width="504" />

]

---
class: center, middle, inverse

# `Your turn!`

# Lines __1177-1302__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R)

# Lines __1052-1200__ in the [answers](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R)

---
name: titles
class: center, middle, inverse

# Titles

---

# Adding titles with `labs()`

- You can adjust __plot labels__, including titles, with `labs()`, using the `title`, `subtitle` and `caption` arguments

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  labs(
    title = "An informative title"
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-40-1.png" width="504" />

]

---

# Adding titles with `labs()`

- You can adjust __plot labels__, including titles, with `labs()`, using the `title`, `subtitle` and `caption` arguments

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  labs(
    title = "An informative title",
    subtitle = "With even more information here"
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-41-1.png" width="504" />

]

---

# Adding titles with `labs()`

- You can adjust __plot labels__, including titles, with `labs()`, using the `title`, `subtitle` and `caption` arguments

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  labs(
    title = "An informative title",
    subtitle = "With even more information here",
    caption = "With some extra information here"
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-42-1.png" width="504" />

]

---

# Changing plot labels with `labs()`

- Can also rename your __axes__ here

- As well as any __other `aes`thetic mappings__ like colour, fill, shape, size...

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point(
    mapping = aes(colour = cyl)
  ) +
  labs(
    x = "Engine displacement",
    y = "Rear axle ratio",
    colour = "Number\nof cylinders" # \n means 'new line'
  )
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-43-1.png" width="504" />

]

---
name: publication-plots
class: center, middle, inverse

# Publication-ready plots

---
class: center, middle, inverse

# Changing the theme

# Tweaking elements of the theme

---

# Changing the theme

- You can change the `ggplot` theme just by adding a `theme_*` layer to the plot!
- See the [full list of themes](https://ggplot2.tidyverse.org/reference/ggtheme.html#details)

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-44-1.png" width="504" />

]

---

# Changing the theme

- You can change the `ggplot` theme just by adding a `theme_*` layer to the plot!

- See the [full list of themes](https://ggplot2.tidyverse.org/reference/ggtheme.html#details)

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  theme_bw()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-45-1.png" width="504" />

]

---

# Changing the theme

- You can change the `ggplot` theme just by adding a `theme_*` layer to the plot!

- See the [full list of themes](https://ggplot2.tidyverse.org/reference/ggtheme.html#details)

.pull-left[

```r
ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point() +
  theme_minimal()
```

]

.pull-right[

<img src="00-combined_slides_files/figure-html/unnamed-chunk-46-1.png" width="504" />

]

---

# Tweaking themes

- As well as specifying a new theme for a plot with `theme_*` you can tweak specific elements of your plot with `theme()`

- Importantly, if you want to change the theme with `theme_*` __AND__ adjust individual settings with `theme()`, the call to `theme_*` __must come first!__

---

# `theme()`

- There are [lots of theme components](https://ggplot2.tidyverse.org/reference/theme.html) that you can adjust

- I forget what the settings are all the time! So don't worry if this is confusing 👍

- There are __some important ones__ we __won't__ talk about today, [like changing __text size__](https://ggplot2.tidyverse.org/reference/theme.html)

- The ones I use the __most__ are...
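
---

# Aside: changing text size

Text size isn't covered in this session, but here is a minimal sketch (not from the original slides) in case you need it. It assumes the `text` and `axis.title` arguments of `theme()` together with `element_text(size = )`. The sizes 16 and 18 and the object names `base_plot` and `big_text_plot` are arbitrary choices for illustration.

```r
library(ggplot2)

# The same scatterplot used throughout these slides
base_plot <- ggplot(
  data = mtcars,
  mapping = aes(x = disp, y = drat)
) +
  geom_point()

# Illustrative sizes only: pick whatever suits your figure
big_text_plot <- base_plot +
  theme_minimal() +                       # complete theme goes first
  theme(
    text = element_text(size = 16),       # base size for text in the plot
    axis.title = element_text(size = 18)  # axis titles a little larger
  )

big_text_plot
```

As with the other tweaks, the call to `theme()` comes after `theme_minimal()`, otherwise the complete theme would overwrite it.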
--- # Moving the legend .pull-left[ - For example, moving the legend from the right of the plot to the bottom with `legend.position = "bottom"` ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = disp) ) + theme( legend.position = "bottom" ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-47-1.png" width="504" /> ] --- # Rotating axis labels .pull-left[ - Rotating axis labels if you have overlapping `x`-labels with `axis.text.x = element_text(angle = 45, hjust = 1)` ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = disp) ) + theme( legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-48-1.png" width="504" /> ] --- # What about changing the theme now? .pull-left[ 🔥 🔥 🔥 ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = disp) ) + theme( legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1) ) + theme_minimal() ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-49-1.png" width="504" /> ] --- # Order matters! .pull-left[ - The call to `theme_minimal` overrides the adjustments made in `theme`! - Need to call `theme_minimal` __first__ ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = disp) ) + theme_minimal() + theme( legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-50-1.png" width="504" /> ] --- # Can I adjust `theme_minimal` directly? 
- __NO__ - `theme_*`s are [complete themes](https://ggplot2.tidyverse.org/reference/ggtheme.html) - you have to use `theme` to [modify theme components](https://ggplot2.tidyverse.org/reference/theme.html) .pull-left[ ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point( mapping = aes(colour = disp) ) + theme_minimal( legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1) ) ``` ``` ## Error in theme_minimal(legend.position = "bottom", axis.text.x = element_text(angle = 45, : unused arguments (legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1)) ``` ] .pull-right[ ``` ## Error in theme_minimal(legend.position = "bottom", axis.text.x = element_text(angle = 45, : unused arguments (legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1)) ``` ] --- # Scales .pull-left[ - Sometimes you want to transform one (or both) of your axes if you have __skewed data__ - We can do this with `scale_*` functions - For example, changing the `x` axis to a __log__ scale ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point() ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-52-1.png" width="504" /> ] --- # Scales .pull-left[ - Sometimes you want to transform one (or both) of your axes if you have __skewed data__ - We can do this with `scale_*` functions - For example, changing the `x` axis to a __log__ scale ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point() + scale_x_log10() ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-53-1.png" width="504" /> ] --- # Formatting the scales .pull-left[ - You can use the `labels = ` option to format how the axis labels are displayed... 
- My most common uses are with __dates and times__, but there are [lots of different options](https://scales.r-lib.org/reference/index.html) - (NB we use `scales::label_scientific()` because it's the [recommended](https://github.com/r-lib/scales/) way to call a function from the `{scales}` package) ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point() + scale_x_log10( labels = scales::label_scientific() ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-54-1.png" width="504" /> ] --- # Formatting the scales - ... and the `breaks =` argument to set how many axis marks you want... .pull-left[ ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point() + scale_x_log10( labels = scales::label_scientific(), breaks = c(100, 200, 300, 400, 500) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-55-1.png" width="504" /> ] --- # Formatting the scales - ... and the `limits =` argument to control the range of the axes .pull-left[ ```r ggplot( data = mtcars, mapping = aes(x = disp, y = drat) ) + geom_point() + scale_x_log10( labels = scales::label_scientific(), breaks = c(100, 200, 300, 400, 500), limits = c(10, 1000) ) ``` ] .pull-right[ <img src="00-combined_slides_files/figure-html/unnamed-chunk-56-1.png" width="504" /> ] --- # Saving the plot - Throughout the session we've just let plots __print__ out in the plots pane - Not so convenient if you want to __show someone else your work__! 
- We can use `ggsave()` to save a plot to disk

- This needs the arguments `filename`, to know __where__ to save your figure, and `plot` to know __what__ to save

---

# Saving the plot

.pull-left[

- First we should save the plot object by __assigning__ it to a name with `<-`

- Then call `ggsave()`

- For [technical reasons I don't understand](https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/) it's best to use the option `type = "cairo-png"` when saving your plot to make it look nice

```r
disp_drat_plot <-
  ggplot(
    data = mtcars,
    mapping = aes(x = disp, y = drat)
  ) +
  geom_point()

# the "figures" folder needs to exist already
ggsave(
  filename = file.path("figures", "disp_drat_plot.png"),
  plot = disp_drat_plot,
  type = "cairo-png"
)
```

]

.pull-right[

<!-- -->

]

---

# Saving the plot

- We can change the size of the output file with `height = ` and `width = `

.pull-left[

```r
disp_drat_plot <-
  ggplot(
    data = mtcars,
    mapping = aes(x = disp, y = drat)
  ) +
  geom_point()

ggsave(
  filename = file.path("figures", "disp_drat_plot_big.png"),
  plot = disp_drat_plot,
  type = "cairo-png",
  height = 2,
  width = 9
)
```

]

.pull-right[

<!-- -->

]

---
class: center, middle, inverse

# `Your turn!`

# Lines __1304-1497__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R)

# Lines __1202-1444__ in the [answers](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises-with-answers.R)

---
class: center, middle, inverse

# Bad plot challenge

---

# Bad plot challenge

1. Make the __'worst'__ plot you can based on the SIMD data (you can decide what makes the plot bad)

2. Full instructions at __lines 1498-1514__ in the [exercises](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/script/exercises.R)

2. Email your finished plots to me (Ben)

3. I'll put them on the video feed with the screen sharing thing

4.
When your plot is on the screen put a text description of how you made your bad plot/what you think makes your plot a bad plot in the group chat __OR__ 5. Make an actual sensible plot of your own data --- name: going-further class: center, middle, inverse # Going further with `ggplot2` --- # A more complex model of ggplot2 .pull-left[ - As we described it - __Dataset__ + - __mapping__ variables in the dataset to `aes`thetics + - __Layers__ of `geom`s (symbols to represent data) = - __Plot__ ] .pull-right[ - From the [`ggplot` book](https://ggplot2-book.org/mastery.html#components) - A default __dataset__ and set of __mappings__ from variables to aesthetics. - One or more __layers__, each composed of a __geometric__ object, a __statistical transformation__, a __position adjustment__, and optionally, a dataset and aesthetic mappings. - One __scale__ for each aesthetic mapping. - A __[coordinate](https://ggplot2-book.org/mastery.html#coordinate-systems) system__. - The __facetting__ specification - (Also [a __theme__](https://github.com/thomasp85/ggplot2_workshop/blob/master/presentation.pdf)) ] --- # Animations .pull-left[ - [`gganimate`](https://github.com/thomasp85/gganimate) lets you add __animations__ to `ggplot`s. It's amazing. 
- One of `gganimate`'s examples from the [package website](https://github.com/thomasp85/gganimate):

]

.pull-right[

<img src="https://github.com/thomasp85/gganimate/raw/master/man/figures/README-unnamed-chunk-4-1.gif" style="display: block; margin: auto;" />

]

---

# Bespoke plots

- __Relational__ (and therefore not-tidy) data like networks and things - [`ggraph`](https://github.com/thomasp85/ggraph)

- __Combining__ multiple `ggplots` - [`patchwork`](https://patchwork.data-imaginist.com/) and [`cowplot`](https://wilkelab.org/cowplot/articles/introduction.html)

- __Super-annotations__ - [`ggforce`](https://github.com/thomasp85/ggforce) and [`ggrepel`](https://github.com/slowkow/ggrepel); [arrows with `geom_curve`](https://bbc.github.io/rcookbook/#add_annotations)

- __'Spatial'__ facets - [`geofacet`](https://hafen.github.io/geofacet/articles/geofacet.html)

- Turning __spatial__ polygons into other shapes - [`geogrid`](https://github.com/jbaileyh/geogrid)

- The [`{leaflet}` package](https://cran.r-project.org/web/packages/leaflet/leaflet.pdf) in `R` allows you to create and customize __interactive maps__
  - Look [here](https://rstudio.github.io/leaflet/) for a nice introduction

- We have only given you a very brief introduction to mapping with `ggplot2` and `sf`; have a look at the [3-part tutorial by Moreno and Basille](https://www.r-spatial.org/r/2018/10/25/ggplot2-sf.html) for much more

---

# Resources

- See our list of [course texts](https://github.com/benmatthewsed/ui-data-visualization-course/blob/master/course_outline.md#course-texts)

- Thomas Lin Pedersen's [`ggplot2` workshop materials](https://github.com/thomasp85/ggplot2_workshop) go into __more depth__ than we have here about how the Grammar of Graphics is implemented in `ggplot2`

- Kieran Healy's [Data Visualization: A practical introduction](https://socviz.co/) goes into more detail about visualization in the context of the __kinds of analysis we do as social scientists__, such as working with
results from statistical models

- For general resources on working with `R` see [this list](https://github.com/benmatthewsed/working-with-admin-data/blob/master/references-reading-list-html.md) from the [SCADR](https://www.scadr.ac.uk/) Working With Administrative Data course

---

# In review

.pull-left[

- We've covered the __basics__ of data visualization with `ggplot2`, including drawing some excellent maps

- Hopefully we've convinced you that `ggplot2` is both __powerful__ and __flexible__

- And that while it might be confusing at first, __errors are all part of the learning process__. As Kieran Healy says, you have to be ["patient with `R`, and with yourself"](https://socviz.co/gettingstarted.html#be-patient-with-r-and-with-yourself)

- __Go forth and plot!__

]

.pull-right[

<img src="https://cms.qz.com/wp-content/uploads/2017/06/hadleywickhamchart.jpg" height="60%" style="display: block; margin: auto;" />

<sup>(Picture by Garrett Grolemund and David Kahle)</sup>

]

---
name: title
class: center, middle, inverse

.pull-left[

# Thank you!

]

.pull-right[

Ben Matthews<sup>1</sup> and Eilidh Jack<sup>2</sup>

2020-06-29 (Last updated 2020-07-06)

<sup>1</sup>University of Edinburgh

<sup>2</sup>University of Glasgow

Course [GitHub repository](https://github.com/benmatthewsed/ui-data-visualization-course)

]

<!-- -->