Building Custom Data Visualizations

Introduction

Exploratory and Expository

  • Custom data visualizations fall into two broad categories:

    • Expository vs. exploratory
  • Expository

    • static dataset
    • explore data for story
    • communicate story to audience
  • Examples:

    • New York Times
    • The Pudding
    • The Washington Post
    • etc…
  • Exploratory

    • dynamic dataset
    • interview stakeholders
    • build tool for stakeholders to explore the data
  • Examples

    • scientific visualizations
    • internal business tools at Netflix, Uber, Airbnb, etc.
  • Expository

    • Data
      • top 10 blockbusters every year for the last two decades
    • Goal
      • come up with a design and implement it together (yay, participation!)
    • The raw data

Course Agenda

Data

Exploring the Dataset

  • Data exploration

    1. List data attributes
    2. Ask questions
    3. Explore the data
  • Will use an Observable notebook for this

  • Data exploration

    • How to use Observable
      • To load external libraries: d3 = require('d3')
      • To write text: md``
      • To set a global variable: globalVar = …
      • To write a block of JS code: {…}
      • To create an SVG element: { const svg = DOM.svg(width, height) … return svg }
  • Data exploration
    • Data types
      • Categorical (movie genres)
      • Ordinal (t-shirt sizes)
      • Quantitative (ratings/scores)
      • Temporal (dates)
      • Spatial (cities)

Chart Types & Vega-Lite

  • Some basic chart types:

    • Bar chart
      • For categorical comparisons
      • Domain: categorical
      • Range: quantitative
    • Histogram
      • For quantitative distributions
      • Domain: quantitative bins
      • Range: frequency of quantitative bin
    • Scatterplot
      • For correlation
      • 2 attributes, and the relationship between their quantitative values
    • Line chart
      • For temporal trends
      • Domain: temporal
      • Range: quantitative
  • Vega-Lite

    • A grammar of interactive graphics
      • Data source
      • Mark (tick, bar, point, line, etc.)
      • Encoding (x, y, color, etc.)
      • For each encoding: type, field
    • Example Gallery
  • solution 1

  • solution 2
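
As a rough illustration of the Vega-Lite grammar listed above (data source, mark, encoding, and a type and field for each encoding), here is a minimal sketch of a bar chart specification written as a plain JavaScript object. The sample rows and the field names `genre` and `boxOffice` are hypothetical, not the course dataset.

```javascript
// Hypothetical sample rows; titles, genres, and numbers are made up for illustration.
const sampleMovies = [
  { title: "Movie A", genre: "Action", boxOffice: 300 },
  { title: "Movie B", genre: "Comedy", boxOffice: 120 },
  { title: "Movie C", genre: "Action", boxOffice: 250 }
];

// A Vega-Lite specification is plain JSON: data source + mark + encodings.
const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: { values: sampleMovies }, // data source (inline values)
  mark: "bar",                    // mark type
  encoding: {
    // each encoding channel names a field and its data type
    x: { field: "genre", type: "nominal" },
    y: { aggregate: "sum", field: "boxOffice", type: "quantitative" }
  }
};

console.log(JSON.stringify(spec.encoding, null, 2));
```

The object only describes the chart. To draw it, it would still need to be handed to a Vega-Lite renderer (for example vega-embed), which this sketch leaves out.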

Using Modules in Observable

Using Heatmap

Exploration Exercise

full notebook

  • Data exploration advice
    • Check for missing data, and the validity of the data
    • Focus on one question at a time (it’s very easy to get sidetracked with a tangent)
    • If there IS an interesting tangent, make a note for later
    • If the question leads to a dead end, explore another question or the tangent you found earlier
    • Don’t be afraid to go out and look for additional data to aid your exploration
    • Sometimes, no interesting pattern IS very interesting
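
The first piece of advice, checking for missing data, can be sketched in a few lines of plain JavaScript. This is one possible approach under assumed field names (`title`, `year`, `boxOffice`), with invented rows. What counts as "missing" or "invalid" depends on the real dataset.

```javascript
// Hypothetical rows; fields and values are invented for illustration.
const rows = [
  { title: "Movie A", year: 2001, boxOffice: 300 },
  { title: "Movie B", year: 2002, boxOffice: null },
  { title: "Movie C", year: NaN, boxOffice: 250 }
];

// Count, per attribute, how many rows have a missing or invalid value.
function countMissing(data, fields) {
  const counts = {};
  for (const field of fields) {
    counts[field] = data.filter((row) => {
      const value = row[field];
      return (
        value === null ||
        value === undefined ||
        value === "" ||
        Number.isNaN(value)
      );
    }).length;
  }
  return counts;
}

const missing = countMissing(rows, ["title", "year", "boxOffice"]);
console.log(missing); // { title: 0, year: 1, boxOffice: 1 }
```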

Design

Marks & Channels

  • The ProPublica GitHub

  • Translate From Data to Design

    1. Concentrate on the takeaways you want to get across
    2. What does that mean in terms of the data? (Individual or aggregate elements? Which attributes?)
    3. Map the relevant data to visual elements
  • Design: Marks & Channels

    • Map individual or aggregate data element to marks
    • Map data attributes to channels
  • Design: Marks

    • Points
    • Lines
    • Areas
  • Design: Channels

    • Position
      • Horizontal
      • Vertical
      • Both
    • Color
    • Shape
    • Tilt
    • Size
      • Length
      • Area
      • Volume
  • Quantitative

    • Position
    • Size
    • Color
  • Categorical

  • Temporal

    • Animation
  • slide

  • Design: Marks & Channels

    • One-to-one mapping of data to channel
    • Multiple channels can map to one mark (usually x, y, size, color)
    • Do not EVER map multiple data attributes to the same channel
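
To make the mapping rules above concrete, here is a minimal sketch in which one data element becomes one mark (a circle) and each data attribute drives exactly one channel. The record, its field names, the pixel constants, and the color table are all assumptions chosen for illustration.

```javascript
// Hypothetical movie record; the field names are assumptions for illustration.
const movie = { title: "Movie A", year: 2010, boxOffice: 300, genre: "Action" };

// Categorical attribute -> color channel (hypothetical palette).
const genreColors = { Action: "crimson", Comedy: "gold" };

// One data element -> one mark; each attribute -> exactly one channel.
function toCircleMark(d) {
  return {
    x: (d.year - 2000) * 40,   // temporal attribute -> horizontal position
    y: 400 - d.boxOffice,      // quantitative attribute -> vertical position
    fill: genreColors[d.genre] // categorical attribute -> color
  };
}

const mark = toCircleMark(movie);
console.log(mark); // { x: 400, y: 100, fill: 'crimson' }
```

Several channels (x, y, fill) feed the same mark, but no channel receives more than one attribute.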

Gestalt Laws of Grouping

  • Design: Gestalt Laws of Grouping

    • The human mind naturally groups individual elements into patterns
    • Use this in data visualization to reduce the viewer’s processing time
  • Design: Gestalt Laws of Grouping

    • Proximity
      • Put related objects near each other
    • Similarity
      • Indicate like objects (helpful if they can’t be placed close to each other)
    • Enclosure
      • Helpful when creating visualizations with multiple sections

Remix & Overlay

Code

Turning Designs into Code

  • Break it down! What do you need to draw the marks? What do you need to calculate the channels?

  • To draw marks: SVG (or canvas)

  • To calculate channels: D3 scales, shapes and layouts (or straight-up math!)

  • SVG elements

    • rect
    • circle
    • text
    • path
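
As a small sketch of the four SVG elements listed above, the snippet below assembles SVG markup as a string so it can run without a browser DOM. The coordinates, sizes, and colors are arbitrary placeholder values.

```javascript
// Build SVG markup as a string so the sketch runs without a DOM.
const width = 200;
const height = 100;

const shapes = [
  // rect: top-left corner (x, y) plus width and height
  `<rect x="10" y="40" width="30" height="50" fill="steelblue" />`,
  // circle: center (cx, cy) plus radius r
  `<circle cx="80" cy="60" r="15" fill="crimson" />`,
  // text: anchored at (x, y), content between the tags
  `<text x="110" y="65" font-size="12">label</text>`,
  // path: "d" holds drawing commands (M = move to, L = line to)
  `<path d="M 150 90 L 170 30 L 190 90" fill="none" stroke="black" />`
];

const svgMarkup = `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">
  ${shapes.join("\n  ")}
</svg>`;

console.log(svgMarkup);
```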

D3 Shapes and Layouts

  • The core D3 API can be divided into the following categories

    • Data preparation
    • Layout calculation
    • DOM manipulation
    • Finishing touches
    • Interactions
  • The first two convert data into the SVG shapes that need to be drawn, and the APIs in the third category then draw the result on the page

  • Sometimes all you need are scales to get from data to screen space

    • Scales
      • Continuous
      • Quantize
      • Quantile
      • Threshold
      • Ordinal
  • Often, you may need specific layouts

    • These output x/y positions
  • Who’s speaking in Middle Earth
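
To show what "getting from data to screen space" means for the scales listed above, here is a sketch of two scales written as straight-up math. These are hand-rolled stand-ins for the idea behind a continuous (linear) scale and a threshold scale, not D3's implementation. The domains, ranges, and bucket labels are assumed values.

```javascript
// Hand-rolled linear scale: maps a value in [d0, d1] to [r0, r1].
// Illustrates the idea of a continuous scale; not D3's actual code.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hand-rolled threshold scale: n thresholds split the input into n + 1 buckets.
// A value equal to a threshold falls into the upper bucket.
function thresholdScale(thresholds, outputs) {
  return (value) => {
    let i = 0;
    while (i < thresholds.length && value >= thresholds[i]) i += 1;
    return outputs[i];
  };
}

// Data space (0..500) -> screen space (0..800 px wide, 400 px tall).
const xScale = linearScale([0, 500], [0, 800]);
const yScale = linearScale([0, 500], [400, 0]); // flipped: SVG y grows downward
const bucket = thresholdScale([100, 300], ["low", "mid", "high"]);

console.log(xScale(250)); // 400
console.log(yScale(250)); // 200
console.log(bucket(50));  // low
console.log(bucket(100)); // mid
```

In practice D3's own scales (such as d3.scaleLinear and d3.scaleThreshold) would take the place of these helpers. The sketch only shows the arithmetic they encapsulate.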

Code Exercise

The Finishing Touches

Readability

Exercise: Readability

  1. Add axes
  2. Add legends
  3. Add annotations

Adding Aesthetics

Exercise: Adding Aesthetics

  1. Add textures for holiday
  2. Add drop shadows to movies

Interactions

  • Interactions
    • D3:
      • hover, click and other simple interactions
    • D3 & React (or similar)
      • update after user manipulation of underlying data
      • link multiple visualizations
      • exploratory tools (filtering, aggregating)
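
The filtering and aggregating behind an exploratory tool can be sketched independently of any UI framework. The example below is a minimal, framework-free version. The rows, the field names, and the helpers `filterByYear` and `sumByGenre` are hypothetical. In a real tool, a framework such as React would re-run these steps whenever the user changes the selection.

```javascript
// Hypothetical rows, invented for illustration.
const films = [
  { title: "Movie A", genre: "Action", year: 2010, boxOffice: 300 },
  { title: "Movie B", genre: "Comedy", year: 2012, boxOffice: 120 },
  { title: "Movie C", genre: "Action", year: 2015, boxOffice: 250 },
  { title: "Movie D", genre: "Comedy", year: 2005, boxOffice: 80 }
];

// Filtering: keep only rows inside the user's selected year range (inclusive).
function filterByYear(rows, [start, end]) {
  return rows.filter((row) => row.year >= start && row.year <= end);
}

// Aggregating: total box office per genre.
function sumByGenre(rows) {
  const totals = new Map();
  for (const row of rows) {
    totals.set(row.genre, (totals.get(row.genre) || 0) + row.boxOffice);
  }
  return Object.fromEntries(totals);
}

// Simulate a user brushing the years 2010 through 2015.
const selected = filterByYear(films, [2010, 2015]);
const totalsByGenre = sumByGenre(selected);
console.log(totalsByGenre); // { Action: 550, Comedy: 120 }
```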

Wrapping Up

Exploratory Visualizations

  • Process (at Netflix with Susie & Elijah)

    1. Initial meeting with stakeholders to figure out most important questions
    2. Meeting with data engineers
    3. Mock-up in sketch, sandbox with Semiotic, see shape of data
    4. Prototype, iterate with stakeholders
  • Advice (from Netflix)

    • Different data sources, often SQL queries -> plan for queries that take longer(important for interactions)
    • Questions for stakeholders:
      • What’s the business question they’re trying to answer?
      • How do the metrics they’re comparing factor into a decision?
  • Advice, cont.

    • Level of aggregation that’s most effective for decision making:
      • Get it to ~7 things that are granular and meaningful enough
      • If not, top 10 of a default
    • Gain trust and credibility within org
      • Have to compete with tables of data (detailed but hard to read)
      • People will get to a state you never designed for, so think through edge cases

Additional Resources