Building Custom Data Visualizations
Introduction
Exploratory and Expository
Custom data visualizations fall into two broad categories:
- Expository
- Exploratory
Expository
- static dataset
- explore data for story
- communicate story to audience
Examples:
- New York Times
- The Pudding
- The Washington Post
- etc…
Exploratory
- dynamic dataset
- interview stakeholders
- build tool for stakeholders to explore the data
Examples:
- scientific visualizations
- internal business tools at Netflix, Uber, Airbnb, etc.
Expository
- Data
  - Top 10 blockbusters every year for the last two decades
- Goal
  - Come up with a design and implement it together (Yay participation!)
- The raw data
Course Agenda
- Data exploration with Observable and Vega-Lite
- Design with Marks, Channels, and Gestalt Laws
- Code with SVG paths and D3 shapes and layouts
- Finish with annotations, axes, legends
Data
Exploring the Dataset
Data exploration
- List data attributes
- Ask questions
- Explore the data
We will use an Observable notebook for this
Data exploration
- How to use Observable
| Task | Syntax |
|---|---|
| Load external libraries | d3 = require('d3') |
| Write text | md `` |
| Set a global variable | globalVar = … |
| Write a block of JS code | {…} |
| Create an SVG element | { const svg = DOM.svg(width, height) … return svg } |
- Data exploration
  - Data types
    - Categorical (movie genres)
    - Ordinal (t-shirt sizes)
    - Quantitative (ratings/scores)
    - Temporal (dates)
    - Spatial (cities)
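Listing the attributes and their types can be sketched in plain JavaScript. This is a rough sketch, not the course notebook: the sample rows and field names are made up, and `guessType` and `listAttributes` are hypothetical helpers. Ordinal and spatial attributes look like ordinary strings to the code, so those still need a human decision.

```javascript
// Hypothetical sample rows; the field names are illustrative, not the course dataset.
const movies = [
  { title: "Movie A", genre: "Action", score: 7.8, released: new Date("2010-07-16") },
  { title: "Movie B", genre: "Comedy", score: 6.4, released: new Date("2012-12-25") },
];

// Guess a coarse data type from the JavaScript value.
// Ordinal and spatial values cannot be told apart from plain strings here.
function guessType(value) {
  if (value instanceof Date) return "temporal";
  if (typeof value === "number") return "quantitative";
  return "categorical";
}

// List every attribute of the first row alongside its guessed type.
function listAttributes(rows) {
  return Object.entries(rows[0]).map(([name, value]) => ({
    name,
    type: guessType(value),
  }));
}

console.log(listAttributes(movies));
```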
Chart Types & Vega-Lite
Some basic chart types:
- Bar chart
  - For categorical comparisons
  - Domain: categorical
  - Range: quantitative
- Histogram
  - For distributions of a quantitative attribute
  - Domain: quantitative bins
  - Range: frequency of each quantitative bin
- Scatterplot
  - For correlation
  - 2 attributes, and the relationship between their quantitative values
- Line chart
  - For temporal trends
  - Domain: temporal
  - Range: quantitative
- Vega-Lite
  - A grammar of interactive graphics
    - Data source
    - Mark (tick, bar, point, line, etc.)
    - Encoding (x, y, color, etc.)
      - For each encoding: type, field
  - Example Gallery
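The data source / mark / encoding structure can be written as a plain object. The sketch below describes a histogram as a Vega-Lite specification; the inline values and the field names `genre` and `score` are made up for illustration, and rendering the spec (for example in an Observable notebook) is left out.

```javascript
// A Vega-Lite specification is plain JSON: a data source, a mark, and encodings.
// The inline rows and field names are assumptions for this example.
const histogramSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: {
    values: [
      { genre: "Action", score: 7.8 },
      { genre: "Comedy", score: 6.4 },
      { genre: "Action", score: 6.9 },
      { genre: "Drama", score: 8.1 },
    ],
  },
  mark: "bar",
  encoding: {
    // Domain: quantitative bins of the score field.
    x: { field: "score", type: "quantitative", bin: true },
    // Range: how many rows fall into each bin.
    y: { aggregate: "count", type: "quantitative" },
  },
};

console.log(JSON.stringify(histogramSpec, null, 2));
```

Each encoding names a channel (`x`, `y`) and gives it a type plus a field or an aggregate, which mirrors the "for each encoding: type, field" note above.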
Using Modules in Observable
Using Heatmap
Exploration Exercise
- Data exploration advice
  - Check for missing data, and the validity of the data
  - Focus on one question at a time (it’s very easy to get sidetracked by a tangent)
  - If there IS an interesting tangent, make a note for later
  - If the question leads to a dead end, explore another question or the tangent you found earlier
  - Don’t be afraid to go out and look for additional data to aid your exploration
  - Sometimes, finding no interesting pattern IS very interesting
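A first pass at the missing-data check might look like the following. It is a minimal sketch with made-up rows: `isMissing` and `countMissing` are hypothetical helpers, and treating `null`, `undefined`, `NaN`, and the empty string as missing is an assumption that depends on how the dataset was exported.

```javascript
// Hypothetical rows; null, undefined, NaN, and "" are treated as missing here.
const rows = [
  { title: "Movie A", score: 7.8, boxOffice: 300 },
  { title: "Movie B", score: null, boxOffice: 120 },
  { title: "Movie C", score: 6.1, boxOffice: NaN },
  { title: "", score: 8.2, boxOffice: 95 },
];

function isMissing(value) {
  return (
    value === null ||
    value === undefined ||
    value === "" ||
    (typeof value === "number" && Number.isNaN(value))
  );
}

// Count missing values per attribute across all rows.
// Attributes that are absent from a row entirely are not counted.
function countMissing(data) {
  const counts = {};
  for (const row of data) {
    for (const [key, value] of Object.entries(row)) {
      counts[key] = (counts[key] || 0) + (isMissing(value) ? 1 : 0);
    }
  }
  return counts;
}

console.log(countMissing(rows));
```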
Design
Marks & Channels
Translate From Data to Design
- Concentrate on the takeaways you want to communicate
- What does that mean in terms of the data? (Individual or aggregate elements? Which attributes?)
- Map the relevant data to visual elements
Design: Marks & Channels
- Map individual or aggregate data elements to marks
- Map data attributes to channels
Design: Marks
- Points
- Lines
- Areas
Design: Channels
- Position
  - Horizontal
  - Vertical
  - Both
- Color
- Shape
- Tilt
- Size
  - Length
  - Area
  - Volume
Quantitative
- Position
- Size
- Color
Categorical
- Shape
- Texture
- Color
Temporal
- Animation
Design: Marks & Channels
- One-to-one mapping of data attribute to channel
- Multiple channels can map to one mark (usually x, y, size, color)
- Do not EVER map multiple data attributes to the same channel
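To make the mapping concrete, here is a small sketch in which each data element becomes one circle mark and each attribute drives exactly one channel. The rows, domains, and colors are assumptions, and `linear` is a hand-written stand-in for a scale rather than a D3 call.

```javascript
// Hypothetical data: each movie becomes one circle mark.
const movies = [
  { title: "Movie A", year: 2000, score: 6.0, boxOffice: 100, genre: "Action" },
  { title: "Movie B", year: 2010, score: 8.0, boxOffice: 400, genre: "Comedy" },
];

// Linear interpolation from a data domain to a screen range.
function linear([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const x = linear([2000, 2010], [0, 500]); // year -> horizontal position
const y = linear([0, 10], [300, 0]); // score -> vertical position (SVG y grows downward)
const area = linear([0, 400], [0, 400]); // boxOffice -> circle area in square pixels
const color = { Action: "steelblue", Comedy: "orange" }; // genre -> color

// One attribute per channel, several channels per mark.
const circles = movies.map((d) => ({
  cx: x(d.year),
  cy: y(d.score),
  r: Math.sqrt(area(d.boxOffice) / Math.PI),
  fill: color[d.genre],
}));

console.log(circles);
```

Size is encoded through area rather than radius here, so a movie with four times the box office gets a circle with four times the area.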
Gestalt Laws of Grouping
Design: Gestalt Laws of Grouping
- The human mind naturally groups individual elements into patterns
- We can use this in data visualization to save the viewer processing time
Design: Gestalt Laws of Grouping
- Proximity
  - Put related objects near each other
- Similarity
  - Indicate like objects (helpful if they can’t be placed close to each other)
- Enclosure
  - Helpful when creating visualizations with multiple sections
Remix & Overlay
Design: Remix & Overlay
“You don’t always need to start from scratch, remix what’s out there already” - Nadieh Bremer
- But make sure they’re the right visuals to communicate your message
Code
Turning Designs into Code
Break it down! What do you need to draw the marks? What do you need to calculate the channels?
To draw marks: SVG (or canvas)
To calculate channels: D3 scales, shapes, and layouts (or straight-up math!)
SVG elements
- rect
- circle
- text
- path
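The four elements can be seen together in one small SVG document. The sketch below builds the markup as a string so it runs anywhere; `buildSvg` is a hypothetical helper, and every size, position, and color is arbitrary.

```javascript
// Build a small SVG document as a string using the four elements listed above.
function buildSvg(width, height) {
  const parts = [
    `<rect x="10" y="10" width="80" height="40" fill="steelblue" />`,
    `<circle cx="150" cy="30" r="20" fill="orange" />`,
    `<text x="10" y="80" font-size="12">A label</text>`,
    // Path commands: M = move to, L = line to, Z = close the shape.
    `<path d="M 10 100 L 60 140 L 110 100 Z" fill="none" stroke="black" />`,
  ];
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    parts.join("") +
    `</svg>`
  );
}

const svg = buildSvg(200, 160);
console.log(svg);
```

`rect`, `circle`, and `text` take their geometry from attributes, while `path` takes a string of drawing commands in `d`, which is why it can express arbitrary shapes.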
D3 Shapes and Layouts
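The core D3 APIs can be divided into the following categories:
- Data preparation
- Layout calculation
- DOM manipulation
- Finishing touches
- Interactions
The first two turn the data into the SVG shapes to be drawn, and the APIs in the third category draw the result on the page.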
Sometimes all you need are scales to get from data to screen space
- Scales
  - Continuous
  - Quantize
  - Quantile
  - Threshold
  - Ordinal
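To show what several of these scale families compute, here are hand-written approximations in plain JavaScript. They are sketches of the idea, not D3's implementations (D3 provides `d3.scaleLinear`, `d3.scaleQuantize`, `d3.scaleThreshold`, and `d3.scaleOrdinal` for real use); the function names and example domains are assumptions, and the quantile scale is left out for brevity.

```javascript
// Continuous: interpolate linearly from a data domain to a pixel range.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Quantize: split a continuous domain into equal segments, one per output value.
function quantizeScale([d0, d1], range) {
  return (value) => {
    const t = (value - d0) / (d1 - d0);
    const i = Math.min(range.length - 1, Math.max(0, Math.floor(t * range.length)));
    return range[i];
  };
}

// Threshold: pick the output by counting how many cut points the value has reached.
function thresholdScale(thresholds, range) {
  return (value) => range[thresholds.filter((t) => value >= t).length];
}

// Ordinal: look up a discrete input in a fixed mapping (undefined if unknown).
function ordinalScale(domain, range) {
  return (value) => range[domain.indexOf(value) % range.length];
}

const x = linearScale([0, 10], [0, 500]);
const bucket = quantizeScale([0, 10], ["low", "mid", "high"]);
const grade = thresholdScale([5, 8], ["bad", "ok", "great"]);
const size = ordinalScale(["S", "M", "L"], [10, 20, 30]);

console.log(x(5), bucket(5), grade(9), size("M"));
```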
Oftentimes, you may need specific layouts
- These output x/y positions
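As an illustration of "a layout outputs x/y positions", here is a deliberately simple grid layout. `gridLayout` is a hypothetical helper, not a D3 API; D3's own layouts (for example `d3.pack` or `d3.forceSimulation`) are more involved but likewise attach computed positions to each datum.

```javascript
// Hypothetical helper (not a D3 API): place items left to right, wrapping into rows.
// Each output entry carries the original item plus the center of its grid cell.
function gridLayout(items, { columns, cellWidth, cellHeight }) {
  return items.map((item, i) => ({
    item,
    x: (i % columns) * cellWidth + cellWidth / 2,
    y: Math.floor(i / columns) * cellHeight + cellHeight / 2,
  }));
}

const positions = gridLayout(["a", "b", "c", "d", "e"], {
  columns: 3,
  cellWidth: 100,
  cellHeight: 50,
});

console.log(positions);
```

The drawing code then only has to read `x` and `y` off each entry, which keeps layout calculation separate from DOM manipulation.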
Code Exercise
The Finishing Touches
Readability
- Titles, descriptions and legends to explain the visualization
- Axes and annotations to describe the data
Exercise: Readability
- Add axes
- Add legends
- Add annotations
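For the axes part of the exercise, the sketch below shows roughly what an axis consists of: a baseline, tick marks, and labels positioned through a scale. It is a hand-rolled approximation under assumed domain and pixel values; `ticks` and `bottomAxis` are hypothetical helpers, and in practice D3's `d3.axisBottom` generates this markup from a scale.

```javascript
// Hypothetical helper: evenly spaced tick values between min and max, inclusive.
function ticks(min, max, count) {
  const step = (max - min) / (count - 1);
  return Array.from({ length: count }, (_, i) => min + i * step);
}

// Render a horizontal axis as SVG markup: one baseline plus a tick and label per value.
function bottomAxis({ domain: [d0, d1], range: [r0, r1], y, count }) {
  const scale = (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
  const marks = ticks(d0, d1, count).map((v) => {
    const x = scale(v);
    return (
      `<line x1="${x}" x2="${x}" y1="${y}" y2="${y + 6}" stroke="black" />` +
      `<text x="${x}" y="${y + 18}" text-anchor="middle" font-size="10">${v}</text>`
    );
  });
  return (
    `<g><line x1="${r0}" x2="${r1}" y1="${y}" y2="${y}" stroke="black" />` +
    marks.join("") +
    `</g>`
  );
}

const axis = bottomAxis({ domain: [2000, 2020], range: [40, 440], y: 300, count: 5 });
console.log(axis);
```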
Adding Aesthetics
- More SVG for context & aesthetics
  - Patterns
  - Gradients
  - Text on a path
  - SVG filters (blurs, drop shadows)
  - Clipping & masking
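Two of these, a gradient and a drop-shadow filter, can be sketched as SVG markup. The ids `fade` and `shadow`, the colors, and the offsets are arbitrary choices for this example; the markup is held in a JavaScript string so it can be injected into a page or notebook.

```javascript
// Reusable definitions live in <defs> and are referenced by id.
const defs = `
<defs>
  <linearGradient id="fade" x1="0" y1="0" x2="1" y2="0">
    <stop offset="0%" stop-color="steelblue" />
    <stop offset="100%" stop-color="white" />
  </linearGradient>
  <filter id="shadow" x="-20%" y="-20%" width="140%" height="140%">
    <feDropShadow dx="2" dy="2" stdDeviation="2" flood-opacity="0.4" />
  </filter>
</defs>`;

// The rect is filled with the gradient and drawn with the drop shadow.
const styledSvg = `<svg xmlns="http://www.w3.org/2000/svg" width="220" height="120">${defs}
  <rect x="20" y="20" width="180" height="80" fill="url(#fade)" filter="url(#shadow)" />
</svg>`;

console.log(styledSvg);
```

The filter region is enlarged past the element's bounding box (`x`, `y`, `width`, `height` on `<filter>`) so the shadow is not cut off at the edges.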
Exercise: Adding Aesthetics
- Add textures for holiday
- Add drop shadows to movies
Interactions
- D3:
  - Hover, click, and other simple interactions
- D3 & React (or similar):
  - Update after user manipulation of underlying data
  - Link multiple visualizations
  - Exploratory tools (filtering, aggregating)
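The filtering and aggregating behind an exploratory tool can be kept as plain functions of the data and the current selection, independent of the UI framework. This is a minimal sketch with made-up rows; `filterByGenre`, `totalByYear`, and `update` are hypothetical names, and the redraw step is only indicated in a comment.

```javascript
// Hypothetical rows for illustration.
const movies = [
  { title: "Movie A", genre: "Action", year: 2010, boxOffice: 300 },
  { title: "Movie B", genre: "Comedy", year: 2010, boxOffice: 120 },
  { title: "Movie C", genre: "Action", year: 2011, boxOffice: 250 },
  { title: "Movie D", genre: "Action", year: 2011, boxOffice: 50 },
];

// Filtering: keep only rows matching the user's current selection.
function filterByGenre(rows, genre) {
  return genre ? rows.filter((d) => d.genre === genre) : rows;
}

// Aggregating: total box office per year.
function totalByYear(rows) {
  const totals = new Map();
  for (const d of rows) {
    totals.set(d.year, (totals.get(d.year) || 0) + d.boxOffice);
  }
  return [...totals].map(([year, total]) => ({ year, total }));
}

// A UI event handler would call this with the new selection and redraw with the result.
function update(selection) {
  return totalByYear(filterByGenre(movies, selection.genre));
}

console.log(update({ genre: "Action" }));
```

Because `update` returns derived data rather than touching the DOM, the same result can feed several linked charts.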
Wrapping Up
Exploratory Visualizations
Process (at Netflix, with Susie & Elijah)
- Initial meeting with stakeholders to figure out most important questions
- Meeting with data engineers
- Mock-up in Sketch, sandbox with Semiotic, see shape of data
- Prototype, iterate with stakeholders
Advice (from Netflix)
- Different data sources, often SQL queries -> plan for queries that take longer (important for interactions)
- Questions for stakeholders:
  - What’s the business question they’re trying to answer?
  - How do the metrics they’re comparing feed into a decision?
Advice, cont.
- Level of aggregation that’s most effective for decision making:
  - Get it to ~7 things that are granular and meaningful enough
  - If not, top 10 of a default
- Gain trust and credibility within the org
- Have to compete with tables of data (detailed but hard to read)
- People will get to a state you never designed for, so think through edge cases
Additional Resources
- Books:
  - The Functional Art by Alberto Cairo
  - Visualization Analysis and Design by Tamara Munzner
- Online: