ggplot2 Quick Reference: Verbosity of Plot Specifications

The plot specifications we use in this quick reference are quite verbose. While ggplot2 provides shortcuts that allow for more concise plot specifications, we do not use them in this quick reference. We believe that using these shortcuts would obscure the orthogonal features of the "grammar of graphics" underlying ggplot2 and thus slow down understanding. At a high level of abstraction, the "grammar of graphics" can be summarized as follows:

plot ::= coord scale+ facet? layer+
layer ::= data mapping stat geom position?

Our stylized (but verbose) way of specifying plots closely follows this structure. Here again is our running example:

ggplot() + 
coord_cartesian() +
scale_x_continuous() +
scale_y_continuous() +
scale_color_hue() +
facet_wrap(~cut) +
layer(
  data=diamonds, 
  mapping=aes(x=carat, y=price, color=color), 
  stat="identity", 
  stat_params=list(), 
  geom="point", 
  geom_params=list(), 
  position=position_jitter()
) +
layer(
  data=diamonds,
  mapping=aes(x=carat,y=price),
  stat="smooth",
  stat_params=list(method="glm", formula=y~poly(x,2)),
  geom="smooth",
  geom_params=list(color="black"),
  position=position_identity()
)
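
The stat_params and geom_params arguments shown above belong to the layer() interface of early ggplot2 releases. As a hedged sketch, assuming ggplot2 2.0.0 or later (where, to our knowledge, layer() takes a single params list and sorts the entries into stat, geom, and aesthetic parameters itself), the same running example might be written as follows. The variable name p and the explicit na.rm entries are our additions, not part of the original example.

```r
library(ggplot2)

# Assumption: ggplot2 >= 2.0.0, where layer() accepts one `params` list
# instead of separate stat_params / geom_params lists.
p <- ggplot() +
  coord_cartesian() +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_hue() +
  facet_wrap(~cut) +
  layer(
    data = diamonds,
    mapping = aes(x = carat, y = price, color = color),
    stat = "identity",
    geom = "point",
    position = position_jitter(),
    params = list(na.rm = FALSE)   # no stat or geom parameters beyond na.rm
  ) +
  layer(
    data = diamonds,
    mapping = aes(x = carat, y = price),
    stat = "smooth",
    geom = "smooth",
    position = position_identity(),
    params = list(
      method = "glm",              # formerly in stat_params
      formula = y ~ poly(x, 2),    # formerly in stat_params
      colour = "black",            # formerly in geom_params
      na.rm = FALSE
    )
  )
```

Printing p renders the plot. Whether the confidence band around the smoother is drawn by default may depend on the ggplot2 version; adding se = TRUE to the second params list makes that choice explicit.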

Replacing layer() with geom_...() or stat_...()

ggplot2 provides a multitude of functions that produce pre-configured layers. There are two kinds of such functions: those focused on specific geometric objects (geom_...()), and those focused on specific statistical summaries (stat_...()). Most geom functions create a layer with the given geom and the stat that makes the most sense for that geom (e.g., geom_point(...) produces a layer(stat="identity", geom="point", ...)). Most stat functions create a layer with the given stat and the geom that makes the most sense for that stat (e.g., stat_smooth(...) produces a layer(stat="smooth", geom="smooth", ...)).

ggplot() + 
coord_cartesian() +
scale_x_continuous() +
scale_y_continuous() +
scale_color_hue() +
facet_wrap(~cut) +
geom_point(
  data=diamonds, 
  mapping=aes(x=carat, y=price, color=color),
  stat_params=list(),
  geom_params=list(),
  position=position_jitter()
) +
stat_smooth(
  data=diamonds,
  mapping=aes(x=carat,y=price),
  stat_params=list(method="glm", formula=y~poly(x,2)),
  geom_params=list(color="black"),
  position=position_identity()
)
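
In more recent ggplot2 releases the geom_...() and stat_...() functions no longer take stat_params or geom_params lists. As a rough sketch, assuming ggplot2 2.0.0 or later, the stat and geom parameters are instead passed directly as named arguments, and empty parameter lists are simply left out. The variable name p is ours.

```r
library(ggplot2)

# Assumption: ggplot2 >= 2.0.0, where parameters such as `method`,
# `formula`, and `colour` are plain arguments of the layer function.
p <- ggplot() +
  coord_cartesian() +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_hue() +
  facet_wrap(~cut) +
  geom_point(
    data = diamonds,
    mapping = aes(x = carat, y = price, color = color),
    position = position_jitter()
  ) +
  stat_smooth(
    data = diamonds,
    mapping = aes(x = carat, y = price),
    method = "glm",             # stat parameter
    formula = y ~ poly(x, 2),   # stat parameter
    colour = "black",           # fixed (non-mapped) aesthetic for the geom
    position = position_identity()
  )
```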

Defaults in Layer Configuration

The above example produces two layers. The first layer uses a "point" geom and an "identity" stat; the second layer uses a "smooth" geom and a "smooth" stat. ggplot2 automatically supplies sensible defaults for the geom parameters, stat parameters, and position adjustment based on the geom and/or stat used. In the first layer we pass only empty parameter lists, and ggplot2 predefines default parameter lists for the geom and the stat, so the stat_params and geom_params arguments can be omitted. In the second layer the position adjustment we specify, position_identity(), is already the default, so the position argument is unnecessary there. Note that the default position adjustment for the point geom is also position_identity(), but we want position_jitter(), so we cannot omit the position argument in the layer produced by geom_point().

ggplot() + 
coord_cartesian() +
scale_x_continuous() +
scale_y_continuous() +
scale_color_hue() +
facet_wrap(~cut) +
geom_point(
  data=diamonds, 
  mapping=aes(x=carat, y=price, color=color),
  position=position_jitter()
) +
stat_smooth(
  data=diamonds,
  mapping=aes(x=carat,y=price),
  stat_params=list(method="glm", formula=y~poly(x,2)),
  geom_params=list(color="black")
)
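
The defaults described above can be inspected directly on the layer objects. The following is a small sketch, assuming a ggplot2 version in which a layer exposes its geom, stat, and position fields (this is how recent releases behave as far as we know); the variable names are ours.

```r
library(ggplot2)

# Create the two kinds of layers with all defaults left in place.
point_layer  <- geom_point()
smooth_layer <- stat_smooth()

# Each layer carries the geom, stat, and position objects it will use.
print(class(point_layer$geom)[1])
print(class(point_layer$stat)[1])
print(class(point_layer$position)[1])
print(class(smooth_layer$geom)[1])
print(class(smooth_layer$stat)[1])
print(class(smooth_layer$position)[1])

# Overriding the default position, as done for the point layer in the text.
jittered_layer <- geom_point(position = position_jitter())
print(class(jittered_layer$position)[1])
```

The printed class names show which geom, stat, and position adjustment each layer function picked, and that only the explicitly jittered layer departs from the identity position.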

Defaults in Plot Configuration

The use of certain geoms and stats and the use of continuous or discrete variables in the aesthetic mapping also affect the configuration of the plot itself. In the above example, the geoms and stats used in the two layers cause ggplot2 to use coord_cartesian() as the default coordinate system, scale_x_continuous() as the default x-axis scale, scale_y_continuous() as the default y-axis scale, and scale_color_hue() as the default (discrete) color scale. Those parts of the plot specification can thus also be omitted.

ggplot() + 
facet_wrap(~cut) +
geom_point(
  data=diamonds, 
  mapping=aes(x=carat, y=price, color=color),
  position=position_jitter()
) +
stat_smooth(
  data=diamonds,
  mapping=aes(x=carat,y=price),
  stat_params=list(method="glm", formula=y~poly(x,2)),
  geom_params=list(color="black")
)
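
To see which plot-level defaults ggplot2 fills in, one can inspect a plot that omits them. This is only a sketch, assuming a recent ggplot2 release in which the plot object exposes a coordinates field and layer_scales() is available; the variable names are ours, and only the point layer is used to keep the example quick. Note also that recent releases may choose an ordinal (viridis) color scale rather than scale_color_hue() for an ordered factor such as diamonds$color, so keeping scale_color_hue() explicit may be needed to reproduce the original look.

```r
library(ggplot2)

# A plot with no explicit coordinate system or position scales.
p <- ggplot() +
  facet_wrap(~cut) +
  geom_point(
    data = diamonds,
    mapping = aes(x = carat, y = price, color = color),
    position = position_jitter()
  )

# The coordinate system is attached to the plot object as soon as it is created.
print(class(p$coordinates)[1])

# Position scales are chosen when the plot is built; layer_scales() builds
# the plot and returns the x and y scales of the first panel.
position_scales <- layer_scales(p)
print(class(position_scales$x)[1])
print(class(position_scales$y)[1])
```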

Pulling Data and Mapping into the Plot

In the above example, each layer defines which data frame (data argument) and which aesthetic mapping (mapping argument) it uses. Often the data frame is the same for each layer, and ggplot2 thus allows specifying it just once, in the call to ggplot(), which makes it available to all layers.

ggplot(data=diamonds) + 
facet_wrap(~cut) +
geom_point(
  mapping=aes(x=carat, y=price, color=color),
  position=position_jitter()
) +
stat_smooth(
  mapping=aes(x=carat,y=price),
  stat_params=list(method="glm", formula=y~poly(x,2)),
  geom_params=list(color="black")
)

The same applies to the aesthetic mapping. However, as in this example, the mapping often differs to some degree between layers (e.g., here the "point" layer uses the "color" aesthetic, but the "smooth" layer does not). In this case, it can make sense to define the aesthetic mapping used in the majority of layers in the call to ggplot(), and to override that "default" mapping in the other layers.

ggplot(data=diamonds, mapping=aes(x=carat, y=price)) + 
facet_wrap(~cut) +
geom_point(
  mapping=aes(x=carat, y=price, color=color),
  position=position_jitter()
) +
stat_smooth(
  stat_params=list(method="glm", formula=y~poly(x,2)),
  geom_params=list(color="black")
)
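
A layer's mapping is combined with the plot's default mapping rather than replacing it, so a layer only needs to name the aesthetics it adds or changes. The following sketch relies on that merging behavior and assumes ggplot2 2.0.0 or later, where the smoother's parameters are passed as direct arguments instead of stat_params and geom_params; the variable name p is ours.

```r
library(ggplot2)

p <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  facet_wrap(~cut) +
  geom_point(
    mapping = aes(color = color),   # x and y are inherited from ggplot()
    position = position_jitter()
  ) +
  stat_smooth(
    method = "glm",
    formula = y ~ poly(x, 2),
    colour = "black"
  )

# The data computed for the point layer contains the inherited x and y
# as well as the layer-specific colour aesthetic.
point_data <- layer_data(p, 1)
print(names(point_data))
```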