Drawing Area Proportional Euler Diagrams

Aidan Delaney

aidan@ontologyengineering.org | @aidandelaney

AP Euler Diagrams

Previous Work

Our Contribution

  • "Standard" algorithms -- force directed graph layout (Eades 1984).
  • Use a force model to attract/repel circle centres.
    • Hooke's law & Coulomb's law.
    • well understood optimisations, advantages and disadvantages.
  • Usable & maintainable implementation.

Force Directed Circles
Force Directed Circles

Vennom

  • Implements the force directed layout algorithm.
  • Key-weakness #1 - calculates area by drawing a bitmap.
  • Key-weakness #2 - calculates forces in a pairwise manner.

Bitmaps

Advantages

  • Computationally fast \(O(n^2)\), if the bitmap is small.
  • Straightforward to implement.

Disadvantages

  • Not perfectly accurate.
  • \(O(n^2)\) grows poorly.
  • Optimisations cause some corner cases with "holes".

Pairwise

TODO

  • We need to add good area reporting to Vennom.
  • In about 5 slides time...
  • Need to compute forces is a concurrent manner.

GNU R

Prevalence

  • Use in Bioinformatics (Croucher et al. 2014, A. J. Page et al. (2016)).
  • Little Book of Bioinformatics
  • Done well it will significantly improve impact.

Example t-test

Paired t-test:

before = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
after  = c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

t.test(before, after, paired=TRUE)

Example histogram

before = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)

pdf("myhistogram.pdf")
hist(before)
svg("myhistogram.svg")
hist(before)

More Examples

ggplot2 example
ggplot2 example
  • See http://www.statmethods.net/ for some good examples.
  • ggplot2 is an extremely powerful graphics library for R.
    • The ideas behind ggplot2 are similar to the motivation for the VMG; developing semantics for visualisations (or vice-versa).

Exensible

  • There exists an "app-store" for R.
  • CRAN: Comprehensive R Archive Network.
  • Modelled on CPAN (from 1993!).
  • ggplot2 and venneuler are distributed via CRAN.

Using VennEuler

library("venneuler")

vd <- venneuler(c(A=0.3, B=0.3, C=1.1, "A&B"=0.1, "A&C"=0.2, "B&C"=0.1 ,"A&B&C"=0.1))
plot(vd)

EuleR wish?

library("euleR")

ed <- euleR(c(A=0.3, B=0.3, C=1.1, "A&B"=0.1, "A&C"=0.2, "B&C"=0.1 ,"A&B&C"=0.1))
plot(ed)

Be source compatible!

Requirements

  1. Source compatibility with venneuler.
  2. Distributable via CRAN.
  3. Calculate area more accurately than vennom.
  4. Be maintainable.

Two Architectures

Monolith

  • Use RJava -- vennom is written in Java.
  • Bundle vennom, area calculation and R-bridge as CRAN package.
  • Manually manage transitive dependencies

Monolith

euleR monolithic architecture
euleR monolithic architecture

Web Service

  • Require Internet connection.
  • Move R-bridge, vennom, area calculation an dependencies to cloud.
  • Allows tracking or user requests.

Web Service

euleR cloud architecture
euleR cloud architecture

Area Calculation

Problem

  • As discussed, the bitmap approach is inaccurate and doesn't scale (\(O(n^2)\)).
  • How to practically calculate the area of a concrete zone.

Inkscape inspired

  • Represent all primitives as paths.
  • A path is a "sequence of symmetric power basis polynomials".
  • Calculating the intersection of two paths is straightforward.
    • Paths can be arbitrarily shaped -- fun for the future.
  • Somewhat inaccurate when representing circles (I'd like to quantify this).
  • Inkscape's area calculation is broken for some complex paths.

s-basis area

S-basis closed path
S-basis closed path

JavaGeom

  • A geometry package for Java
  • Provides a CircleArc2D
    • we added getChordArea() and a few other things.
  • Implemented an area calculation for BoundaryPolyCurve2D<CircleArc2D>.

Our Solution

  • Take an input concrete diagram.
  • Calculate the concrete Euler graph.
  • Split each circle into 'arcs' at the intersection points indicated by the Euler graph.
  • Represent each concrete zone as a BoundaryPolyCurve2D<CircleArc2D>.
  • Calculate the concrete inclusion hierarchy for concrete zones.

Conclusion

Examples

Venneuler example Vennom example

But we're significantly faster.

Development

  • The supposed hard bits were easy.
  • The easy bit -- area calculation -- is surprisingly involved.
  • The core vennom algorithm needs improvement.
  • Development -- i.e. impact -- is a full-time commitment.

Bibliography

Croucher, Nicholas, Andrew Page, Thomas Connor, Aidan Delaney, Jacqueline Keane, Stephen Bentley, Julian Parkhill, and Simon Harris. 2014. “Rapid Phylogenetic Analysis of Large Samples of Recombinant Bacterial Whole Genome Sequences Using Gubbins.” Nucleic Acids Research, 1–13.

Eades, Peter. 1984. “A Heuristic for Graph Drawing.” Congressus Numerantium 42 (11): 149–60.

Micallef, Luana, and Peter Rodgers. 2014. “EulerAPE: Drawing Area-Proportional 3-Venn Diagrams Using Ellipses.” PLoS ONE 9 (7). Public Library of Science: 1–18. doi:10.1371/journal.pone.0101717.

Page, Andrew J., Ben Taylor, Aidan J. Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane, and Simon R. Harris. 2016. “SNP-Sites: Rapid Efficient Extraction of SNPs from Multi-FASTA Alignments.” BioRxiv. Cold Spring Harbor Labs Journals. doi:10.1101/038190.

Wilkinson, Leland. 2012. “Exact and Approximate Area-Proportional Circular Venn and Euler Diagrams.” IEEE Trans Vis Comput Graph 18 (2): 321–31.