Title: | Statistical Tools for Topological Data Analysis |
---|---|
Description: | Tools for the statistical analysis of persistent homology and for density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries 'GUDHI' <https://project.inria.fr/gudhi/software/>, 'Dionysus' <https://www.mrzv.org/software/dionysus/>, and 'PHAT' <https://bitbucket.org/phat-code/phat/>. This package also implements the methods in Fasy et al. (2014) <doi:10.1214/14-AOS1252> and Chazal et al. (2014) <doi:10.1145/2582112.2582128> for analyzing the statistical significance of persistent homology features. |
Authors: | Brittany T. Fasy, Jisu Kim, Fabrizio Lecci, Clement Maria, David L. Millman, Vincent Rouvreau. |
Maintainer: | Jisu Kim <[email protected]> |
License: | GPL-3 |
Version: | 1.9.1 |
Built: | 2024-11-25 04:40:33 UTC |
Source: | https://github.com/cran/TDA |
Tools for Topological Data Analysis. In particular it provides functions for the statistical analysis of persistent homology and for density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus and PHAT.
Package: | TDA |
Type: | Package |
Version: | 1.9.1 |
Date: | 2024-01-23 |
License: | GPL-3 |
Brittany Terese Fasy, Jisu Kim, Fabrizio Lecci, Clement Maria, David L. Millman, and Vincent Rouvreau
Maintainer: Jisu Kim <[email protected]>
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology: Confidence Sets for Persistence Diagrams", (arXiv:1303.7117). To appear, Annals of Statistics.
Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.
Chazal F, Fasy BT, Lecci F, Rinaldo A, Wasserman L (2014). "Stochastic Convergence of Persistence Landscapes and Silhouettes." Proceedings of the 30th Symposium of Computational Geometry (SoCG). (arXiv:1312.0308)
Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Subsampling Methods for Persistent Homology." (arXiv:1406.1901)
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/.
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/.
Bauer U, Kerber M, Reininghaus J (2012). "PHAT, a software library for persistent homology". https://bitbucket.org/phat-code/phat/.
The function alphaComplexDiag
computes the persistence diagram of the alpha complex filtration built on top of a point cloud.
alphaComplexDiag( X, maxdimension = NCOL(X) - 1, library = "GUDHI", location = FALSE, printProgress = FALSE)
alphaComplexDiag( X, maxdimension = NCOL(X) - 1, library = "GUDHI", location = FALSE, printProgress = FALSE)
X |
an |
maxdimension |
integer: max dimension of the homological features to be computed. (e.g. 0 for connected components, 1 for connected components and loops, 2 for connected components, loops, voids, etc.) |
library |
either a string or a vector of length two. When a vector is given, the first element specifies which library to compute the Alpha Complex filtration, and the second element specifies which library to compute the persistence diagram. If a string is used, then the same library is used. For computing the Alpha Complex filtration, the user can use the library |
location |
if |
printProgress |
if |
The function alphaComplexDiag
constructs the Alpha Complex filtration, using the C++ library GUDHI.
Then for computing the persistence diagram from the Alpha Complex filtration, the user can use either the C++ library GUDHI, Dionysus, or PHAT.
See refereneces.
The function alphaComplexDiag
returns a list with the following elements:
diagram |
an object of class |
birthLocation |
only if |
deathLocation |
only if |
cycleLocation |
only if |
Jisu Kim and Vincent Rouvreau
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
Rouvreau V (2015). "Alpha complex." In GUDHI User and Reference Manual. GUDHI Editorial Board. https://gudhi.inria.fr/doc/latest/group__alpha__complex.html
Edelsbrunner H, Kirkpatrick G, Seidel R (1983). "On the shape of a set of points in the plane." IEEE Trans. Inform. Theory.
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/
summary.diagram
, plot.diagram
, alphaShapeDiag
, gridDiag
, ripsDiag
# input data generated from a circle X <- circleUnif(n = 30) # persistence diagram of alpha complex DiagAlphaCmplx <- alphaComplexDiag( X = X, library = c("GUDHI", "Dionysus"), location = TRUE, printProgress = TRUE) # plot par(mfrow = c(1, 2)) plot(DiagAlphaCmplx[["diagram"]]) one <- which(DiagAlphaCmplx[["diagram"]][, 1] == 1) one <- one[which.max( DiagAlphaCmplx[["diagram"]][one, 3] - DiagAlphaCmplx[["diagram"]][one, 2])] plot(X, col = 2, main = "Representative loop of data points") for (i in seq(along = one)) { for (j in seq_len(dim(DiagAlphaCmplx[["cycleLocation"]][[one[i]]])[1])) { lines( DiagAlphaCmplx[["cycleLocation"]][[one[i]]][j, , ], pch = 19, cex = 1, col = i) } } par(mfrow = c(1, 1))
# input data generated from a circle X <- circleUnif(n = 30) # persistence diagram of alpha complex DiagAlphaCmplx <- alphaComplexDiag( X = X, library = c("GUDHI", "Dionysus"), location = TRUE, printProgress = TRUE) # plot par(mfrow = c(1, 2)) plot(DiagAlphaCmplx[["diagram"]]) one <- which(DiagAlphaCmplx[["diagram"]][, 1] == 1) one <- one[which.max( DiagAlphaCmplx[["diagram"]][one, 3] - DiagAlphaCmplx[["diagram"]][one, 2])] plot(X, col = 2, main = "Representative loop of data points") for (i in seq(along = one)) { for (j in seq_len(dim(DiagAlphaCmplx[["cycleLocation"]][[one[i]]])[1])) { lines( DiagAlphaCmplx[["cycleLocation"]][[one[i]]][j, , ], pch = 19, cex = 1, col = i) } } par(mfrow = c(1, 1))
The function alphaComplexFiltration
computes the alpha complex filtration built on top of a point cloud.
alphaComplexFiltration( X, library = "GUDHI", printProgress = FALSE)
alphaComplexFiltration( X, library = "GUDHI", printProgress = FALSE)
X |
an |
library |
a string specifying which library to compute the Alpha Complex filtration. The user can use the library |
printProgress |
if |
The function alphaComplexFiltration
constructs the alpha complex filtration, using the C++ library GUDHI.
See refereneces.
The function alphaComplexFiltration
returns a list with the following elements:
cmplx |
a list representing the complex. Its i-th element represents the vertices of i-th simplex. |
values |
a vector representing the filtration values. Its i-th element represents the filtration value of i-th simplex. |
increasing |
a logical variable indicating if the filtration values are in increasing order ( |
coordinates |
a matrix representing the coordinates of vertices. Its i-th row represents the coordinate of i-th vertex. |
Jisu Kim and Vincent Rouvreau
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
Rouvreau V (2015). "Alpha complex." In GUDHI User and Reference Manual. GUDHI Editorial Board. https://gudhi.inria.fr/doc/latest/group__alpha__complex.html
Edelsbrunner H, Kirkpatrick G, Seidel R (1983). "On the shape of a set of points in the plane." IEEE Trans. Inform. Theory.
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/
alphaComplexDiag
, filtrationDiag
# input data generated from a circle X <- circleUnif(n = 10) # alpha complex filtration FltAlphaComplex <- alphaComplexFiltration(X = X, printProgress = TRUE) # plot alpha complex filtration lim <- rep(c(-1, 1), 2) plot(NULL, type = "n", xlim = lim[1:2], ylim = lim[3:4], main = "Alpha Complex Filtration Plot") for (idx in seq(along = FltAlphaComplex[["cmplx"]])) { polygon(FltAlphaComplex[["coordinates"]][FltAlphaComplex[["cmplx"]][[idx]], , drop = FALSE], col = "pink", border = NA, xlim = lim[1:2], ylim = lim[3:4]) } for (idx in seq(along = FltAlphaComplex[["cmplx"]])) { polygon(FltAlphaComplex[["coordinates"]][FltAlphaComplex[["cmplx"]][[idx]], , drop = FALSE], col = NULL, xlim = lim[1:2], ylim = lim[3:4]) } points(FltAlphaComplex[["coordinates"]], pch = 16)
# input data generated from a circle X <- circleUnif(n = 10) # alpha complex filtration FltAlphaComplex <- alphaComplexFiltration(X = X, printProgress = TRUE) # plot alpha complex filtration lim <- rep(c(-1, 1), 2) plot(NULL, type = "n", xlim = lim[1:2], ylim = lim[3:4], main = "Alpha Complex Filtration Plot") for (idx in seq(along = FltAlphaComplex[["cmplx"]])) { polygon(FltAlphaComplex[["coordinates"]][FltAlphaComplex[["cmplx"]][[idx]], , drop = FALSE], col = "pink", border = NA, xlim = lim[1:2], ylim = lim[3:4]) } for (idx in seq(along = FltAlphaComplex[["cmplx"]])) { polygon(FltAlphaComplex[["coordinates"]][FltAlphaComplex[["cmplx"]][[idx]], , drop = FALSE], col = NULL, xlim = lim[1:2], ylim = lim[3:4]) } points(FltAlphaComplex[["coordinates"]], pch = 16)
The function alphaShapeDiag
computes the persistence diagram of the alpha shape filtration built on top of a point cloud in 3 dimension.
alphaShapeDiag( X, maxdimension = NCOL(X) - 1, library = "GUDHI", location = FALSE, printProgress = FALSE)
alphaShapeDiag( X, maxdimension = NCOL(X) - 1, library = "GUDHI", location = FALSE, printProgress = FALSE)
X |
an |
maxdimension |
integer: max dimension of the homological features to be computed. (e.g. 0 for connected components, 1 for connected components and loops, 2 for connected components, loops, voids, etc.) |
library |
either a string or a vector of length two. When a vector is given, the first element specifies which library to compute the Alpha Shape filtration, and the second element specifies which library to compute the persistence diagram. If a string is used, then the same library is used. For computing the Alpha Shape filtration, the user can use the library |
location |
if |
printProgress |
if |
The function alphaShapeDiag
constructs the Alpha Shape filtration, using the C++ library GUDHI.
Then for computing the persistence diagram from the Alpha Shape filtration, the user can use either the C++ library GUDHI, Dionysus, or PHAT.
See refereneces.
The function alphaShapeDiag
returns a list with the following elements:
diagram |
an object of class |
birthLocation |
only if |
deathLocation |
only if |
cycleLocation |
only if |
Jisu Kim and Vincent Rouvreau
Fischer K (2005). "Introduction to Alpha Shapes."
Edelsbrunner H, Mucke EP (1994). "Three-dimensional Alpha Shapes." ACM Trans. Graph.
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/
Morozov D (2008). "Homological Illusions of Persistence and Stability."
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
summary.diagram
, plot.diagram
, alphaComplexDiag
, gridDiag
, ripsDiag
# input data generated from cylinder n <- 30 X <- cbind(circleUnif(n = n), runif(n = n, min = -0.1, max = 0.1)) # persistence diagram of alpha shape DiagAlphaShape <- alphaShapeDiag( X = X, maxdimension = 1, library = c("GUDHI", "Dionysus"), location = TRUE, printProgress = TRUE) # plot diagram and first two dimension of data par(mfrow = c(1, 2)) plot(DiagAlphaShape[["diagram"]]) plot(X[, 1:2], col = 2, main = "Representative loop of alpha shape filtration") one <- which(DiagAlphaShape[["diagram"]][, 1] == 1) one <- one[which.max( DiagAlphaShape[["diagram"]][one, 3] - DiagAlphaShape[["diagram"]][one, 2])] for (i in seq(along = one)) { for (j in seq_len(dim(DiagAlphaShape[["cycleLocation"]][[one[i]]])[1])) { lines( DiagAlphaShape[["cycleLocation"]][[one[i]]][j, , 1:2], pch = 19, cex = 1, col = i) } } par(mfrow = c(1, 1))
# input data generated from cylinder n <- 30 X <- cbind(circleUnif(n = n), runif(n = n, min = -0.1, max = 0.1)) # persistence diagram of alpha shape DiagAlphaShape <- alphaShapeDiag( X = X, maxdimension = 1, library = c("GUDHI", "Dionysus"), location = TRUE, printProgress = TRUE) # plot diagram and first two dimension of data par(mfrow = c(1, 2)) plot(DiagAlphaShape[["diagram"]]) plot(X[, 1:2], col = 2, main = "Representative loop of alpha shape filtration") one <- which(DiagAlphaShape[["diagram"]][, 1] == 1) one <- one[which.max( DiagAlphaShape[["diagram"]][one, 3] - DiagAlphaShape[["diagram"]][one, 2])] for (i in seq(along = one)) { for (j in seq_len(dim(DiagAlphaShape[["cycleLocation"]][[one[i]]])[1])) { lines( DiagAlphaShape[["cycleLocation"]][[one[i]]][j, , 1:2], pch = 19, cex = 1, col = i) } } par(mfrow = c(1, 1))
The function alphaShapeFiltration
computes the alpha shape filtration built on top of a point cloud in 3 dimension.
alphaShapeFiltration( X, library = "GUDHI", printProgress = FALSE)
alphaShapeFiltration( X, library = "GUDHI", printProgress = FALSE)
X |
an |
library |
a string specifying which library to compute the Alpha Shape filtration. The user can use the library |
printProgress |
if |
The function alphaShapeFiltration
constructs the alpha shape filtration, using the C++ library GUDHI.
See refereneces.
The function alphaShapeFiltration
returns a list with the following elements:
cmplx |
a list representing the complex. Its i-th element represents the vertices of i-th simplex. |
values |
a vector representing the filtration values. Its i-th element represents the filtration value of i-th simplex. |
increasing |
a logical variable indicating if the filtration values are in increasing order ( |
coordinates |
a matrix representing the coordinates of vertices. Its i-th row represents the coordinate of i-th vertex. |
Jisu Kim and Vincent Rouvreau
Fischer K (2005). "Introduction to Alpha Shapes."
Edelsbrunner H, Mucke EP (1994). "Three-dimensional Alpha Shapes." ACM Trans. Graph.
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/
Morozov D (2008). "Homological Illusions of Persistence and Stability."
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
alphaShapeDiag
, filtrationDiag
# input data generated from sphere X <- sphereUnif(n = 20, d = 2) # alpha shape filtration FltAlphaShape <- alphaShapeFiltration(X = X, printProgress = TRUE)
# input data generated from sphere X <- sphereUnif(n = 20, d = 2) # alpha shape filtration FltAlphaShape <- alphaShapeFiltration(X = X, printProgress = TRUE)
The function bootstrapBand
computes a uniform symmetric confidence band around a function of the data X
, evaluated on a Grid
, using the bootstrap algorithm. See Details and References.
bootstrapBand( X, FUN, Grid, B = 30, alpha = 0.05, parallel = FALSE, printProgress = FALSE, weight = NULL, ...)
bootstrapBand( X, FUN, Grid, B = 30, alpha = 0.05, parallel = FALSE, printProgress = FALSE, weight = NULL, ...)
X |
an |
FUN |
a function whose inputs are an |
Grid |
an |
B |
the number of bootstrap iterations. |
alpha |
|
parallel |
logical: if |
printProgress |
if |
weight |
either NULL, a number, or a vector of length |
... |
additional parameters for the function |
First, the input function FUN
is evaluated on the Grid
using the original data X
. Then, for B
times, the bootstrap algorithm subsamples n
points of X
(with replacement), evaluates the function FUN
on the Grid
using the subsample, and computes the distance between the original function and the bootstrapped one. The result is a sequence of
B
values. The (1-alpha
) confidence band is constructed by taking the (1-alpha
) quantile of these values.
The function bootstrapBand
returns a list with the following elements:
width |
number: ( |
fun |
a numeric vector of length |
band |
an |
Jisu Kim and Fabrizio Lecci
Wasserman L (2004). "All of statistics: a concise course in statistical inference." Springer.
Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology: Confidence Sets for Persistence Diagrams." (arXiv:1303.7117). Annals of Statistics.
Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.
# Generate data from mixture of 2 normals. n <- 2000 X <- c(rnorm(n / 2), rnorm(n / 2, mean = 3, sd = 1.2)) # Construct a grid of points over which we evaluate the function by <- 0.02 Grid <- seq(-3, 6, by = by) ## bandwidth for kernel density estimator h <- 0.3 ## Bootstrap confidence band band <- bootstrapBand(X, kde, Grid, B = 80, parallel = FALSE, alpha = 0.05, h = h) plot(Grid, band[["fun"]], type = "l", lwd = 2, ylim = c(0, max(band[["band"]])), main = "kde with 0.95 confidence band") lines(Grid, pmax(band[["band"]][, 1], 0), col = 2, lwd = 2) lines(Grid, band[["band"]][, 2], col = 2, lwd = 2)
# Generate data from mixture of 2 normals. n <- 2000 X <- c(rnorm(n / 2), rnorm(n / 2, mean = 3, sd = 1.2)) # Construct a grid of points over which we evaluate the function by <- 0.02 Grid <- seq(-3, 6, by = by) ## bandwidth for kernel density estimator h <- 0.3 ## Bootstrap confidence band band <- bootstrapBand(X, kde, Grid, B = 80, parallel = FALSE, alpha = 0.05, h = h) plot(Grid, band[["fun"]], type = "l", lwd = 2, ylim = c(0, max(band[["band"]])), main = "kde with 0.95 confidence band") lines(Grid, pmax(band[["band"]][, 1], 0), col = 2, lwd = 2) lines(Grid, band[["band"]][, 2], col = 2, lwd = 2)
The function bootstrapDiagram
computes a (1-alpha)
confidence set for the Persistence Diagram of a filtration of sublevel sets (or superlevel sets) of a function evaluated over a grid of points. The function returns the (1-alpha
) quantile of B
bottleneck distances (or Wasserstein distances), computed in B
iterations of the bootstrap algorithm.
bootstrapDiagram( X, FUN, lim, by, maxdimension = length(lim) / 2 - 1, sublevel = TRUE, library = "GUDHI", B = 30, alpha = 0.05, distance = "bottleneck", dimension = min(1, maxdimension), p = 1, parallel = FALSE, printProgress = FALSE, weight = NULL, ...)
bootstrapDiagram( X, FUN, lim, by, maxdimension = length(lim) / 2 - 1, sublevel = TRUE, library = "GUDHI", B = 30, alpha = 0.05, distance = "bottleneck", dimension = min(1, maxdimension), p = 1, parallel = FALSE, printProgress = FALSE, weight = NULL, ...)
X |
an |
FUN |
a function whose inputs are 1) an |
lim |
a |
by |
either a number or a vector of length |
maxdimension |
a number that indicates the maximum dimension to compute persistent homology to. The default value is |
sublevel |
a logical variable indicating if the Persistence Diagram should be computed for sublevel sets ( |
library |
a string specifying which library to compute the persistence diagram. The user can choose either the library |
B |
the number of bootstrap iterations. The default value is |
alpha |
The function |
distance |
a string specifying the distance to be used for persistence diagrams: either |
dimension |
|
p |
if |
parallel |
logical: if |
printProgress |
if |
weight |
either NULL, a number, or a vector of length |
... |
additional parameters for the function |
The function bootstrapDiagram
uses gridDiag
to compute the persistence diagram of the input function using the entire sample. Then the bootstrap algorithm, for B
times, computes the bottleneck distance between the original persistence diagram and the one computed using a subsample. Finally the (1-alpha
) quantile of these B
values is returned. See (Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman, 2014) for discussion of the method.
The function bootstrapDiagram
returns the (1-alpha
) quantile of the values computed by the bootstrap algorithm.
The function bootstrapDiagram
uses the C++ library Dionysus for the computation of bottleneck and Wasserstein distances. See references.
Jisu Kim and Fabrizio Lecci
Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.
Wasserman L (2004), "All of statistics: a concise course in statistical inference." Springer.
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/
bottleneck
, bootstrapBand
,
distFct
, kde
, kernelDist
, dtm
,
summary.diagram
, plot.diagram
## confidence set for the Kernel Density Diagram # input data n <- 400 XX <- circleUnif(n) ## Ranges of the grid Xlim <- c(-1.8, 1.8) Ylim <- c(-1.6, 1.6) lim <- cbind(Xlim, Ylim) by <- 0.05 h <- .3 #bandwidth for the function kde #Kernel Density Diagram of the superlevel sets Diag <- gridDiag(XX, kde, lim = lim, by = by, sublevel = FALSE, printProgress = TRUE, h = h) # confidence set B <- 10 ## the number of bootstrap iterations should be higher! ## this is just an example alpha <- 0.05 cc <- bootstrapDiagram(XX, kde, lim = lim, by = by, sublevel = FALSE, B = B, alpha = alpha, dimension = 1, printProgress = TRUE, h = h) plot(Diag[["diagram"]], band = 2 * cc)
## confidence set for the Kernel Density Diagram # input data n <- 400 XX <- circleUnif(n) ## Ranges of the grid Xlim <- c(-1.8, 1.8) Ylim <- c(-1.6, 1.6) lim <- cbind(Xlim, Ylim) by <- 0.05 h <- .3 #bandwidth for the function kde #Kernel Density Diagram of the superlevel sets Diag <- gridDiag(XX, kde, lim = lim, by = by, sublevel = FALSE, printProgress = TRUE, h = h) # confidence set B <- 10 ## the number of bootstrap iterations should be higher! ## this is just an example alpha <- 0.05 cc <- bootstrapDiagram(XX, kde, lim = lim, by = by, sublevel = FALSE, B = B, alpha = alpha, dimension = 1, printProgress = TRUE, h = h) plot(Diag[["diagram"]], band = 2 * cc)
The function bottleneck
computes the bottleneck distance between two persistence diagrams.
bottleneck(Diag1, Diag2, dimension = 1)
bottleneck(Diag1, Diag2, dimension = 1)
Diag1 |
an object of class |
Diag2 |
an object of class |
dimension |
an integer or a vector specifying the dimension of the features used to compute the bottleneck distance. |
The bottleneck distance between two diagrams is the cost of the optimal matching between points of the two diagrams. Note that all the diagonal points are included in the persistence diagrams when computing the optimal matching. When a vector is given for dimension
, then maximum among bottleneck distances using each element in dimension
is returned. The function bottleneck
is an R wrapper of the function "bottleneck_distance" in the C++ library Dionysus. See references.
The function bottleneck
returns the value of the bottleneck distance between the two persistence diagrams.
Jisu Kim and Fabrizio Lecci
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
wasserstein
,
alphaComplexDiag
, alphaComplexDiag
, gridDiag
, ripsDiag
,
plot.diagram
XX1 <- circleUnif(20) XX2 <- circleUnif(20, r = 0.2) DiagLim <- 5 maxdimension <- 1 Diag1 <- ripsDiag(XX1, maxdimension, DiagLim, printProgress = FALSE) Diag2 <- ripsDiag(XX2, maxdimension, DiagLim, printProgress = FALSE) bottleneckDist <- bottleneck(Diag1[["diagram"]], Diag2[["diagram"]], dimension = 1) print(bottleneckDist)
XX1 <- circleUnif(20) XX2 <- circleUnif(20, r = 0.2) DiagLim <- 5 maxdimension <- 1 Diag1 <- ripsDiag(XX1, maxdimension, DiagLim, printProgress = FALSE) Diag2 <- ripsDiag(XX2, maxdimension, DiagLim, printProgress = FALSE) bottleneckDist <- bottleneck(Diag1[["diagram"]], Diag2[["diagram"]], dimension = 1) print(bottleneckDist)
The function circleUnif
samples n
points from the circle of radius r
, uniformly with respect to the circumference length.
circleUnif(n, r = 1)
circleUnif(n, r = 1)
n |
an integer specifying the number of points in the sample. |
r |
a numeric variable specifying the radius of the circle. The default value is |
circleUnif
returns an n
by 2 matrix of coordinates.
Uniform sample from sphere of arbitrary dimension can be generated using sphereUnif
.
Fabrizio Lecci
X <- circleUnif(100) plot(X)
X <- circleUnif(100) plot(X)
Given a point cloud, or a matrix of distances, the function clusterTree
computes a density estimator and returns the corresponding cluster tree of superlevel sets (lambda tree and kappa tree; see references).
clusterTree( X, k, h = NULL, density = "knn", dist = "euclidean", d = NULL, Nlambda = 100, printProgress = FALSE)
clusterTree( X, k, h = NULL, density = "knn", dist = "euclidean", d = NULL, Nlambda = 100, printProgress = FALSE)
X |
If |
k |
an integer value specifying the parameter of the underlying k-nearest neighbor similarity graph, used to determine connected components. If |
h |
real value: if |
density |
string: if |
dist |
string: can be |
d |
integer: if |
Nlambda |
integer: size of the grid of values of the density estimator, used to compute the cluster tree. High |
printProgress |
logical: if |
The function clusterTree
is an implementation of Algorithm 1 in the first reference.
The function clusterTree
returns an object of class clusterTree
, a list with the following components
density |
Vector of length |
DataPoints |
A list whose elements are the points of |
n |
The number of points stored in the input matrix |
id |
Vector: the IDs associated to the branches of the cluster tree |
children |
A list whose elements are the IDs of the children of each branch, in the same order of |
parent |
Vector: the IDs of the parents of each branch, in the same order of |
silo |
A list whose elements are the horizontal coordinates of the silo of each branch, in the same order of |
Xbase |
Vector: the horiontal coordinates of the branches of the cluster tree, in the same order of |
lambdaBottom |
Vector: the vertical bottom coordinates of the branches of the lambda tree, in the same order of |
lambdaTop |
Vector: the vertical top coordinates of the branches of the lambda tree, in the same order of |
rBottom |
(only if |
rTop |
(only if |
alphaBottom |
Vector: the vertical bottom coordinates of the branches of the alpha tree, in the same order of |
alphaTop |
Vector: the vertical top coordinates of the branches of the alpha tree, in the same order of |
Kbottom |
Vector: the vertical bottom coordinates of the branches of the kappa tree, in the same order of |
Ktop |
Vector: the vertical top coordinates of the branches of the kappa tree, in the same order of |
Fabrizio Lecci
Kent BP, Rinaldo A, Verstynen T (2013). "DeBaCl: A Python Package for Interactive DEnsity-BAsed CLustering." arXiv:1307.8136
Lecci F, Rinaldo A, Wasserman L (2014). "Metric Embeddings for Cluster Trees"
## Generate data: 3 clusters n <- 1200 #sample size Neach <- floor(n / 4) X1 <- cbind(rnorm(Neach, 1, .8), rnorm(Neach, 5, 0.8)) X2 <- cbind(rnorm(Neach, 3.5, .8), rnorm(Neach, 5, 0.8)) X3 <- cbind(rnorm(Neach, 6, 1), rnorm(Neach, 1, 1)) X <- rbind(X1, X2, X3) k <- 100 #parameter of knn ## Density clustering using knn and kde Tree <- clusterTree(X, k, density = "knn") TreeKDE <- clusterTree(X, k, h = 0.3, density = "kde") par(mfrow = c(2, 3)) plot(X, pch = 19, cex = 0.6) # plot lambda trees plot(Tree, type = "lambda", main = "lambda Tree (knn)") plot(TreeKDE, type = "lambda", main = "lambda Tree (kde)") # plot clusters plot(X, pch = 19, cex = 0.6, main = "cluster labels") for (i in Tree[["id"]]){ points(matrix(X[Tree[["DataPoints"]][[i]],],ncol = 2), col = i, pch = 19, cex = 0.6) } #plot kappa trees plot(Tree, type = "kappa", main = "kappa Tree (knn)") plot(TreeKDE, type = "kappa", main = "kappa Tree (kde)")
## Generate data: 3 clusters n <- 1200 #sample size Neach <- floor(n / 4) X1 <- cbind(rnorm(Neach, 1, .8), rnorm(Neach, 5, 0.8)) X2 <- cbind(rnorm(Neach, 3.5, .8), rnorm(Neach, 5, 0.8)) X3 <- cbind(rnorm(Neach, 6, 1), rnorm(Neach, 1, 1)) X <- rbind(X1, X2, X3) k <- 100 #parameter of knn ## Density clustering using knn and kde Tree <- clusterTree(X, k, density = "knn") TreeKDE <- clusterTree(X, k, h = 0.3, density = "kde") par(mfrow = c(2, 3)) plot(X, pch = 19, cex = 0.6) # plot lambda trees plot(Tree, type = "lambda", main = "lambda Tree (knn)") plot(TreeKDE, type = "lambda", main = "lambda Tree (kde)") # plot clusters plot(X, pch = 19, cex = 0.6, main = "cluster labels") for (i in Tree[["id"]]){ points(matrix(X[Tree[["DataPoints"]][[i]],],ncol = 2), col = i, pch = 19, cex = 0.6) } #plot kappa trees plot(Tree, type = "kappa", main = "kappa Tree (knn)") plot(TreeKDE, type = "kappa", main = "kappa Tree (kde)")
The function distFct
computes the distance between each point of a set Grid
and the corresponding closest point of another set X
.
distFct(X, Grid)
distFct(X, Grid)
X |
a numeric |
Grid |
a numeric |
Given a set of points X
, the distance function computed at is defined as
The function distFct
returns a numeric vector of length , where
is the number of points stored in
Grid
.
Each value in V corresponds to the distance between a point in G and the nearest point in X.
Fabrizio Lecci
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function interval <- 0.065 Xseq <- seq(-1.6, 1.6, by = interval) Yseq <- seq(-1.7, 1.7, by = interval) Grid <- expand.grid(Xseq, Yseq) ## distance fct distance <- distFct(X, Grid)
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function interval <- 0.065 Xseq <- seq(-1.6, 1.6, by = interval) Yseq <- seq(-1.7, 1.7, by = interval) Grid <- expand.grid(Xseq, Yseq) ## distance fct distance <- distFct(X, Grid)
The function dtm
computes the "distance to measure function" on a set of points Grid
, using the uniform empirical measure on a set of points X
. Given a probability measure , The distance to measure function, for each
, is defined by
where , and
and
are tuning parameters. As
m0
increases, DTM function becomes smoother, so m0
can be understood as a smoothing parameter. r
affects less but also changes DTM function as well. The DTM can be seen as a smoothed version of the distance function. See Details and References.
Given , the empirical version of the distance to measure is
where and
is the set containing the
nearest neighbors of
among
.
dtm(X, Grid, m0, r = 2, weight = 1)
dtm(X, Grid, m0, r = 2, weight = 1)
X |
an |
Grid |
an |
m0 |
a numeric variable for the smoothing parameter of the distance to measure. Roughly, |
r |
a numeric variable for the tuning parameter of the distance to measure. The value of |
weight |
either a number, or a vector of length |
See (Chazal, Cohen-Steiner, and Merigot, 2011, Definition 3.2) and (Chazal, Massart, and Michel, 2015, Equation (2)) for a formal definition of the "distance to measure" function.
The function dtm
returns a vector of length (the number of points stored in
Grid
) containing the value of the distance to measure function evaluated at each point of Grid
.
Jisu Kim and Fabrizio Lecci
Chazal F, Cohen-Steiner D, Merigot Q (2011). "Geometric inference for probability measures." Foundations of Computational Mathematics 11.6, 733-751.
Chazal F, Massart P, Michel B (2015). "Rates of convergence for robust geometric inference."
Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function by <- 0.065 Xseq <- seq(-1.6, 1.6, by = by) Yseq <- seq(-1.7, 1.7, by = by) Grid <- expand.grid(Xseq, Yseq) ## distance to measure m0 <- 0.1 DTM <- dtm(X, Grid, m0)
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function by <- 0.065 Xseq <- seq(-1.6, 1.6, by = by) Yseq <- seq(-1.7, 1.7, by = by) Grid <- expand.grid(Xseq, Yseq) ## distance to measure m0 <- 0.1 DTM <- dtm(X, Grid, m0)
The function filtrationDiag
computes the persistence diagram of the filtration.
filtrationDiag( filtration, maxdimension, library = "GUDHI", location = FALSE, printProgress = FALSE, diagLimit = NULL)
filtrationDiag( filtration, maxdimension, library = "GUDHI", location = FALSE, printProgress = FALSE, diagLimit = NULL)
filtration |
a list representing the input filtration. This list consists of three components: |
maxdimension |
integer: max dimension of the homological features to be computed. (e.g. 0 for connected components, 1 for connected components and loops, 2 for connected components, loops, voids, etc.) |
library |
a string specifying which library to compute the persistence diagram. The user can choose either the library |
location |
if |
printProgress |
logical: if |
diagLimit |
a number that replaces |
The user can decide to use either the C++ library GUDHI or Dionysus. See refereneces.
The function filtrationDiag
returns a list with the following elements:
diagram |
an object of class |
birthLocation |
only if |
deathLocation |
only if |
cycleLocation |
only if |
Jisu Kim
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/.
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology". https://www.mrzv.org/software/dionysus/
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
Fasy B, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology." (arXiv:1303.7117). Annals of Statistics.
n <- 5 X <- cbind(cos(2*pi*seq_len(n)/n), sin(2*pi*seq_len(n)/n)) maxdimension <- 1 maxscale <- 1.5 dist <- "euclidean" library <- "Dionysus" FltRips <- ripsFiltration(X = X, maxdimension = maxdimension, maxscale = maxscale, dist = "euclidean", library = "Dionysus", printProgress = TRUE) DiagFltRips <- filtrationDiag(filtration = FltRips, maxdimension = maxdimension, library = "Dionysus", location = TRUE, printProgress = TRUE) plot(DiagFltRips[["diagram"]]) FUNvalues <- X[, 1] + X[, 2] FltFun <- funFiltration(FUNvalues = FUNvalues, cmplx = FltRips[["cmplx"]]) DiagFltFun <- filtrationDiag(filtration = FltFun, maxdimension = maxdimension, library = "Dionysus", location = TRUE, printProgress = TRUE) plot(DiagFltFun[["diagram"]], diagLim = c(-2, 5))
n <- 5 X <- cbind(cos(2*pi*seq_len(n)/n), sin(2*pi*seq_len(n)/n)) maxdimension <- 1 maxscale <- 1.5 dist <- "euclidean" library <- "Dionysus" FltRips <- ripsFiltration(X = X, maxdimension = maxdimension, maxscale = maxscale, dist = "euclidean", library = "Dionysus", printProgress = TRUE) DiagFltRips <- filtrationDiag(filtration = FltRips, maxdimension = maxdimension, library = "Dionysus", location = TRUE, printProgress = TRUE) plot(DiagFltRips[["diagram"]]) FUNvalues <- X[, 1] + X[, 2] FltFun <- funFiltration(FUNvalues = FUNvalues, cmplx = FltRips[["cmplx"]]) DiagFltFun <- filtrationDiag(filtration = FltFun, maxdimension = maxdimension, library = "Dionysus", location = TRUE, printProgress = TRUE) plot(DiagFltFun[["diagram"]], diagLim = c(-2, 5))
The function funFiltration
computes the filtration from the complex and the function values.
funFiltration(FUNvalues, cmplx, sublevel = TRUE)
funFiltration(FUNvalues, cmplx, sublevel = TRUE)
FUNvalues |
The function values on the vertices of the complex. |
cmplx |
the complex. |
sublevel |
a logical variable indicating if the Persistence Diagram should be computed for sublevel sets ( |
See references.
The function funFiltration
returns a list with the following elements:
cmplx |
a list representing the complex. Its i-th element represents the vertices of i-th simplex. |
values |
a vector representing the filtration values. Its i-th element represents the filtration value of i-th simplex. |
increasing |
a logical variable indicating if the filtration values are in increasing order ( |
Jisu Kim
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
n <- 5 X <- cbind(cos(2*pi*seq_len(n)/n), sin(2*pi*seq_len(n)/n)) maxdimension <- 1 maxscale <- 1.5 dist <- "euclidean" library <- "Dionysus" FltRips <- ripsFiltration(X = X, maxdimension = maxdimension, maxscale = maxscale, dist = "euclidean", library = "Dionysus", printProgress = TRUE) FUNvalues <- X[, 1] + X[, 2] FltFun <- funFiltration(FUNvalues = FUNvalues, cmplx = FltRips[["cmplx"]])
n <- 5 X <- cbind(cos(2*pi*seq_len(n)/n), sin(2*pi*seq_len(n)/n)) maxdimension <- 1 maxscale <- 1.5 dist <- "euclidean" library <- "Dionysus" FltRips <- ripsFiltration(X = X, maxdimension = maxdimension, maxscale = maxscale, dist = "euclidean", library = "Dionysus", printProgress = TRUE) FUNvalues <- X[, 1] + X[, 2] FltFun <- funFiltration(FUNvalues = FUNvalues, cmplx = FltRips[["cmplx"]])
The function gridDiag
computes the Persistence Diagram of a filtration of sublevel sets (or superlevel sets) of a function evaluated over a grid of points in arbitrary dimension d
.
gridDiag( X = NULL, FUN = NULL, lim = NULL, by = NULL, FUNvalues = NULL, maxdimension = max(NCOL(X), length(dim(FUNvalues))) - 1, sublevel = TRUE, library = "GUDHI", location = FALSE, printProgress = FALSE, diagLimit = NULL, ...)
gridDiag( X = NULL, FUN = NULL, lim = NULL, by = NULL, FUNvalues = NULL, maxdimension = max(NCOL(X), length(dim(FUNvalues))) - 1, sublevel = TRUE, library = "GUDHI", location = FALSE, printProgress = FALSE, diagLimit = NULL, ...)
X |
an |
FUN |
a function whose inputs are 1) an |
lim |
a |
by |
either a number or a vector of length |
FUNvalues |
an |
maxdimension |
a number that indicates the maximum dimension of the homological features to compute: |
sublevel |
a logical variable indicating if the Persistence Diagram should be computed for sublevel sets ( |
library |
a string specifying which library to compute the persistence diagram. The user can choose either the library |
location |
if |
printProgress |
if |
diagLimit |
a number that replaces |
... |
additional parameters for the function |
If the values of X
, FUN
are set, then FUNvalues
should be NULL
. In this case, gridDiag
evaluates the function FUN
over a grid. If the value of FUNvalues
is set, then X
, FUN
should be NULL
. In this case, FUNvalues
is used as function values over the grid. If location=TRUE
, then lim
, and by
should be set.
Once function values are either computed or given, gridDiag
constructs a filtration by triangulating the grid and considering the simplices determined by the values of the function of dimension up to maxdimension+1
.
The function gridDiag
returns a list with the following components:
diagram |
an object of class |
birthLocation |
only if |
deathLocation |
only if |
cycleLocation |
only if |
The user can decide to use either the C++ library GUDHI, Dionysus, or PHAT. See references.
Since dimension of simplicial complex from grid points in is up to
, homology of dimension
is trivial. Hence setting
maxdimension
with values is equivalent to
maxdimension=d-1
.
Brittany T. Fasy, Jisu Kim, and Fabrizio Lecci
Fasy B, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology." (arXiv:1303.7117). Annals of Statistics.
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/
Bauer U, Kerber M, Reininghaus J (2012). "PHAT, a software library for persistent homology." https://bitbucket.org/phat-code/phat/
summary.diagram
, plot.diagram
,
distFct
, kde
, kernelDist
, dtm
,
alphaComplexDiag
, alphaComplexDiag
, ripsDiag
## Distance Function Diagram and Kernel Density Diagram # input data n <- 300 XX <- circleUnif(n) ## Ranges of the grid Xlim <- c(-1.8, 1.8) Ylim <- c(-1.6, 1.6) lim <- cbind(Xlim, Ylim) by <- 0.05 h <- .3 #bandwidth for the function kde #Distance Function Diagram of the sublevel sets Diag1 <- gridDiag(XX, distFct, lim = lim, by = by, sublevel = TRUE, printProgress = TRUE) #Kernel Density Diagram of the superlevel sets Diag2 <- gridDiag(XX, kde, lim = lim, by = by, sublevel = FALSE, library = "Dionysus", location = TRUE, printProgress = TRUE, h = h) #plot par(mfrow = c(2, 2)) plot(XX, cex = 0.5, pch = 19) title(main = "Data") plot(Diag1[["diagram"]]) title(main = "Distance Function Diagram") plot(Diag2[["diagram"]]) title(main = "Density Persistence Diagram") one <- which(Diag2[["diagram"]][, 1] == 1) plot(XX, col = 2, main = "Representative loop of grid points") for (i in seq(along = one)) { points(Diag2[["birthLocation"]][one[i], , drop = FALSE], pch = 15, cex = 3, col = i) points(Diag2[["deathLocation"]][one[i], , drop = FALSE], pch = 17, cex = 3, col = i) for (j in seq_len(dim(Diag2[["cycleLocation"]][[one[i]]])[1])) { lines(Diag2[["cycleLocation"]][[one[i]]][j, , ], pch = 19, cex = 1, col = i) } }
## Distance Function Diagram and Kernel Density Diagram # input data n <- 300 XX <- circleUnif(n) ## Ranges of the grid Xlim <- c(-1.8, 1.8) Ylim <- c(-1.6, 1.6) lim <- cbind(Xlim, Ylim) by <- 0.05 h <- .3 #bandwidth for the function kde #Distance Function Diagram of the sublevel sets Diag1 <- gridDiag(XX, distFct, lim = lim, by = by, sublevel = TRUE, printProgress = TRUE) #Kernel Density Diagram of the superlevel sets Diag2 <- gridDiag(XX, kde, lim = lim, by = by, sublevel = FALSE, library = "Dionysus", location = TRUE, printProgress = TRUE, h = h) #plot par(mfrow = c(2, 2)) plot(XX, cex = 0.5, pch = 19) title(main = "Data") plot(Diag1[["diagram"]]) title(main = "Distance Function Diagram") plot(Diag2[["diagram"]]) title(main = "Density Persistence Diagram") one <- which(Diag2[["diagram"]][, 1] == 1) plot(XX, col = 2, main = "Representative loop of grid points") for (i in seq(along = one)) { points(Diag2[["birthLocation"]][one[i], , drop = FALSE], pch = 15, cex = 3, col = i) points(Diag2[["deathLocation"]][one[i], , drop = FALSE], pch = 17, cex = 3, col = i) for (j in seq_len(dim(Diag2[["cycleLocation"]][[one[i]]])[1])) { lines(Diag2[["cycleLocation"]][[one[i]]][j, , ], pch = 19, cex = 1, col = i) } }
The function gridFiltration
computes the Persistence Diagram of a filtration of sublevel sets (or superlevel sets) of a function evaluated over a grid of points in arbitrary dimension d
.
gridFiltration( X = NULL, FUN = NULL, lim = NULL, by = NULL, FUNvalues = NULL, maxdimension = max(NCOL(X), length(dim(FUNvalues))) - 1, sublevel = TRUE, printProgress = FALSE, ...)
gridFiltration( X = NULL, FUN = NULL, lim = NULL, by = NULL, FUNvalues = NULL, maxdimension = max(NCOL(X), length(dim(FUNvalues))) - 1, sublevel = TRUE, printProgress = FALSE, ...)
X |
an |
FUN |
a function whose inputs are 1) an |
lim |
a |
by |
either a number or a vector of length |
FUNvalues |
an |
maxdimension |
a number that indicates the maximum dimension of the homological features to compute: |
sublevel |
a logical variable indicating if the Persistence Diagram should be computed for sublevel sets ( |
printProgress |
if |
... |
additional parameters for the function |
If the values of X
, FUN
are set, then FUNvalues
should be NULL
. In this case, gridFiltration
evaluates the function FUN
over a grid. If the value of FUNvalues
is set, then X
, FUN
should be NULL
. In this case, FUNvalues
is used as function values over the grid.
Once function values are either computed or given, gridFiltration
constructs a filtration by triangulating the grid and considering the simplices determined by the values of the function of dimension up to maxdimension+1
.
The function gridFiltration
returns a list with the following elements:
cmplx |
a list representing the complex. Its i-th element represents the vertices of i-th simplex. |
values |
a vector representing the filtration values. Its i-th element represents the filtration value of i-th simplex. |
increasing |
a logical variable indicating if the filtration values are in increasing order ( |
coordinates |
only if both |
The user can decide to use either the C++ library GUDHI, Dionysus, or PHAT. See references.
Since dimension of simplicial complex from grid points in is up to
, homology of dimension
is trivial. Hence setting
maxdimension
with values is equivalent to
maxdimension=d-1
.
Brittany T. Fasy, Jisu Kim, and Fabrizio Lecci
Fasy B, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology." (arXiv:1303.7117). Annals of Statistics.
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/
Bauer U, Kerber M, Reininghaus J (2012). "PHAT, a software library for persistent homology." https://bitbucket.org/phat-code/phat/
summary.diagram
, plot.diagram
,
distFct
, kde
, kernelDist
, dtm
,
alphaComplexDiag
, alphaComplexDiag
, ripsDiag
# input data n <- 10 XX <- circleUnif(n) ## Ranges of the grid Xlim <- c(-1, 1) Ylim <- c(-1, 1) lim <- cbind(Xlim, Ylim) by <- 1 #Distance Function Diagram of the sublevel sets FltGrid <- gridFiltration( XX, distFct, lim = lim, by = by, sublevel = TRUE, printProgress = TRUE)
# input data n <- 10 XX <- circleUnif(n) ## Ranges of the grid Xlim <- c(-1, 1) Ylim <- c(-1, 1) lim <- cbind(Xlim, Ylim) by <- 1 #Distance Function Diagram of the sublevel sets FltGrid <- gridFiltration( XX, distFct, lim = lim, by = by, sublevel = TRUE, printProgress = TRUE)
hausdInterval
computes a confidence interval for the Hausdorff distance between a point cloud X
and the underlying manifold from which X
was sampled. See Details and References.
hausdInterval( X, m, B = 30, alpha = 0.05, parallel = FALSE, printProgress = FALSE)
hausdInterval( X, m, B = 30, alpha = 0.05, parallel = FALSE, printProgress = FALSE)
X |
an |
m |
the size of the subsamples. |
B |
the number of subsampling iterations. The default value is |
alpha |
|
parallel |
logical: if |
printProgress |
if |
For B
times, the subsampling algorithm subsamples m
points of X
(without replacement) and computes the Hausdorff distance between the original sample X
and the subsample. The result is a sequence of B
values. Let be the (
1-alpha
) quantile of these values and let . The interval
is a valid (
1-alpha
) confidence interval for the Hausdorff distance between X
and the underlying manifold, as proven in (Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh, 2013, Theorem 3).
The function hausdInterval
returns a number . The confidence interval is
.
Fabrizio Lecci
Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology: Confidence Sets for Persistence Diagrams." (arXiv:1303.7117). Annals of Statistics.
X <- circleUnif(1000) interval <- hausdInterval(X, m = 800) print(interval)
X <- circleUnif(1000) interval <- hausdInterval(X, m = 800) print(interval)
Given a point cloud X
( points), the function
kde
computes the Kernel Density Estimator over a grid of points. The kernel is a Gaussian Kernel with smoothing parameter h
. For each , the Kernel Density estimator is defined as
kde(X, Grid, h, kertype = "Gaussian", weight = 1, printProgress = FALSE)
kde(X, Grid, h, kertype = "Gaussian", weight = 1, printProgress = FALSE)
X |
an |
Grid |
an |
h |
number: the smoothing paramter of the Gaussian Kernel. |
kertype |
string: if |
weight |
either a number, or a vector of length |
printProgress |
if |
The function kde
returns a vector of length (the number of points in the grid) containing the value of the kernel density estimator for each point in the grid.
Jisu Kim and Fabrizio Lecci
Larry Wasserman (2004), "All of statistics: a concise course in statistical inference", Springer.
Brittany T. Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, and Aarti Singh. (2013), "Statistical Inference For Persistent Homology: Confidence Sets for Persistence Diagrams", (arXiv:1303.7117). To appear, Annals of Statistics.
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function by <- 0.065 Xseq <- seq(-1.6, 1.6, by=by) Yseq <- seq(-1.7, 1.7, by=by) Grid <- expand.grid(Xseq,Yseq) ## kernel density estimator h <- 0.3 KDE <- kde(X, Grid, h)
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function by <- 0.065 Xseq <- seq(-1.6, 1.6, by=by) Yseq <- seq(-1.7, 1.7, by=by) Grid <- expand.grid(Xseq,Yseq) ## kernel density estimator h <- 0.3 KDE <- kde(X, Grid, h)
Given a point cloud X
, the function kernelDist
computes the kernel distance over a grid of points. The kernel is a Gaussian Kernel with smoothing parameter h:
For each , the Kernel distance is defined by
kernelDist(X, Grid, h, weight = 1, printProgress = FALSE)
kernelDist(X, Grid, h, weight = 1, printProgress = FALSE)
X |
an |
Grid |
an |
h |
number: the smoothing paramter of the Gaussian Kernel. |
weight |
either a number, or a vector of length |
printProgress |
if |
The function kernelDist
returns a vector of lenght (the number of points in the grid) containing the value of the Kernel distance for each point in the grid.
Jisu Kim and Fabrizio Lecci
Phillips JM, Wang B, Zheng Y (2013). "Geometric Inference on Kernel Density Estimates." arXiv:1307.7760.
Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the functions by <- 0.065 Xseq <- seq(-1.6, 1.6, by = by) Yseq <- seq(-1.7, 1.7, by = by) Grid <- expand.grid(Xseq, Yseq) ## kernel distance estimator h <- 0.3 Kdist <- kernelDist(X, Grid, h)
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the functions by <- 0.065 Xseq <- seq(-1.6, 1.6, by = by) Yseq <- seq(-1.7, 1.7, by = by) Grid <- expand.grid(Xseq, Yseq) ## kernel distance estimator h <- 0.3 Kdist <- kernelDist(X, Grid, h)
Given a point cloud X
( points), The function
knnDE
computes the k Nearest Neighbors Density Estimator over a grid of points. For each , the knn Density Estimator is defined by
where is the volume of the Euclidean
dimensional unit ball and
is the Euclidean distance from point x to its
'th closest neighbor.
knnDE(X, Grid, k)
knnDE(X, Grid, k)
X |
an |
Grid |
an |
k |
number: the smoothing paramter of the k Nearest Neighbors Density Estimator. |
The function knnDE
returns a vector of length (the number of points in the grid) containing the value of the knn Density Estimator for each point in the grid.
Fabrizio Lecci
kde
, kernelDist
, distFct
, dtm
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function by <- 0.065 Xseq <- seq(-1.6, 1.6, by = by) Yseq <- seq(-1.7, 1.7, by = by) Grid <- expand.grid(Xseq, Yseq) ## kernel density estimator k <- 50 KNN <- knnDE(X, Grid, k)
## Generate Data from the unit circle n <- 300 X <- circleUnif(n) ## Construct a grid of points over which we evaluate the function by <- 0.065 Xseq <- seq(-1.6, 1.6, by = by) Yseq <- seq(-1.7, 1.7, by = by) Grid <- expand.grid(Xseq, Yseq) ## kernel density estimator k <- 50 KNN <- knnDE(X, Grid, k)
The function landscape
computes the landscape function corresponding to a given persistence diagram.
landscape( Diag, dimension = 1, KK = 1, tseq = seq(min(Diag[,2:3]), max(Diag[,2:3]), length=500))
landscape( Diag, dimension = 1, KK = 1, tseq = seq(min(Diag[,2:3]), max(Diag[,2:3]), length=500))
Diag |
an object of class |
dimension |
the dimension of the topological features under consideration. The default value is |
KK |
a vector: the order of the landscape function. The default value is |
tseq |
a vector of values at which the landscape function is evaluated. |
The function landscape
returns a numeric matrix with the number of row as the length of tseq
and the number of column as the length of KK
. The value at ith row and jth column represents the value of the KK[j]
-th landscape function evaluated at tseq[i]
.
Fabrizio Lecci
Bubenik P (2012). "Statistical topology using persistence landscapes." arXiv:1207.6437.
Chazal F, Fasy BT, Lecci F, Rinaldo A, Wasserman L (2014). "Stochastic Convergence of Persistence Landscapes and Silhouettes." Proceedings of the 30th Symposium of Computational Geometry (SoCG). (arXiv:1312.0308)
Diag <- matrix(c(0, 0, 10, 1, 0, 3, 1, 3, 8), ncol = 3, byrow = TRUE) DiagLim <- 10 colnames(Diag) <- c("dimension", "Birth", "Death") #persistence landscape tseq <- seq(0,DiagLim, length = 1000) Land <- landscape(Diag, dimension = 1, KK = 1, tseq) par(mfrow = c(1,2)) plot.diagram(Diag) plot(tseq, Land, type = "l", xlab = "t", ylab = "landscape", asp = 1)
Diag <- matrix(c(0, 0, 10, 1, 0, 3, 1, 3, 8), ncol = 3, byrow = TRUE) DiagLim <- 10 colnames(Diag) <- c("dimension", "Birth", "Death") #persistence landscape tseq <- seq(0,DiagLim, length = 1000) Land <- landscape(Diag, dimension = 1, KK = 1, tseq) par(mfrow = c(1,2)) plot.diagram(Diag) plot(tseq, Land, type = "l", xlab = "t", ylab = "landscape", asp = 1)
Given a point cloud and a function built on top of the data, we are interested in studying the evolution of the sublevel sets (or superlevel sets) of the function, using persistent homology. The Maximal Persistence Method selects the optimal smoothing parameter of the function, by maximizing the number of significant topological features, or by maximizing the total significant persistence of the features. For each value of the smoothing parameter, the function maxPersistence
computes a persistence diagram using gridDiag
and returns the values of the two criteria, the dimension of detected features, their persistence, and a bootstrapped confidence band. The features that fall outside of the band are statistically significant. See References.
maxPersistence( FUN, parameters, X, lim, by, maxdimension = length(lim) / 2 - 1, sublevel = TRUE, library = "GUDHI", B = 30, alpha = 0.05, bandFUN = "bootstrapBand", distance = "bottleneck", dimension = min(1, maxdimension), p = 1, parallel = FALSE, printProgress = FALSE, weight = NULL)
maxPersistence( FUN, parameters, X, lim, by, maxdimension = length(lim) / 2 - 1, sublevel = TRUE, library = "GUDHI", B = 30, alpha = 0.05, bandFUN = "bootstrapBand", distance = "bottleneck", dimension = min(1, maxdimension), p = 1, parallel = FALSE, printProgress = FALSE, weight = NULL)
FUN |
the name of a function whose inputs are: 1) |
parameters |
a numerical vector, storing a sequence of values for the smoothing paramter of |
X |
a |
lim |
a |
by |
either a number or a vector of length |
maxdimension |
a number that indicates the maximum dimension to compute persistent homology to. The default value is |
sublevel |
a logical variable indicating if the persistent homology should be computed for sublevel sets of |
library |
a string specifying which library to compute the persistence diagram. The user can choose either the library |
bandFUN |
the function to be used in the computation of the confidence band. Either |
B |
the number of bootstrap iterations. |
alpha |
for each value store in |
distance |
optional (if bandFUN == bootstrapDiagram): a string specifying the distance to be used for persistence diagrams: either |
dimension |
optional (if bandFUN == bootstrapDiagram): an integer or a vector specifying the dimension of the features used to compute the bottleneck distance. 0 for connected components, 1 for loops, 2 for voids. The default value is |
p |
optional (if bandFUN == bootstrapDiagram AND distance == "wasserstein"): integer specifying the power to be used in the computation of the Wasserstein distance. The default value is |
parallel |
logical: if |
printProgress |
if |
weight |
either NULL, a number, or a vector of length |
The function maxPersistence
calls the gridDiag
function, which computes the persistence diagram of sublevel (or superlevel) sets of a function, evaluated over a grid of points.
The function maxPersistence
returns an object of the class "maxPersistence", a list with the following components
parameters |
the same vector |
sigNumber |
a numeric vector storing the number of significant features in the persistence diagrams computed using each value in |
sigPersistence |
a numeric vector storing the sum of significant persistence of the features in the persistence diagrams, computed using each value in |
bands |
a numeric vector storing the bootstrap band's width, for each value in |
Persistence |
a list of the same lenght of |
Jisu Kim and Fabrizio Lecci
Chazal F, Cisewski J, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: distance-to-a-measure and kernel distance."
Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology", (arXiv:1303.7117). Annals of Statistics.
gridDiag
, kde
, kernelDist
, dtm
, bootstrapBand
## input data: circle with clutter noise n <- 600 percNoise <- 0.1 XX1 <- circleUnif(n) noise <- cbind(runif(percNoise * n, -2, 2), runif(percNoise * n, -2, 2)) X <- rbind(XX1, noise) ## limits of the Gird at which the density estimator is evaluated Xlim <- c(-2, 2) Ylim <- c(-2, 2) lim <- cbind(Xlim, Ylim) by <- 0.2 B <- 80 alpha <- 0.05 ## candidates parametersKDE <- seq(0.1, 0.5, by = 0.2) maxKDE <- maxPersistence(kde, parametersKDE, X, lim = lim, by = by, bandFUN = "bootstrapBand", B = B, alpha = alpha, parallel = FALSE, printProgress = TRUE) print(summary(maxKDE)) par(mfrow = c(1,2)) plot(X, pch = 16, cex = 0.5, main = "Circle") plot(maxKDE)
## input data: circle with clutter noise n <- 600 percNoise <- 0.1 XX1 <- circleUnif(n) noise <- cbind(runif(percNoise * n, -2, 2), runif(percNoise * n, -2, 2)) X <- rbind(XX1, noise) ## limits of the Gird at which the density estimator is evaluated Xlim <- c(-2, 2) Ylim <- c(-2, 2) lim <- cbind(Xlim, Ylim) by <- 0.2 B <- 80 alpha <- 0.05 ## candidates parametersKDE <- seq(0.1, 0.5, by = 0.2) maxKDE <- maxPersistence(kde, parametersKDE, X, lim = lim, by = by, bandFUN = "bootstrapBand", B = B, alpha = alpha, parallel = FALSE, printProgress = TRUE) print(summary(maxKDE)) par(mfrow = c(1,2)) plot(X, pch = 16, cex = 0.5, main = "Circle") plot(maxKDE)
The function multipBootstrap
computes a confidence band for the average landscape (or the average silhouette) using the multiplier bootstrap.
multipBootstrap( Y, B = 30, alpha = 0.05, parallel = FALSE, printProgress = FALSE)
multipBootstrap( Y, B = 30, alpha = 0.05, parallel = FALSE, printProgress = FALSE)
Y |
an |
B |
the number of bootstrap iterations. |
alpha |
|
parallel |
logical: if |
printProgress |
logical: if |
See Algorithm 1 in the reference.
The function multipBootstrap
returns a list with the following elements:
width |
number: half of the width of the unfiorm confidence band; that is, the distance of the upper and lower limits of the band from the empirical average landscape (or silhouette). |
mean |
a numeric vector of length |
band |
an |
Fabrizio Lecci
Chazal F, Fasy BT, Lecci F, Rinaldo A, Wasserman L (2014). "Stochastic Convergence of Persistence Landscapes and Silhouettes." Proceedings of the 30th Symposium of Computational Geometry (SoCG). (arXiv:1312.0308)
nn <- 3000 #large sample size mm <- 50 #small subsample size NN <- 5 #we will compute NN diagrams using subsamples of size mm XX <- circleUnif(nn) ## large sample from the unit circle DiagLim <- 2 maxdimension <- 1 tseq <- seq(0, DiagLim, length = 1000) Diags <- list() #here we will store the NN rips diagrams #constructed using different subsamples of mm points #here we'll store the landscapes Lands <- matrix(0, nrow = NN, ncol = length(tseq)) for (i in seq_len(NN)){ subXX <- XX[sample(seq_len(nn), mm), ] Diags[[i]] <- ripsDiag(subXX, maxdimension, DiagLim) Lands[i, ] <- landscape(Diags[[i]][["diagram"]], dimension = 1, KK = 1, tseq) } ## now we use the NN landscapes to construct a confidence band B <- 50 alpha <- 0.05 boot <- multipBootstrap(Lands, B, alpha) LOWband <- boot[["band"]][, 1] UPband <- boot[["band"]][, 2] MeanLand <- boot[["mean"]] plot(tseq, MeanLand, type = "l", lwd = 2, xlab = "", ylab = "", main = "Mean Landscape with band", ylim = c(0, 1.2)) polygon(c(tseq, rev(tseq)), c(LOWband, rev(UPband)), col = "pink") lines(tseq, MeanLand, lwd = 1, col = 2)
nn <- 3000 #large sample size mm <- 50 #small subsample size NN <- 5 #we will compute NN diagrams using subsamples of size mm XX <- circleUnif(nn) ## large sample from the unit circle DiagLim <- 2 maxdimension <- 1 tseq <- seq(0, DiagLim, length = 1000) Diags <- list() #here we will store the NN rips diagrams #constructed using different subsamples of mm points #here we'll store the landscapes Lands <- matrix(0, nrow = NN, ncol = length(tseq)) for (i in seq_len(NN)){ subXX <- XX[sample(seq_len(nn), mm), ] Diags[[i]] <- ripsDiag(subXX, maxdimension, DiagLim) Lands[i, ] <- landscape(Diags[[i]][["diagram"]], dimension = 1, KK = 1, tseq) } ## now we use the NN landscapes to construct a confidence band B <- 50 alpha <- 0.05 boot <- multipBootstrap(Lands, B, alpha) LOWband <- boot[["band"]][, 1] UPband <- boot[["band"]][, 2] MeanLand <- boot[["mean"]] plot(tseq, MeanLand, type = "l", lwd = 2, xlab = "", ylab = "", main = "Mean Landscape with band", ylim = c(0, 1.2)) polygon(c(tseq, rev(tseq)), c(LOWband, rev(UPband)), col = "pink") lines(tseq, MeanLand, lwd = 1, col = 2)
The function plot.clusterTree
plots the Cluster Tree stored in an object of class clusterTree
.
## S3 method for class 'clusterTree' plot( x, type = "lambda", color = NULL, add = FALSE, ...)
## S3 method for class 'clusterTree' plot( x, type = "lambda", color = NULL, add = FALSE, ...)
x |
an object of class |
type |
string: if "lambda", then the lambda Tree is plotted. if "r", then the r Tree is plotted. if "alpha", then the alpha Tree is plotted. if "kappa", then the kappa Tree is plotted. |
color |
number: the color of the branches of the Cluster Tree. The default value is |
add |
logical: if |
... |
additional graphical parameters. |
Fabrizio Lecci
Kent BP, Rinaldo A, Verstynen T (2013). "DeBaCl: A Python Package for Interactive DEnsity-BAsed CLustering." arXiv:1307.8136
Lecci F, Rinaldo A, Wasserman L (2014). "Metric Embeddings for Cluster Trees"
clusterTree
, print.clusterTree
## Generate data: 3 clusters n <- 1200 #sample size Neach <- floor(n / 4) X1 <- cbind(rnorm(Neach, 1, .8), rnorm(Neach, 5, 0.8)) X2 <- cbind(rnorm(Neach, 3.5, .8), rnorm(Neach, 5, 0.8)) X3 <- cbind(rnorm(Neach, 6, 1), rnorm(Neach, 1, 1)) XX <- rbind(X1, X2, X3) k <- 100 #parameter of knn ## Density clustering using knn and kde Tree <- clusterTree(XX, k, density = "knn") TreeKDE <- clusterTree(XX,k, h = 0.3, density = "kde") par(mfrow = c(2, 3)) plot(XX, pch = 19, cex = 0.6) # plot lambda trees plot(Tree, type = "lambda", main = "lambda Tree (knn)") plot(TreeKDE, type = "lambda", main = "lambda Tree (kde)") # plot clusters plot(XX, pch = 19, cex = 0.6, main = "cluster labels") for (i in Tree[["id"]]){ points(matrix(XX[Tree[["DataPoints"]][[i]], ], ncol = 2), col = i, pch = 19, cex = 0.6) } #plot kappa trees plot(Tree, type = "kappa", main = "kappa Tree (knn)") plot(TreeKDE, type = "kappa", main = "kappa Tree (kde)")
## Generate data: 3 clusters n <- 1200 #sample size Neach <- floor(n / 4) X1 <- cbind(rnorm(Neach, 1, .8), rnorm(Neach, 5, 0.8)) X2 <- cbind(rnorm(Neach, 3.5, .8), rnorm(Neach, 5, 0.8)) X3 <- cbind(rnorm(Neach, 6, 1), rnorm(Neach, 1, 1)) XX <- rbind(X1, X2, X3) k <- 100 #parameter of knn ## Density clustering using knn and kde Tree <- clusterTree(XX, k, density = "knn") TreeKDE <- clusterTree(XX,k, h = 0.3, density = "kde") par(mfrow = c(2, 3)) plot(XX, pch = 19, cex = 0.6) # plot lambda trees plot(Tree, type = "lambda", main = "lambda Tree (knn)") plot(TreeKDE, type = "lambda", main = "lambda Tree (kde)") # plot clusters plot(XX, pch = 19, cex = 0.6, main = "cluster labels") for (i in Tree[["id"]]){ points(matrix(XX[Tree[["DataPoints"]][[i]], ], ncol = 2), col = i, pch = 19, cex = 0.6) } #plot kappa trees plot(Tree, type = "kappa", main = "kappa Tree (knn)") plot(TreeKDE, type = "kappa", main = "kappa Tree (kde)")
The function plot.diagram
plots the Persistence Diagram stored in an object of class diagram
. Optionally, it can also represent the diagram as a persistence barcode.
## S3 method for class 'diagram' plot( x, diagLim = NULL, dimension = NULL, col = NULL, rotated = FALSE, barcode = FALSE, band = NULL, lab.line = 2.2, colorBand = "pink", colorBorder = NA, add = FALSE, ...)
## S3 method for class 'diagram' plot( x, diagLim = NULL, dimension = NULL, col = NULL, rotated = FALSE, barcode = FALSE, band = NULL, lab.line = 2.2, colorBand = "pink", colorBorder = NA, add = FALSE, ...)
x |
an object of class |
diagLim |
numeric vector of length 2, specifying the limits of the plot. If |
dimension |
number specifying the dimension of the features to be plotted. If |
col |
an optional vector of length |
rotated |
logical: if |
barcode |
logical: if |
band |
numeric: if |
lab.line |
number of lines from the plot edge, where the labels will be placed. The default value is |
colorBand |
the color for filling the confidence band. The default value is |
colorBorder |
the color to draw the border of the confidence band. The default value is |
add |
logical: if |
... |
additional graphical parameters. |
Fabrizio Lecci
Brittany T. Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, and Aarti Singh. (2013), "Statistical Inference For Persistent Homology", (arXiv:1303.7117). To appear, Annals of Statistics.
Frederic Chazal, Brittany T. Fasy, Fabrizio Lecci, Alessandro Rinaldo, and Larry Wasserman, (2014), "Stochastic Convergence of Persistence Landscapes and Silhouettes", Proceedings of the 30th Symposium of Computational Geometry (SoCG). (arXiv:1312.0308)
alphaComplexDiag
, alphaComplexDiag
, gridDiag
, ripsDiag
XX1 <- circleUnif(30) XX2 <- circleUnif(30, r = 2) + 3 XX <- rbind(XX1, XX2) DiagLim <- 5 maxdimension <- 1 ## rips diagram Diag <- ripsDiag(XX, maxdimension, DiagLim, printProgress = TRUE) #plot par(mfrow = c(1, 3)) plot(Diag[["diagram"]]) plot(Diag[["diagram"]], rotated = TRUE) plot(Diag[["diagram"]], barcode = TRUE)
XX1 <- circleUnif(30) XX2 <- circleUnif(30, r = 2) + 3 XX <- rbind(XX1, XX2) DiagLim <- 5 maxdimension <- 1 ## rips diagram Diag <- ripsDiag(XX, maxdimension, DiagLim, printProgress = TRUE) #plot par(mfrow = c(1, 3)) plot(Diag[["diagram"]]) plot(Diag[["diagram"]], rotated = TRUE) plot(Diag[["diagram"]], barcode = TRUE)
The function plot.maxPersistence
plots an object of class maxPersistence
, for the selection of the optimal smoothing parameter for persistent homology.
For each value of the smoothing parameter, the plot shows the number of detected features, their persistence, and a bootstrap confidence band.
## S3 method for class 'maxPersistence' plot( x, features = "dimension", colorBand = "pink", colorBorder = NA, ...)
## S3 method for class 'maxPersistence' plot( x, features = "dimension", colorBand = "pink", colorBorder = NA, ...)
x |
an object of class |
features |
string: if "all" then all the features are plotted; if "dimension" then only the features of the dimension used to compute the confidence band are plotted. |
colorBand |
the color for filling the confidence band. The default is "pink". (NA leaves the band unfilled) |
colorBorder |
the color to draw the border of the confidence band. The default is NA and omits the border. |
... |
additional graphical parameters. |
Fabrizio Lecci
Chazal F, Cisewski J, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: distance-to-a-measure and kernel distance."
Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology." (arXiv:1303.7117). Annals of Statistics.
## input data: circle with clutter noise n <- 600 percNoise <- 0.1 XX1 <- circleUnif(n) noise <- cbind(runif(percNoise * n, -2, 2), runif(percNoise * n, -2, 2)) X <- rbind(XX1, noise) ## limits of the Gird at which the density estimator is evaluated Xlim <- c(-2, 2) Ylim <- c(-2, 2) lim <- cbind(Xlim, Ylim) by <- 0.2 B <- 80 alpha <- 0.05 ## candidates parametersKDE <- seq(0.1, 0.5, by = 0.2) maxKDE <- maxPersistence(kde, parametersKDE, X, lim = lim, by = by, bandFUN = "bootstrapBand", B = B, alpha = alpha, parallel = FALSE, printProgress = TRUE) print(summary(maxKDE)) par(mfrow = c(1, 2)) plot(X, pch = 16, cex = 0.5, main = "Circle") plot(maxKDE)
## input data: circle with clutter noise n <- 600 percNoise <- 0.1 XX1 <- circleUnif(n) noise <- cbind(runif(percNoise * n, -2, 2), runif(percNoise * n, -2, 2)) X <- rbind(XX1, noise) ## limits of the Gird at which the density estimator is evaluated Xlim <- c(-2, 2) Ylim <- c(-2, 2) lim <- cbind(Xlim, Ylim) by <- 0.2 B <- 80 alpha <- 0.05 ## candidates parametersKDE <- seq(0.1, 0.5, by = 0.2) maxKDE <- maxPersistence(kde, parametersKDE, X, lim = lim, by = by, bandFUN = "bootstrapBand", B = B, alpha = alpha, parallel = FALSE, printProgress = TRUE) print(summary(maxKDE)) par(mfrow = c(1, 2)) plot(X, pch = 16, cex = 0.5, main = "Circle") plot(maxKDE)
The function ripsDiag
computes the persistence diagram of the Rips filtration built on top of a point cloud.
ripsDiag( X, maxdimension, maxscale, dist = "euclidean", library = "GUDHI", location = FALSE, printProgress = FALSE)
ripsDiag( X, maxdimension, maxscale, dist = "euclidean", library = "GUDHI", location = FALSE, printProgress = FALSE)
X |
If |
maxdimension |
integer: max dimension of the homological features to be computed. (e.g. 0 for connected components, 1 for connected components and loops, 2 for connected components, loops, voids, etc.)
Currently there is a bug for computing homological features of dimension higher than 1 when the distance is arbitrary ( |
maxscale |
number: maximum value of the rips filtration. |
dist |
|
library |
either a string or a vector of length two. When a vector is given, the first element specifies which library to compute the Rips filtration, and the second element specifies which library to compute the persistence diagram. If a string is used, then the same library is used. For computing the Rips filtration, if |
location |
if |
printProgress |
logical: if |
For Rips filtration based on Euclidean distance of the input point cloud, the user can decide to use either the C++ library GUDHI or Dionysus.
For Rips filtration based on arbitrary distance, the user can decide to the C++ library Dionysus.
Then for computing the persistence diagram from the Rips filtration, the user can use either the C++ library GUDHI, Dionysus, or PHAT.
Currently there is a bug for computing homological features of dimension higher than 1 when the distance is arbitrary (dist = "arbitrary"
) and library 'GUDHI' is used (library = "GUDHI"
).
See refereneces.
The function ripsDiag
returns a list with the following elements:
diagram |
an object of class |
birthLocation |
only if |
deathLocation |
only if |
cycleLocation |
only if |
Brittany T. Fasy, Jisu Kim, Fabrizio Lecci, and Clement Maria
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/.
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology". https://www.mrzv.org/software/dionysus/
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
Fasy B, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology." (arXiv:1303.7117). Annals of Statistics.
summary.diagram
, plot.diagram
, gridDiag
## EXAMPLE 1: rips diagram for circles (euclidean distance) X <- circleUnif(30) maxscale <- 5 maxdimension <- 1 ## note that the input X is a point cloud DiagRips <- ripsDiag( X = X, maxdimension = maxdimension, maxscale = maxscale, library = "Dionysus", location = TRUE, printProgress = TRUE) # plot layout(matrix(c(1, 3, 2, 2), 2, 2)) plot(X, cex = 0.5, pch = 19) title(main = "Data") plot(DiagRips[["diagram"]]) title(main = "rips Diagram") one <- which( DiagRips[["diagram"]][, 1] == 1 & DiagRips[["diagram"]][, 3] - DiagRips[["diagram"]][, 2] > 0.5) plot(X, col = 2, main = "Representative loop of data points") for (i in seq(along = one)) { for (j in seq_len(dim(DiagRips[["cycleLocation"]][[one[i]]])[1])) { lines( DiagRips[["cycleLocation"]][[one[i]]][j, , ], pch = 19, cex = 1, col = i) } } ## EXAMPLE 2: rips diagram with arbitrary distance ## distance matrix for triangle with edges of length: 1,2,4 distX <- matrix(c(0, 1, 2, 1, 0, 4, 2, 4, 0), ncol = 3) maxscale <- 5 maxdimension <- 1 ## note that the input distXX is a distance matrix DiagTri <- ripsDiag(distX, maxdimension, maxscale, dist = "arbitrary", printProgress = TRUE) #points with lifetime = 0 are not shown. e.g. the loop of the triangle. print(DiagTri[["diagram"]])
## EXAMPLE 1: rips diagram for circles (euclidean distance) X <- circleUnif(30) maxscale <- 5 maxdimension <- 1 ## note that the input X is a point cloud DiagRips <- ripsDiag( X = X, maxdimension = maxdimension, maxscale = maxscale, library = "Dionysus", location = TRUE, printProgress = TRUE) # plot layout(matrix(c(1, 3, 2, 2), 2, 2)) plot(X, cex = 0.5, pch = 19) title(main = "Data") plot(DiagRips[["diagram"]]) title(main = "rips Diagram") one <- which( DiagRips[["diagram"]][, 1] == 1 & DiagRips[["diagram"]][, 3] - DiagRips[["diagram"]][, 2] > 0.5) plot(X, col = 2, main = "Representative loop of data points") for (i in seq(along = one)) { for (j in seq_len(dim(DiagRips[["cycleLocation"]][[one[i]]])[1])) { lines( DiagRips[["cycleLocation"]][[one[i]]][j, , ], pch = 19, cex = 1, col = i) } } ## EXAMPLE 2: rips diagram with arbitrary distance ## distance matrix for triangle with edges of length: 1,2,4 distX <- matrix(c(0, 1, 2, 1, 0, 4, 2, 4, 0), ncol = 3) maxscale <- 5 maxdimension <- 1 ## note that the input distXX is a distance matrix DiagTri <- ripsDiag(distX, maxdimension, maxscale, dist = "arbitrary", printProgress = TRUE) #points with lifetime = 0 are not shown. e.g. the loop of the triangle. print(DiagTri[["diagram"]])
The function ripsFiltration
computes the Rips filtration built on top of a point cloud.
ripsFiltration( X, maxdimension, maxscale, dist = "euclidean", library = "GUDHI", printProgress = FALSE)
ripsFiltration( X, maxdimension, maxscale, dist = "euclidean", library = "GUDHI", printProgress = FALSE)
X |
If |
maxdimension |
integer: max dimension of the homological features to be computed. (e.g. 0 for connected components, 1 for connected components and loops, 2 for connected components, loops, voids, etc.) |
maxscale |
number: maximum value of the rips filtration. |
dist |
|
library |
a string specifying which library to compute the Rips filtration. If |
printProgress |
logical: if |
For Rips filtration based on Euclidean distance of the input point cloud, the user can decide to use either the C++ library GUDHI or Dionysus. For Rips filtration based on arbitrary distance, the user can use the C++ library Dionysus. See refereneces.
The function ripsFiltration
returns a list with the following elements:
cmplx |
a list representing the complex. Its i-th element represents the vertices of i-th simplex. |
values |
a vector representing the filtration values. Its i-th element represents the filtration value of i-th simplex. |
increasing |
a logical variable indicating if the filtration values are in increasing order ( |
coordinates |
only if |
Jisu Kim
Maria C (2014). "GUDHI, Simplicial Complexes and Persistent Homology Packages." https://project.inria.fr/gudhi/software/.
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology". https://www.mrzv.org/software/dionysus/
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
n <- 5 X <- cbind(cos(2*pi*seq_len(n)/n), sin(2*pi*seq_len(n)/n)) maxdimension <- 1 maxscale <- 1.5 FltRips <- ripsFiltration(X = X, maxdimension = maxdimension, maxscale = maxscale, dist = "euclidean", library = "GUDHI", printProgress = TRUE) # plot rips filtration lim <- rep(c(-1, 1), 2) plot(NULL, type = "n", xlim = lim[1:2], ylim = lim[3:4], main = "Rips Filtration Plot") for (idx in seq(along = FltRips[["cmplx"]])) { polygon(FltRips[["coordinates"]][FltRips[["cmplx"]][[idx]], , drop = FALSE], col = "pink", border = NA, xlim = lim[1:2], ylim = lim[3:4]) } for (idx in seq(along = FltRips[["cmplx"]])) { polygon(FltRips[["coordinates"]][FltRips[["cmplx"]][[idx]], , drop = FALSE], col = NULL, xlim = lim[1:2], ylim = lim[3:4]) } points(FltRips[["coordinates"]], pch = 16)
n <- 5 X <- cbind(cos(2*pi*seq_len(n)/n), sin(2*pi*seq_len(n)/n)) maxdimension <- 1 maxscale <- 1.5 FltRips <- ripsFiltration(X = X, maxdimension = maxdimension, maxscale = maxscale, dist = "euclidean", library = "GUDHI", printProgress = TRUE) # plot rips filtration lim <- rep(c(-1, 1), 2) plot(NULL, type = "n", xlim = lim[1:2], ylim = lim[3:4], main = "Rips Filtration Plot") for (idx in seq(along = FltRips[["cmplx"]])) { polygon(FltRips[["coordinates"]][FltRips[["cmplx"]][[idx]], , drop = FALSE], col = "pink", border = NA, xlim = lim[1:2], ylim = lim[3:4]) } for (idx in seq(along = FltRips[["cmplx"]])) { polygon(FltRips[["coordinates"]][FltRips[["cmplx"]][[idx]], , drop = FALSE], col = NULL, xlim = lim[1:2], ylim = lim[3:4]) } points(FltRips[["coordinates"]], pch = 16)
The function silhouette
computes the silhouette function corresponding to a given persistence diagram.
silhouette( Diag, p = 1, dimension = 1, tseq = seq(min(Diag[, 2:3]), max(Diag[, 2:3]), length = 500))
silhouette( Diag, p = 1, dimension = 1, tseq = seq(min(Diag[, 2:3]), max(Diag[, 2:3]), length = 500))
Diag |
an object of class |
p |
a vector: the power of the weights of the silhouette function. See the definition of silhouette function, Section 5 in the reference. |
dimension |
the dimension of the topological features under consideration. The default value is |
tseq |
a vector of values at which the silhouette function is evaluated. |
The function silhouette
returns a numeric matrix of with the number of row as the length of tseq
and the number of column as the length of p
. The value at ith row and jth column represents the value of the p[j]
-th power silhouette function evaluated at tseq[i]
.
Fabrizio Lecci
Chazal F, Fasy BT, Lecci F, Rinaldo A, Wasserman L (2014). "Stochastic Convergence of Persistence Landscapes and Silhouettes." Proceedings of the 30th Symposium of Computational Geometry (SoCG). (arXiv:1312.0308)
Diag <- matrix(c(0, 0, 10, 1, 0, 3, 1, 3, 8), ncol = 3, byrow = TRUE) DiagLim <- 10 colnames(Diag) <- c("dimension", "Birth", "Death") #persistence silhouette tseq <- seq(0, DiagLim, length = 1000) Sil <- silhouette(Diag, p = 1, dimension = 1, tseq) par(mfrow = c(1, 2)) plot.diagram(Diag) plot(tseq, Sil, type = "l", xlab = "t", ylab = "silhouette", asp = 1)
Diag <- matrix(c(0, 0, 10, 1, 0, 3, 1, 3, 8), ncol = 3, byrow = TRUE) DiagLim <- 10 colnames(Diag) <- c("dimension", "Birth", "Death") #persistence silhouette tseq <- seq(0, DiagLim, length = 1000) Sil <- silhouette(Diag, p = 1, dimension = 1, tseq) par(mfrow = c(1, 2)) plot.diagram(Diag) plot(tseq, Sil, type = "l", xlab = "t", ylab = "silhouette", asp = 1)
The function sphereUnif
samples n
points from the sphere of radius
r
embedded in , uniformly with respect to the volume measure of the sphere.
sphereUnif(n, d, r = 1)
sphereUnif(n, d, r = 1)
n |
an integer specifying the number of points in the sample. |
d |
an integer specifying the dimension of the sphere |
r |
a numeric variable specifying the radius of the sphere. The default value is |
The function sphereUnif
returns an n
by 2 matrix of coordinates.
When d = 1
, this function is same as using circleUnif
.
Jisu Kim
X <- sphereUnif(n = 100, d = 1, r = 1) plot(X)
X <- sphereUnif(n = 100, d = 1, r = 1) plot(X)
print
and summary
for diagram
The function print.diagram
prints a persistence diagram, a by 3 matrix, where
is the number of points in the diagram. The first column contains the dimension of each feature (0 for components, 1 for loops, 2 for voids, etc.). Second and third columns are Birth and Death of the features.
The function summary.diagram
produces basic summaries of a persistence diagrams.
## S3 method for class 'diagram' print(x, ...) ## S3 method for class 'diagram' summary(object, ...)
## S3 method for class 'diagram' print(x, ...) ## S3 method for class 'diagram' summary(object, ...)
x |
an object of class |
object |
an object of class |
... |
additional arguments affecting the summary produced. |
Fabrizio Lecci
plot.diagram
,
alphaComplexDiag
, alphaComplexDiag
, gridDiag
, ripsDiag
# Generate data from 2 circles XX1 <- circleUnif(30) XX2 <- circleUnif(30, r = 2) + 3 XX <- rbind(XX1, XX2) DiagLim <- 5 # limit of the filtration maxdimension <- 1 # computes betti0 and betti1 Diag <- ripsDiag(XX, maxdimension, DiagLim, printProgress = TRUE) print(Diag[["diagram"]]) print(summary(Diag[["diagram"]]))
# Generate data from 2 circles XX1 <- circleUnif(30) XX2 <- circleUnif(30, r = 2) + 3 XX <- rbind(XX1, XX2) DiagLim <- 5 # limit of the filtration maxdimension <- 1 # computes betti0 and betti1 Diag <- ripsDiag(XX, maxdimension, DiagLim, printProgress = TRUE) print(Diag[["diagram"]]) print(summary(Diag[["diagram"]]))
The function torusUnif
samples n
points from the 3D torus, uniformly with respect to its surface.
torusUnif(n, a, c)
torusUnif(n, a, c)
n |
an integer specifying the number of points in the sample. |
a |
the radius of the torus tube. |
c |
the radius from the center of the hole to the center of the torus tube. |
This function torusUnif
is an implementation of Algorithm 1 in the reference.
The function torusUnif
returns an n
by 3 matrix of coordinates.
Fabrizio Lecci
Diaconis P, Holmes S, and Shahshahani M (2013). "Sampling from a manifold." Advances in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L. Eaton. Institute of Mathematical Statistics, 102-125.
X <- torusUnif(300, a = 1.8, c = 5) plot(X)
X <- torusUnif(300, a = 1.8, c = 5) plot(X)
The function wasserstein
computes the Wasserstein distance between two persistence diagrams.
wasserstein(Diag1, Diag2, p = 1, dimension = 1)
wasserstein(Diag1, Diag2, p = 1, dimension = 1)
Diag1 |
an object of class |
Diag2 |
an object of class |
p |
integer specifying the power to be used in the computation of the Wasserstein distance. The default value is |
dimension |
an integer or a vector specifying the dimension of the features used to compute the wasserstein distance. 0 for connected components, 1 for loops, 2 for voids and so on. The default value is |
The Wasserstein distance between two diagrams is the cost of the optimal matching between points of the two diagrams. When a vector is given for dimension
, then maximum among bottleneck distances using each element in dimension
is returned. This function is an R wrapper of the function "wasserstein_distance" in the C++ library Dionysus. See references.
The function wasserstein
returns the value of the Wasserstein distance between the two persistence diagrams.
Jisu Kim and Fabrizio Lecci
Morozov D (2007). "Dionysus, a C++ library for computing persistent homology". https://www.mrzv.org/software/dionysus/.
Edelsbrunner H, Harer J (2010). "Computational topology: an introduction." American Mathematical Society.
bottleneck
,
alphaComplexDiag
, alphaComplexDiag
, gridDiag
, ripsDiag
,
plot.diagram
XX1 <- circleUnif(20) XX2 <- circleUnif(20, r = 0.2) DiagLim <- 5 maxdimension <- 1 Diag1 <- ripsDiag(XX1, maxdimension, DiagLim, printProgress = FALSE) Diag2 <- ripsDiag(XX2, maxdimension, DiagLim, printProgress = FALSE) wassersteinDist <- wasserstein(Diag1[["diagram"]], Diag2[["diagram"]], p = 1, dimension = 1) print(wassersteinDist)
XX1 <- circleUnif(20) XX2 <- circleUnif(20, r = 0.2) DiagLim <- 5 maxdimension <- 1 Diag1 <- ripsDiag(XX1, maxdimension, DiagLim, printProgress = FALSE) Diag2 <- ripsDiag(XX2, maxdimension, DiagLim, printProgress = FALSE) wassersteinDist <- wasserstein(Diag1[["diagram"]], Diag2[["diagram"]], p = 1, dimension = 1) print(wassersteinDist)