Complexity penalized support estimation

Introduction

We present a method for estimating the support of a probability density function in the two-dimensional case. The method also works for estimating the support of the intensity function of a Poisson process. The estimator is spatially flexible, so it can estimate supports that consist of several disconnected components.

The method is described in the article "Complexity penalized support estimation".
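Spatial flexibility matters when the support falls apart into separate pieces. As a rough illustration only, the base R sketch below uses a much simpler estimator than the complexity penalized one in densup: the union of discs of radius r centred at the observations (the classical union-of-balls estimator). The function count_components and the radius r are our own hypothetical choices; the sketch counts how many disconnected components such an estimate has.

```r
# Illustrative sketch, NOT the densup estimator: the support is estimated by
# the union of discs of radius r centred at the observations.
# Two discs overlap exactly when their centres are at most 2 * r apart, so the
# connected components of the estimate are the connected components of the
# graph that joins such pairs of observations.
count_components <- function(x, r) {
  x <- as.matrix(x)
  n <- nrow(x)
  adjacent <- as.matrix(dist(x)) <= 2 * r
  label <- integer(n)            # 0 = not yet assigned to a component
  current <- 0L
  for (i in seq_len(n)) {
    if (label[i] == 0L) {
      current <- current + 1L    # start a new component from point i
      label[i] <- current
      queue <- i
      while (length(queue) > 0) {  # breadth-first search
        j <- queue[1]
        queue <- queue[-1]
        neighbours <- which(adjacent[j, ] & label == 0L)
        label[neighbours] <- current
        queue <- c(queue, neighbours)
      }
    }
  }
  current
}

# Two separated 4 x 4 grids of points: x-coordinates 0..3 and 10..13
x <- rbind(as.matrix(expand.grid(0:3, 0:3)), as.matrix(expand.grid(10:13, 0:3)))
count_components(x, r = 1)  # 2: the estimate has two disconnected components
count_components(x, r = 4)  # 1: with larger discs the two parts merge
```

The answer depends strongly on the radius r, so a naive estimator of this kind needs a careful choice of its smoothing parameter.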

We provide an R package called "densup", which contains functions for estimating the support and plotting the estimate. R is a language and environment for statistical computing and graphics. It can be downloaded from the Comprehensive R Archive Network (CRAN).

The estimation of the density support may be applied to detecting abnormal behaviour of a system, plant, or machine. Our estimator can be used to define a nonparametric multivariate method for statistical quality control, which extends the Shewhart methodology based on tolerance regions. Support estimation may also be applied to measure the performance of an enterprise in terms of technical efficiency, which is the distance from the observed productivity to the boundary of the support. Finally, our estimator can be applied to estimate the support of a Poisson intensity, for example to estimate the boundary of a forest when the locations of individual trees follow a planar Poisson process with an unknown intensity function.
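In the quality control application, an estimated support serves as a tolerance region: a new observation falling outside it signals abnormal behaviour. The sketch below illustrates this idea in base R under our own assumptions. It uses the simple union-of-discs estimate (discs of radius r around a reference sample) rather than the densup estimator, and the function flag_abnormal, the reference grid and the radius are hypothetical.

```r
# Illustrative sketch, not part of densup: the support of in-control data is
# estimated by the union of discs of radius r around a reference sample, and
# this estimate is used as a tolerance region.
# A new observation is flagged as abnormal when it lies outside the region,
# that is, farther than r from every reference observation.
flag_abnormal <- function(new, reference, r) {
  reference <- as.matrix(reference)
  new <- matrix(new, ncol = ncol(reference))
  apply(new, 1, function(z) {
    distances <- sqrt(colSums((t(reference) - z)^2))
    min(distances) > r
  })
}

# Reference sample: in-control measurements on a 5 x 5 grid
reference <- as.matrix(expand.grid(1:5, 1:5))
new <- rbind(c(3, 3.4), c(9, 9))
flag_abnormal(new, reference, r = 1)  # FALSE TRUE: only the second point is flagged
```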

The package "densup" was written by Jussi Klemelä. I am grateful for bug reports.

Installation

The programs are provided as an R package.
  • Download package densup_0.1.0.tar.gz.

  • Installation instructions can be displayed with the command R CMD INSTALL --help or R INSTALL --help. On Unix, the package may be installed with the command:

    R CMD INSTALL densup_0.1.0.tar.gz

    On Unix, this installs the package into the directory "R_HOME/library/", where "R_HOME" is the root of the R installation tree. That is, the directory "R_HOME/library/densup" is created and the files are installed there. For example, R_HOME may be /usr/lib/R.

  • In R, use the command

    library(densup)

    which makes the functions available.

Documentation

The package provides the following procedures:

  • grow: presmooths the data by growing a tree whose terminal nodes represent bins of the sample space, each annotated with its frequency
  • plotsupport: estimates the support and plots the estimate
  • simmix: generates random vectors from mixtures of Gaussian densities
In R, use the command help(name) to get the online manual page for the function "name", for example help(plotsupport).
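To make "bins annotated with frequencies" concrete, the base R sketch below bins a two-dimensional sample on a flat grid. It is not the algorithm of grow, which organizes the bins in a tree; the function bin_counts is hypothetical, and using N for the number of bins per coordinate is our guess by analogy with the argument N of grow. Bins with a positive count give a crude histogram-type estimate of the support.

```r
# Illustrative sketch, NOT the algorithm of grow(): presmoothing by a flat
# grid instead of a tree. The bounding box of a two-dimensional sample is cut
# into N[1] x N[2] equal-width bins and each bin is annotated with the number
# of observations that fall into it.
# Assumes each coordinate of x takes at least two distinct values.
bin_counts <- function(x, N) {
  bin_index <- function(k) {
    breaks <- seq(min(x[, k]), max(x[, k]), length.out = N[k] + 1)
    # rightmost.closed = TRUE puts the maximum into the last bin
    factor(findInterval(x[, k], breaks, rightmost.closed = TRUE),
           levels = seq_len(N[k]))
  }
  counts <- table(bin_index(1), bin_index(2))
  matrix(counts, N[1], N[2])  # frequencies as a plain N[1] x N[2] matrix
}

x <- rbind(c(0, 0), c(0.1, 0.2), c(1, 1), c(0.9, 0.95))
counts <- bin_counts(x, N = c(2, 2))
counts           # two observations in bin [1, 1] and two in bin [2, 2]
sum(counts > 0)  # 2 non-empty bins
```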

Tutorial

Below we show a session which uses the main features of the package.

#First load the library
 
library(densup)
 
0th example

#Generate a sample of size 10 from a standard two-dimensional Gaussian distribution
dendat<-matrix(rnorm(20),10)
#Grow the tree
N<-c(8,8)
h<-0.1
tree<-grow(dendat,N,h)
#Prune the tree
alpha<-0.00065
lambda<-0.1
ps<-plotsupport(dendat,tree,alpha,h,lambda,data=TRUE)


 
1st example: Gaussian density
 
#Generate a sample of size 100 from a standard two-dimensional Gaussian distribution

set.seed(1)
dendat<-matrix(rnorm(200),100)

#Grow the tree

N<-c(64,64)
h<-0.1
tree64<-grow(dendat,N,h)
 
#Prune the tree

alpha<-0.00065
lambda<-0.1
ps065<-plotsupport(dendat,tree64,alpha,h,lambda,data=TRUE)
               
#Try other colors

colonum<-2
#colo<-rainbow(colonum,s=1,v=1,start=0,end=max(1,colonum-1)/colonum)
#colo<-heat.colors(colonum)
#colo<-terrain.colors(colonum)
#colo<-topo.colors(colonum)
colo<-cm.colors(colonum)
#image() uses the first colour for the smallest values of z; make it white
colo[1]<-"white"

image(ps065$x,ps065$y,ps065$z,col=colo,xlab="",ylab="")  
points(dendat,pch=20)                                       
                               
2nd example: mixture of two Gaussians

#Generate a sample of size 125 from a mixture of two standard 
#two-dimensional Gaussians. We use a program "simmix" to generate data. 


#simmix is provided by densup, which was loaded above with library(densup)
d<-2
mixnum<-2
M<-matrix(0,mixnum,d)
D<-8
M[1,]<-c(0,0)
M[2,]<-c(D,0)
sig<-matrix(1,mixnum,d)
p0<-1/mixnum
p<-p0*rep(1,mixnum)
n<-125
dendat<-simmix(n,d,M,sig,p,seed=2)
 
N<-c(64,64)
h<-0.1
tree64<-grow(dendat,N,h)
 
alpha<-0.0005
lambda<-0.1
ps070<-plotsupport(dendat,tree64,alpha,h,lambda,data=TRUE)
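If simmix is not at hand, a comparable sample from a mixture of two-dimensional Gaussians can be drawn in base R. The function rmix_gauss below is a hypothetical stand-in, not the implementation of simmix: we assume that each observation chooses component k with probability p[k] and is then drawn from a Gaussian with mean M[k,] and independent coordinates with standard deviations sig[k,]. With the same seed it will not reproduce the sample generated by simmix.

```r
# Hypothetical stand-in for simmix, NOT its actual implementation.
# Assumption: each observation picks component k with probability p[k] and is
# then drawn from a Gaussian with mean M[k, ] and independent coordinates
# with standard deviations sig[k, ].
rmix_gauss <- function(n, M, sig, p, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  d <- ncol(M)
  k <- sample(nrow(M), n, replace = TRUE, prob = p)  # component labels
  M[k, , drop = FALSE] + matrix(rnorm(n * d), n, d) * sig[k, , drop = FALSE]
}

# Same mixture as above: two standard Gaussians with means 8 apart
d <- 2
M <- rbind(c(0, 0), c(8, 0))
sig <- matrix(1, 2, d)
p <- c(0.5, 0.5)
x <- rmix_gauss(125, M, sig, p, seed = 2)
dim(x)  # 125 rows, 2 columns
```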
                                                           


3rd example: mixture of three Gaussians

#Generate a sample of size 150 from a mixture of three standard 
#two-dimensional Gaussians. We use a program "simmix" to generate data. 
#Means are on the vertices of an equilateral triangle.
#Distance between the vertices is D.

#simmix is provided by densup, which was loaded above with library(densup)
d<-2
mixnum<-3
M<-matrix(0,mixnum,d)
D<-8
M[1,]<-c(0,0)
M[2,]<-c(D,0)
M[3,]<-c(D/2,sqrt(3)/2*D)  
sig<-matrix(1,mixnum,d)
p0<-1/mixnum
p<-p0*rep(1,mixnum)
n<-150
dendat<-simmix(n,d,M,sig,p,seed=2)
 
N<-c(64,64)
h<-0.1
tree64<-grow(dendat,N,h)
 
alpha<-0.0006
lambda<-0.1
ps7<-plotsupport(dendat,tree64,alpha,h,lambda,data=TRUE)
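The third mean, c(D/2, sqrt(3)/2*D), lies at height sqrt(3)/2*D above the midpoint of the base from (0,0) to (D,0). Its squared distance to either base vertex is (D/2)^2 + (sqrt(3)/2*D)^2 = D^2/4 + 3*D^2/4 = D^2, so the means form an equilateral triangle with side D. A quick numerical check in base R:

```r
# Check of the geometry used above: the three means are the vertices of an
# equilateral triangle with side length D, so all pairwise distances equal D.
D <- 8
M <- rbind(c(0, 0), c(D, 0), c(D / 2, sqrt(3) / 2 * D))
distances <- as.vector(dist(M))  # distances 1-2, 1-3, 2-3
distances                        # all equal to 8, up to floating-point rounding
```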
      
                 

History

Version densup_0.1.0 uploaded on 26.04.2002