


 
Selected publications - Division of Algorithmic Data Analysis
This is a briefly commented list of selected publications
of the former Research Division of Algorithmic Data Analysis
at the Rolf Nevanlinna Institute.
1999
Gene mapping is a challenging problem for
data analysis. We introduce a new method, based on data mining, in the following
paper.
-
Data mining applied to linkage disequilibrium mapping
by Hannu Toivonen, Päivi Onkamo, Kari Vasko, Vesa Ollikainen, Petter
Sevon, and Juha Kere. American
Journal of Human Genetics, to appear.
Paleoecological reconstruction aims at
estimating environmental conditions in the past. In order to reconstruct climate
history based on fossil data, we consider Bayesian modeling of organism responses
to environmental variables.
-
Applying Bayesian statistics to organism-based
environmental reconstruction by Hannu Toivonen, Heikki Mannila,
Atte Korhola, and Heikki Olander. Ecological
Applications, to appear.
-
Bayesian modeling in paleoenvironmental reconstruction
by Hannu Toivonen, Kari Vasko, Heikki Mannila, Atte Korhola and Heikki
Olander.
Workshop on Intelligent Techniques for Spatio-Temporal Data Analysis in
Environmental Applications, July 1999, Chania, Greece.
In the following article we look at a powerful data
mining setting: we show how frequent datalog queries can be discovered,
and we relate different discovery tasks, such as the discovery of association
rules and their variations, to each other.
Discovery of non-obvious relationships between time
series is an important problem in many domains, such as financial, sensory,
and scientific data analysis. We propose using a wavelet transformation of
a time series to produce a natural set of features for the sequence. In the
proposed method, these features are processed so that they are insensitive
to changes in the vertical position, scaling, and overall trend of the time
series.
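As a rough illustration of the idea, the sketch below derives features that do not change when a series is shifted vertically, rescaled by a positive factor, or given an added linear trend. The summary above does not say which wavelet is used or how the insensitivity is obtained, so the Haar wavelet, the least-squares detrending step, and all function names here are assumptions made for the example, not the method of the paper.

```python
from math import sqrt


def detrend(series):
    """Subtract the least-squares line, removing vertical offset and linear trend."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    slope = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series)) / var_t
    return [y - mean_y - slope * (t - mean_t) for t, y in enumerate(series)]


def haar_details(series):
    """Detail coefficients of the orthonormal Haar transform, coarsest level first.

    The length of the series must be a power of two (assumption of this sketch).
    """
    n = len(series)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two, at least 2")
    approx = list(series)
    levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / sqrt(2) for a, b in pairs])
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return [c for level in reversed(levels) for c in level]


def wavelet_features(series):
    """Detrend, transform, and scale the detail coefficients to unit length."""
    details = haar_details(detrend(series))
    norm = sqrt(sum(c * c for c in details))
    return [c / norm for c in details] if norm > 0 else details


base = [0.0, 1.0, 4.0, 2.0, 5.0, 3.0, 7.0, 6.0]
# Same shape, but rescaled, shifted upwards, and with an extra linear trend.
moved = [3.0 * y + 10.0 + 0.5 * t for t, y in enumerate(base)]
same = all(abs(a - b) < 1e-9
           for a, b in zip(wavelet_features(base), wavelet_features(moved)))
print(same)
```

Two series can then be compared through, for example, the Euclidean distance between their feature vectors; that choice of distance is likewise an assumption of this example.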
A system-oriented view of TASA, a data mining system
for the discovery of episodes and association rules in telecommunication alarm
data, is presented in the next paper.
-
Interactive exploration of interesting patterns
in the Telecommunication network alarm sequence analyzer TASA by Mika
Klemettinen, Heikki Mannila, and Hannu Toivonen. Information and Software Technology 41(9): 557 - 567, June 1999.
The following paper (to appear) discusses the telecommunication
network management aspects of the same TASA system.
A new, efficient method for discovering functional
dependencies and approximate dependencies is described in the following
paper. The scale-up properties of the algorithm are superior to those of previous algorithms.
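To make the two notions concrete, the sketch below tests a single candidate dependency on a small table. It is a brute-force check written for illustration only, not the efficient method the paper describes. The error measure used here (the smallest fraction of rows that must be removed for the dependency to hold) is one common way to define approximate dependencies and is an assumption of this example, as are the function names and the sample data.

```python
from collections import Counter, defaultdict


def removal_error(rows, lhs, rhs):
    """Smallest fraction of rows to delete so that lhs -> rhs holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # Within each group of equal lhs values, keep the most common rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


def dependency_holds(rows, lhs, rhs, max_error=0.0):
    """Exact dependency when max_error is 0, approximate dependency otherwise."""
    return removal_error(rows, lhs, rhs) <= max_error


# Hypothetical sample relation.
rows = [
    {"city": "Helsinki", "zip": "00100", "country": "FI"},
    {"city": "Helsinki", "zip": "00200", "country": "FI"},
    {"city": "Espoo", "zip": "02100", "country": "FI"},
    {"city": "Espoo", "zip": "02100", "country": "SE"},
]
print(removal_error(rows, ["zip"], "city"))      # 0.0
print(removal_error(rows, ["city"], "country"))  # 0.25
```

Here zip determines city exactly, while city determines country only approximately: one of the four rows has to be dropped before the dependency holds.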
1998
An early conference paper on Tane, a new method for discovering
functional dependencies and approximate dependencies (see 1999 for a full
journal article):
Bassist, a Bayesian tool for the approximation
of posterior distributions of hierarchical Bayesian models by MCMC (Markov
chain Monte Carlo) techniques, is described in the following paper.
A statistical analysis of ear infections
in small children is reported in the next paper. The paper also demonstrates
the application of Bassist in the analysis of event sequence data.
In the following paper we consider the discovery of frequent
substructures in chemical compounds. Instead of looking for frequent itemsets,
we look for frequent first-order patterns (Datalog queries).
The following report contains some of our early ideas
and results about a general setting, where one looks for frequent patterns
in first-order logic.
Exploratory pattern discovery, machine learning, and statistical
modeling all have a role in data mining. We describe a case study from
paleoecological reconstruction and show how these techniques can be used
in the reconstruction process.
-
Learning, mining, or modeling? A case study in paleoecology by Heikki
Mannila, Hannu Toivonen, Atte Korhola, and Heikki Olander. In Discovery
Science, First International Conference, 12 - 24, Fukuoka, Japan,
December 1998. Springer Verlag.
1997
The connection of a generic levelwise algorithm (see the
last item in this list) to the transversal problem is considered in the following
paper.
An interactive knowledge discovery methodology
is considered in the following paper. The approach is based on the idea of
discovering a large collection of regularities at once, and then supporting
efficient retrieval from that collection.
What do you do with large sequences of events,
such as those generated by telecommunication networks? The following paper
studies how to find sets of interconnected events from such sequences.
-
Discovery of frequent episodes in event sequences
by Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Data
Mining and Knowledge Discovery 1(3): 259 - 289, November 1997. (Preliminary Report
C-1997-15, University of Helsinki, Department of Computer Science,
February 1997.)
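A minimal sketch of what "frequent" can mean for a set of events in a sequence: slide a time window over the sequence and measure the fraction of windows in which all events of the set occur. The example is a simplification made for illustration. It handles only unordered sets of event types, assumes integer time stamps, and counts every window that overlaps the sequence; the function name and the sample alarms are hypothetical.

```python
def window_frequency(events, episode, width):
    """Fraction of sliding windows of the given width that contain every
    event type in `episode`.

    events:  list of (time, event_type) pairs with integer times
    episode: set of event types, order ignored
    width:   window width in time units
    """
    times = [t for t, _ in events]
    first, last = min(times), max(times)
    # All windows [start, start + width) that overlap the sequence.
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        seen = {e for t, e in events if start <= t < start + width}
        if episode <= seen:
            hits += 1
    return hits / len(starts)


# Hypothetical alarm sequence: (time, alarm type).
alarms = [(1, "A"), (2, "B"), (4, "A"), (5, "C"), (6, "B")]
print(window_frequency(alarms, {"A", "B"}, width=3))       # 0.5
print(window_frequency(alarms, {"A", "B", "C"}, width=3))  # 0.125
```

A set of events is then called frequent when this fraction reaches a user-chosen threshold.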
A generic knowledge discovery algorithm is analyzed
in the following paper. The levelwise algorithm can be instantiated for different
tasks, such as the discovery of association rules or functional dependencies.
-
Levelwise search and borders of theories in knowledge
discovery by Heikki Mannila and Hannu Toivonen.
Data Mining and
Knowledge Discovery 1(3): 241 - 258, November 1997. (Preliminary Report
C-1997-8, University of Helsinki, Department of Computer Science,
January 1997.)
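The levelwise idea can be sketched for one instantiation, the discovery of frequent itemsets, which is the first step in finding association rules. The search proceeds from small patterns to larger ones, and a pattern becomes a candidate only if all of its smaller sub-patterns were found interesting on the previous level. The code is an illustrative sketch under these assumptions, not the formulation analyzed in the paper; the function name and the sample data are hypothetical.

```python
from itertools import combinations


def levelwise_frequent_sets(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support count."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # candidates of size 1
    frequent = {}
    size = 1
    while level:
        # Evaluate the candidates of the current level against the data.
        counted = {c: support(c) for c in level}
        current = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(current)
        # Build the next level: sets one item larger, all of whose
        # immediate subsets are frequent.
        size += 1
        unions = {a | b for a in current for b in current if len(a | b) == size}
        level = [c for c in unions
                 if all(frozenset(s) in current
                        for s in combinations(c, size - 1))]
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = levelwise_frequent_sets(baskets, min_support=3)
print(len(result))                        # 6
print(frozenset({"a", "b", "c"}) in result)  # False
```

With a support threshold of 3, all single items and all pairs are frequent, while the full set of three items occurs only twice and is pruned on the third level.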