This project, run in cooperation with the BGR and within the JPI Oceans project Mining Impact, aims to estimate the amount, size and ultimately the mass of polymetallic nodules lying on
the seafloor. Research on these mineral resources has been conducted for decades, driven by the world market. Currently, a UN organization
(the ISA) governs the exploration and the intended exploitation. Monitoring the minerals as well as the diverse biology in this sensitive habitat requires precise measurements,
including detections of both the nodules and the biology.
The first part of this project derived the percentage of the seafloor covered in nodules. Subsequently, more detailed segmentations yielded the sizes of single nodules and, combined with automatically detected laser markers, absolute nodule sizes in cm. In upcoming studies, the developed algorithms will be extended to also estimate the amount of nodules covered with sediment, to increase the estimation accuracy.
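The laser-marker step can be sketched as follows. This is only a hypothetical illustration, assuming two detected laser points with a known physical spacing; the 20 cm value and all names are assumptions, not the project's actual parameters:

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Assumed physical distance between the two laser dots projected onto the
// seafloor (illustrative value, not the real rig's spacing).
constexpr double kLaserSpacingCm = 20.0;

// Pixels-per-centimetre scale derived from the detected marker positions.
double scaleFromLasers(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy) / kLaserSpacingCm;  // px per cm
}

// Convert a segmented nodule's pixel area to cm^2 (the scale enters squared).
double noduleAreaCm2(double areaPx, double pxPerCm) {
    return areaPx / (pxPerCm * pxPerCm);
}
```

With markers 100 px apart, the scale is 5 px/cm, so a 2500 px² segment corresponds to 100 cm².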
The learning is done with the C++ library mllib, the image processing with the OpenCV library. Visualizations and explorations are realised with Zeus and Pan.
See the publications page for results of this work.
One of the major projects is the automated detection of megafauna (as well as flora and structures created by those animals).
I started it about three years ago and have been working on it since, with some longer breaks for the manganese nodule project.
It is a cooperation with the Alfred Wegener Institute, the National Oceanography Centre and the JPI Oceans partners that provide the image data. The aim of the project is to create an automated detection system that incorporates the expertise of the biologists while keeping their input as small as possible. Parameters for the different steps of the system are derived
from a fully annotated workshop transect. As the central part, supervised machine learning is applied.
The learning is done with C++ libraries, the image processing with the superb OpenCV library. Results of this project have so far been published as a PLoS ONE journal article, as conference talks at OCEANS '12 and the DSBS 2012, and as a poster at the GEOHAB 2013.
Biigle is an online image-browsing software for the manual analysis of image transects,
especially from underwater camera cruises (see below).
In an effort to evolve the Biigle tool to a current, state-of-the-art infrastructure and to extend its functionality, improving the scientific validity and robustness of annotation data, DIAS (DIAS Image Annotation Software) was implemented. The version hosted by GEOMAR is the alpha version that was created for on-ship use during the research cruises SO239 and SO242-1 for a manganese nodule project.
Copria (or "collaborative pattern recognition & image analysis") is a web-based data-flow processing software. The graphical user interface is developed in common web technology (HTML5, JS, jQuery) and builds upon a MySQL database. The purpose of Copria is to allow for the efficient usage of complex data-processing pipelines and to share such pipelines among scientists for the collaborative exploration of complex and big data stacks. A pipeline consists of several atomic processing "nodes", which are currently being developed for the whole range of pattern recognition, data mining, image processing and visualization tasks. Pipelines are executed on the CeBiTec compute cluster to provide high efficiency.
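The node-and-pipeline idea can be illustrated with a minimal sketch. All class and method names here are hypothetical stand-ins for the concept, not Copria's actual API:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the data packets flowing between processing nodes.
using Blob = std::map<std::string, double>;

// An atomic processing "node": consumes one packet, produces one packet.
struct Node {
    virtual ~Node() = default;
    virtual Blob process(const Blob& in) = 0;
};

// Example node: scales every value in the packet by a constant factor.
struct Scale : Node {
    double factor;
    explicit Scale(double f) : factor(f) {}
    Blob process(const Blob& in) override {
        Blob out;
        for (const auto& kv : in) out[kv.first] = kv.second * factor;
        return out;
    }
};

// A pipeline is a sequence of nodes applied in order.
struct Pipeline {
    std::vector<std::unique_ptr<Node>> nodes;
    Blob run(Blob data) const {
        for (const auto& n : nodes) data = n->process(data);
        return data;
    }
};
```

In the real system such pipelines are assembled in the web GUI and dispatched to the cluster rather than run in-process.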
Most of the machine learning and image processing done in the various projects is time-consuming and not feasible on a single computer. The CeBiTec thus has a compute cluster with around 2,000 cores. To start jobs, run them and monitor their progress, I developed Ares (PHP, job start), Hades (C++/PHP, file server interface and job control) and Athene (PHP, job progress and results).
While the learning and other computations are time-consuming, the evaluation of results has to be web-based to ease the exchange between researchers
while still allowing for dynamic and explorative work, in essence: rapid prototyping. Therefore I implemented a suite of frequently used PHP scripts
that ease the interplay between results and original data. Some are very fundamental extensions, suitable for any PHP project (Athene), some are more focused
on machine learning and image processing (Apollon).
To ease the processing and allow rapid access to the data as well as the algorithms, Zeus was developed. It is a webpage with an editor section at the top, where scripts can be created and manipulated. The lower part is the section where the results of the execution of these scripts will be displayed. Zeus has access to the scripts of Demeter and Apollon. If a script is relevant to other users, it can be made accessible without the editing option, making the sharing of results simple and efficient.
Most evaluations can be done with Zeus. Others, though, require a more sophisticated GUI and depend on a wider code base. Among those are evaluations
of the polymetallic nodule detection and a re-evaluation tool to browse the detections made by iSIS (Ate).
Mali is a collection of machine-learning and image-processing algorithms written in C++11. It contains a vector-based data structure and incorporates vector expressions to increase performance. Some basic algorithms originate in other projects and were wrapped to conform to the fundamental data structure. Mali was a combined effort of several Ph.D. students and is a major advancement over the previously used mllib.
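The vector-expression idea can be illustrated with a minimal expression-template sketch. This is not Mali's actual interface, only the underlying C++ technique: `a + b + c` builds lightweight expression objects, and the element loop runs just once, on assignment, avoiding temporary vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazy element-wise sum of two operands (vectors or other expressions).
template <typename L, typename R>
struct Sum {
    const L& l; const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Evaluate any expression element-wise in a single pass.
    template <typename E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Building a Sum is O(1); no element is touched until assignment.
template <typename L, typename R>
Sum<L, R> operator+(const L& l, const R& r) { return {l, r}; }
```

A production version would also constrain the operator templates and support more operations, but the single-pass evaluation is the performance point.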
My master's thesis dealt with the application of an algorithm originating in the field of ecology (Ripley's K function) to brain tissue images. Ripley's K and its derivatives,
Ripley's L and the O-ring statistic, give information about the evenness of a two-dimensional point distribution. It was first used to
describe the relative occurrence of a single plant species and was then extended to incorporate multi-class relationships (like: "do lions aggregate around ponds?").
In my case, I used brain tissue images created by the Neuropathological Institute of the Bethel hospital. The tissue samples were taken from patients with epilepsy. The task was to automatically detect irregularities in the cell distribution within the tissue. Usually, the neural cells form a layered pattern, but in affected patients this pattern is severely damaged and thus irregular. These irregularities were detected through an abnormal Ripley's L statistic.
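The statistic itself can be sketched compactly. This is a naive O(n²) illustration without the edge correction a real analysis would need: K(r) counts, per point, the average number of neighbours within radius r, normalised by the point density; L(r) = sqrt(K(r)/π) equals r under complete spatial randomness, so its deviation from r flags clustering or regularity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Naive estimate of Ripley's K at radius r for points observed in a window
// of the given area; edge correction is deliberately omitted in this sketch.
double ripleyK(const std::vector<Pt>& p, double r, double area) {
    const std::size_t n = p.size();
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j) {
                const double dx = p[i].x - p[j].x, dy = p[i].y - p[j].y;
                if (dx * dx + dy * dy <= r * r) ++pairs;
            }
    return area * static_cast<double>(pairs) /
           (static_cast<double>(n) * static_cast<double>(n));
}

// Variance-stabilised form: under complete spatial randomness L(r) == r,
// so L(r) - r > 0 indicates clustering, < 0 indicates regularity.
double ripleyL(const std::vector<Pt>& p, double r, double area) {
    const double pi = std::acos(-1.0);
    return std::sqrt(ripleyK(p, r, area) / pi);
}
```

In the thesis setting, a healthy layered cell pattern yields one characteristic L(r) profile, and strong deviations from it mark the damaged, irregular regions.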
HydeON was a student project for the dynamic, browser-based exploration of clustering results created with a Hyperbolic Self-Organizing Map,
hence the name (Hyperbolic Data Explorer Online; an offline version existed before).
The difficulty was to implement a browser-based version, as the datasets are usually large and a lot of graphic operations are required.
Luckily, Adobe Flex allows manipulating the color tables of a graphic, making it possible to change pixel values in real time. Hence the
exploration of the clustering became very intuitive.
HydeON was later improved by other students and became a part of what was BioIMAX and is now comvoi under the name WHide.
During a one-semester student exchange to the Image and Signal Processing Group at the University of Warwick, I created a branch of Biigle for the online browsing and manual annotation of pathological images. The main differences that made this development necessary are the huge size of pathological images (>1 GB per image) and the diverse morphology of pathological tissue. The first problem was solved by using a pyramidal image encoding that cuts each image into square tiles at different zoom levels; only the tiles in the currently viewed region and zoom level are transferred to the user, similar to the Google Maps interface. The second problem was solved by introducing further annotation shapes like circles and, most importantly, polygons to allow for detailed annotation, e.g. of the outer boundary of cells. Paigle is written in ActionScript and MXML (Adobe Flex), with a backend in PHP. The communication is done through REST server requests.
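The tile arithmetic behind such a pyramid can be sketched as follows; the 256 px tile size and the function names are illustrative assumptions, not Paigle's actual parameters:

```cpp
#include <cassert>

// Assumed tile edge length in pixels (a common choice for map-style viewers).
constexpr int kTileSize = 256;

// Number of tiles per axis needed to cover `pixels` at a given zoom level,
// where level 0 is full resolution and each level halves the image.
int tilesAtLevel(int pixels, int level) {
    const int scaled = (pixels + (1 << level) - 1) >> level;  // ceil divide
    return (scaled + kTileSize - 1) / kTileSize;
}

// Which tile column/row a full-resolution pixel coordinate falls into, so the
// viewer can request only the tiles covering the visible region.
int tileIndex(int coord, int level) {
    return (coord >> level) / kTileSize;
}
```

A 4096 px wide image thus needs 16 tile columns at full resolution but only one at level 4, which is why panning and zooming stay responsive even for gigapixel slides.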
My bachelor's thesis relates to the Rich Internet Application Biigle (see project Biigle).
To date, Biigle has been used to browse and annotate almost 200,000 images. Within these images, almost 700,000 annotations have been set.
To allow easier access to the images with interesting annotations, I created some dynamic visualizations (histogram, scatterplot, parallel coordinates,
netmap, ...). The visualizations allow selecting the annotation categories one is interested in and, based on these, selecting a group of images for further analysis.
The visualizations are written in ActionScript and MXML (Adobe Flex), with a backend in PHP.
Biigle is an online image-browsing software for the manual analysis of image transects,
especially underwater camera cruises. Various research institutions like the Alfred Wegener Institute,
the Bundesanstalt für Geowissenschaften und Rohstoffe or commercial partners like
Statoil use Biigle for the semantic annotation of images. Depending on the research task,
different annotation classes (e.g. flora and fauna for biologists; minerals and sediments for geologists)
are available, as well as different types of markers (points, lengths, areas and grids).
Biigle is written in ActionScript and MXML (Adobe Flex), with a backend in PHP. Communication is done through amfphp.