Developing tools for semi-automatic classification of phytoliths: a plea for help with image processing

Carla Lancelotti; Alessandro Mosca; Michelangelo Diligenti; Bernardo Rondelli; Marco Madella

University of Southampton OCS (beta), CAA 2012

Last modified: 2011-12-22

Abstract

Phytoliths are an important marker of past human activities related to the exploitation and use of plant resources for consumption and other purposes. The study of phytoliths is a relatively new approach within archaeobotany in comparison with macro-remain or charcoal analysis. However, phytoliths constitute our only source of information on plant-related activities in all those instances where macro-fossils (seeds, fruits, charcoal) are not preserved for physical, chemical or taphonomic reasons. The microscopic identification of phytoliths is a lengthy and time-consuming process that depends heavily on the specialist's level of experience and knowledge. Indeed, the taxonomic classification of phytoliths is complicated by the biological nature of these inorganic particles, which form inside plant cells. As a result, phytoliths belonging to the same category can assume several slightly different forms, with some particles converging on forms assigned to different categories.

A proposal was made to develop a general framework that incorporates first-order logic (FOL) clauses, understood as abstract and partial representations of the environment, into kernel machines that learn within a semi-supervised scheme. The framework relies on a multi-task learning scheme in which each task is associated with a unary predicate defined on the feature space, while higher-level abstract representations consist of FOL clauses built from these predicates. The challenge is to demonstrate that, in the presence of a relatively small collection of supervised examples, the performance of the kernel machines is significantly improved by feeding them a domain-specific knowledge base of logical clauses.
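To make the idea of turning a logical clause into a learning constraint more concrete, the following is a minimal sketch, not the implementation proposed here. It assumes that each unary predicate outputs a truth degree in [0, 1] and that conjunction is relaxed with the product t-norm; the abstract does not say which relaxation is used. The predicate names `p_a`, `p_b`, `p_c` and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def implication_penalty(body: Sequence[float], head: float) -> float:
    """Degree of violation of (body_1 AND ... AND body_n) => head.

    Truth degrees lie in [0, 1]. Conjunction uses the product t-norm
    (an assumption of this sketch). The penalty is 0 when the clause
    holds and 1 when the body is fully true and the head fully false.
    """
    conjunction = 1.0
    for truth in body:
        conjunction *= truth
    return conjunction * (1.0 - head)


def clause_penalty(samples: List[Dict[str, float]],
                   body: Sequence[str], head: str) -> float:
    """Approximate 'for all x' by averaging the violation over a sample.

    The sample may consist of unlabeled points, which is what makes the
    scheme semi-supervised.
    """
    if not samples:
        return 0.0
    total = sum(implication_penalty([s[name] for name in body], s[head])
                for s in samples)
    return total / len(samples)


def regularized_loss(supervised_loss: float,
                     clause_penalties: Sequence[float],
                     weight: float) -> float:
    """Supervised loss plus a weighted sum of clause violations."""
    return supervised_loss + weight * sum(clause_penalties)


# Hypothetical predicate outputs on three unlabeled points.
unlabeled = [
    {"p_a": 0.9, "p_b": 0.8, "p_c": 0.1},  # body mostly true, head false
    {"p_a": 0.9, "p_b": 0.8, "p_c": 1.0},  # clause satisfied
    {"p_a": 0.0, "p_b": 0.8, "p_c": 0.0},  # body false, vacuously satisfied
]
penalty = clause_penalty(unlabeled, ["p_a", "p_b"], "p_c")
```

In this reading, the learner is pushed towards predicate functions that fit the few labeled examples while also keeping `penalty` small on unlabeled data, which is where the domain knowledge enters.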

Preliminary experimental analyses studied the effect of the introduction of the constraints in the learning process for different dimensionalities of the input space, showing that the accuracy gain is very significant for larger input spaces, corresponding to harder learning settings, where generalization using standard kernel machines is often difficult.

The main interest of the proposed approach is twofold. From a purely computational perspective, it opens the door to a new class of 'semantic-based regularization machines' in which prior knowledge can be integrated using high-level abstract representations, including logic formalisms. On the other hand, the 'phytolith identification' problem in archaeobotany has characteristics that closely match the strengths of the present approach: (i) scarcity of examples; (ii) high intra-category feature variability; (iii) intensive use of domain expert knowledge. Finally, phytoliths constitute, in this perspective, a case study for developing a multidisciplinary approach that can also be used for the automatic classification of other types of objects, both organic and inorganic.

The classification system thus developed will be based on the analysis of images (in the present case, photographs of phytoliths taken with a microscope). We are looking for possible collaborations with image processing experts who can help us build the collection of supervised examples of phytolith morphotypes needed to test the system. The challenge is to develop a system that can isolate phytoliths from the background noise in a photograph that shows several objects similar in colour and shape. Any volunteers?
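As a starting point for discussion, the sketch below shows only the simplest first step of such a pipeline: separating candidate objects from the background by intensity thresholding and 4-connected component labeling, then discarding very small components as noise. It does not address the harder problem stated above of telling phytoliths apart from other objects of similar colour and shape. The function names, the toy image, and the `threshold` and `min_area` values are all hypothetical.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]


def segment_objects(image: List[List[int]], threshold: int,
                    min_area: int) -> List[List[Pixel]]:
    """Return 4-connected groups of pixels brighter than `threshold`.

    Groups with fewer than `min_area` pixels are treated as noise
    and dropped.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    objects: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill from this foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            component: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:
                objects.append(component)
    return objects


def bounding_box(component: List[Pixel]) -> Tuple[int, int, int, int]:
    """Return (min_row, min_col, max_row, max_col) of a component."""
    ys = [y for y, _ in component]
    xs = [x for _, x in component]
    return min(ys), min(xs), max(ys), max(xs)


# Toy grayscale image: two bright blobs and one isolated bright pixel.
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 7],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 8, 8],
    [0, 0, 0, 0, 8, 8],
]
found = segment_objects(toy, threshold=5, min_area=3)
boxes = [bounding_box(obj) for obj in found]
```

Each bounding box could then be cropped and labeled by a specialist to produce one supervised example; real micrographs would likely need more robust preprocessing than a single global threshold.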