A new philosophy for seismic DATABASES
The first objective of FEMALE is to establish a new suite of parameters that characterizes continuous seismic signals (seismograms) without requiring the identification and labelling of isolated events. A key point in seismo-volcanic analysis is how to represent observations, i.e., how to determine a set of meaningful features derived from measurements made on those observations. Such representations are typically obtained by extracting characteristics (features) from the data and using them in a new frame of reference to perform a specific task (Figure 4). In past works4,8,11, various methods have been used to move from the original frame of reference to a feature frame, which can consist of learned features or hand-designed features. However, using one or more methods to define features offers no guarantee that the optimum frame of reference for the data will be found. Determining an appropriate feature frame is therefore a critical point of our study. FEMALE proposes to begin this parametric characterization of the signal with an initial set of parameters that we have already tested carefully13. These features will be calculated on temporal segments of the seismogram. We will group them into three types according to the information they represent, and combine the best set of features characterizing the volcano-seismic waveforms into a “Vector of Features” whose dimension D equals the number of features. The three types of features are described below.
- Features of Phenomenological Nature: These features provide information about the nature of the seismic signal. The features that describe a given seismogram represent characteristics that are independent of the volcanic system. The Phenomenological/Geophysical feature set will be chosen by accounting for the nature of the continuous seismic signals (see Table 2 for a summary), and can be grouped into the following sub-categories:
- Impulsiveness of the signal: for example, the normalized cumulative sum function of the absolute value of the signal may characterize the onset behaviour of the event.
- Spectral model: for example, the normalized cumulative sum function of the Power Spectral Density (PSD) will be used to determine the frequency content of the segment.
- Periodicity and harmonics: we will use different features, such as the second peak of the normalized autocorrelation function in the time domain to emphasize the periodicity of the signal, and the second peak of the normalized autocorrelation in the frequency domain to indicate the presence of harmonics.
- Energy: for example, the maximum energy of the envelope of the segment and the maximum energy of the envelope of the PSD.
- Spectral shape: for example, a low-pass shape feature that measures the similarity between the PSD and a low-pass spectrum.
- Features of Statistical Nature: These features represent statistical parameters of the waveform and its frequency content. The seismic record can be described by its statistical properties in the time and frequency domains; some of the most relevant statistical attributes are:
- Seismogram statistics: standard deviation, average, median, maximum, kurtosis, skewness (asymmetry) and Shannon/Rényi entropies, among others.
- Spectral statistics: average spectrum and energy. The proposed set of statistical parameters contains basic statistical measurements extracted for each segment of analysis in the temporal domain and the Fourier spectrum.
- Features Based on Signal Domain Transforms: These features are obtained by applying a transform to the waveform in order to characterize the signal in a different domain, e.g., the frequency domain. Here, we propose a classical cepstral parametrization scheme, based on a spectral transform of the waveform, followed by a normalization step with a simple (plain) Dimensionality Reduction (DR) algorithm. We will use Linear Frequency Cepstral Coefficients (LFCC) computed from the short-time Fourier transform. The LFCC are obtained by applying a Discrete Cosine Transform (DCT), which de-correlates the outputs of the filter bank, yielding a vector of features. Additionally, we will use time-frequency representations, such as the spectrogram (variation of frequency content over time) and the scalogram (evolution of the Continuous Wavelet Transform, CWT, coefficients as a function of time and frequency).
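As an illustration of the cepstral scheme described above, the following is a minimal sketch and not the project's implementation. It assumes NumPy and SciPy are available. The function name `lfcc`, the triangular filter shape, the number of filters and coefficients, and the averaging over frames are all illustrative choices that are not specified in this proposal. The normalization and dimensionality-reduction step is omitted.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import stft


def lfcc(segment, fs, n_filters=16, n_coeffs=8, nperseg=256):
    """Linear-frequency cepstral coefficients, averaged over one segment.

    All parameter defaults are hypothetical values chosen for illustration.
    """
    # Short-time Fourier transform -> power spectrogram (frequency x frames).
    freqs, _, spec = stft(segment, fs=fs, nperseg=nperseg)
    power = np.abs(spec) ** 2

    # Triangular filters with linearly spaced centre frequencies (assumed shape).
    edges = np.linspace(freqs[0], freqs[-1], n_filters + 2)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)

    # Log filter-bank energies; the small constant avoids log(0).
    energies = np.log(bank @ power + 1e-12)

    # The DCT along the filter axis de-correlates the filter-bank outputs.
    ceps = dct(energies, type=2, axis=0, norm="ortho")[:n_coeffs]

    # Average over frames to obtain one coefficient vector per segment.
    return ceps.mean(axis=1)
```

Calling `lfcc(segment, fs=100.0)` on a one-dimensional array returns a vector of `n_coeffs` values that could be appended to the Vector of Features for that segment.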
These approaches are examples to be explored in this project. Additional methods will be considered as needed, such as spectro-temporal features applied to seismo-volcanic signals21 or features defined by other researchers12. Additionally, purely data-driven techniques for exploring shared feature spaces can be applied, using deep neural networks in combination with the proposed features.
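To make the segment-based Vector of Features concrete, the sketch below computes a few of the phenomenological and statistical features listed above on sliding segments of a synthetic trace. It is a hedged illustration that assumes NumPy and SciPy. The helper names, the 50% crossing level used to summarize the normalized cumulative sums, the reading of "second peak" as the largest autocorrelation peak away from lag zero, and the segment length and hop are all assumptions made for this example, not choices fixed by the project.

```python
import numpy as np
from scipy.signal import find_peaks, periodogram
from scipy.stats import kurtosis, skew


def sliding_segments(trace, seg_len, hop):
    """Cut a continuous trace into overlapping segments (rows)."""
    starts = range(0, trace.size - seg_len + 1, hop)
    return np.stack([trace[s:s + seg_len] for s in starts])


def crossing_index(values, level=0.5):
    """Index at which the normalized cumulative sum first reaches `level`."""
    c = np.cumsum(values)
    return np.searchsorted(c / c[-1], level)


def second_autocorr_peak(x):
    """Largest peak of the normalized autocorrelation away from lag zero."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    ac = ac / ac[0]  # assumes a non-constant segment
    peaks, _ = find_peaks(ac)
    return ac[peaks].max() if peaks.size else 0.0


def feature_vector(seg, fs):
    """Vector of Features for one segment (here D = 9)."""
    freqs, psd = periodogram(seg, fs=fs)
    return np.array([
        crossing_index(np.abs(seg)) / seg.size,  # impulsiveness summary
        freqs[crossing_index(psd)],              # frequency at 50% of PSD sum
        second_autocorr_peak(seg),               # periodicity
        seg.std(), seg.mean(), np.median(seg), np.abs(seg).max(),
        kurtosis(seg),                           # excess (Fisher) kurtosis
        skew(seg),                               # asymmetry
    ])


# Synthetic example: a 2 Hz oscillation with weak noise, sampled at 100 Hz.
fs = 100.0
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / fs)
trace = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.standard_normal(t.size)

# One row per segment, one column per feature.
X = np.stack([feature_vector(s, fs) for s in sliding_segments(trace, 500, 250)])
```

Each row of `X` is one D-dimensional vector, so the continuous record becomes a sequence of feature vectors without any event detection or labelling step.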
Once the adequate parameters have been selected, they will be used both to build the New Parametric Databases (WP2) and to train automatic classification systems (WP5).
This WP should be finished by the end of the sixth month of the project; however, depending on the success of the subsequent WPs, the definition of parameters may be revised periodically, with a revision procedure expected at the beginning of the second and third years.
The working team will be led by Prof. Benítez, in direct collaboration with Drs. Mota, Alguacil, Prudencio and Ibáñez of the Research Team and several members of the Working Team.