The next key step is to export the new philosophy to other volcanic scenarios. FEMALE will therefore extend this study to seismic time series associated with other volcanoes, creating a new inventory of case studies that will enable volcanic forecasting and permit comparison with complete seismic catalogues. After identifying the parameters representative of each volcanic episode, we will compare each new temporal series (the query sequence) with the set of series stored in the Preliminary Inventory of Case Studies.
FEMALE will extend this study to other seismic series associated with different volcanoes in order to enlarge and improve the Inventory of Case Studies and so enable efficient volcanic forecasting. We have selected four reference volcanoes with eruptive regimes similar to those of the primary selected scenarios: effusive (Piton de la Fournaise), mixed effusive-explosive (Redoubt) and dome-explosive (St. Helens and Merapi). It is crucial to include seismic series without eruptions, since a complete statistical analysis must contain both eruptive and non-eruptive examples. We will select a large number of new P-Cases (P>>N) from new volcanic scenarios using the very large volume of seismic data available in public repositories such as IRIS, ORFEUS, GNS, RESIF, and GFZ. For non-eruptive episodes we will use as an indicator a change to orange or red in the volcanic traffic-light alert system that was not followed by an eruptive episode.
The working team will be led by Prof. Ibáñez in direct collaboration with Profs. Prudencio, Ontiveros, Feriche and Benítez of the Research Team and several members of the Working Team.
Once each scenario is selected, the continuous seismic data will be transferred from the repository to the Cloud. This will be done for all available seismic stations placed around the volcano, over the longest possible temporal interval prior to the declaration of volcanic unrest. We will then apply the procedures of WP2 and WP3 described above to each new scenario in order to create new case studies, including the baseline and concept drift analyses, their associated vectors, and the extracted changing-parameter matrices.
The working team will be led by Prof. Ibáñez in direct collaboration with Drs. Benítez, Prudencio, Alguacil and Mota of the Research Team and several members of the Working Team.
In this WP we will ask whether each new case study matches any of the previously analysed case studies, providing a level of similarity expressed as a probability value. If this probability is low, we will assign the case as a new case study (Figure 7), enlarging the Inventory of Case Studies.
In the literature there are several approaches [26] that tackle the comparison problem between spatial-temporal sequences, in fields ranging from speech recognition to medicine. FEMALE will start by using different approaches and metrics to compare different temporal series. Since we do not know a priori the duration of a new incoming crisis, we will use several procedures [27] to index the inventory dataset, implementing both “whole matching” and “subsequence matching” approaches. Whole matching corresponds to the case in which the query sequence and the sequences in the database have the same length (y(t) = x(t)). In subsequence matching the query sequence is potentially shorter than the sequences in the database. This second comparison is very useful for real-time seismic monitoring, where only the beginning of a crisis has been observed. Regardless of the numerous techniques available in the literature, we will initially use the three most successful and widely accepted approaches to compare time series:
This approach is based on the advanced use of classical Euclidean distance metrics described in Table 3.
|Table 3a||Direct Metrics. Traditional Approaches.|
|Euclidean Distance, Manhattan Distance||Euclidean distance quantifies the difference between time series as the root of the summed squared differences. Manhattan distance is the sum of the absolute differences between series.||Pairwise distances are easy to compute and provide similarity measures or minimum distances.||Requires synchronized time series of the same length.|
|Mahalanobis Distance||Computes the difference between time series while accounting for non-stationarity of variance and temporal cross-correlation.||By using the covariance matrix, the measure gives a probabilistic interpretation of the similarity.||Requires the estimation and inversion of a covariance matrix.|
|Pearson Correlation||Encodes the degree of linear relation between time series as a value between -1 and 1.||Not affected by amplitude scaling or translation.||Querying many time series can be computationally expensive.|
|Table 3b||Direct Metrics. Elastic Measures, subsequence matching.|
|Dynamic Time Warping||Dynamic programming algorithm that finds the best alignment (warping path) between two time series and computes a distance along it.||Applicable to signals that vary in speed or are not synchronized, independently of their temporal pattern.||Numerical and heuristic approximations are needed to refine accuracy.|
|Edit Distance on Real sequence (EDR)||Exploits the observation that warping in one sequence can be seen as adding a gap in the other sequence. EDR assigns penalties to the gaps between the two time series.||Gap penalties and constant reference points ensure robustness to missing data.||Needs a constant reference point and a threshold to be defined.|
|Longest Common Subsequence (LCS)||Finds the longest common temporal subsequence of two time series (LCSS distance).||Widely used in speech recognition and text pattern mining.||Needs the sequence to be scaled or transformed to the target one.|
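To illustrate how whole matching, subsequence matching and an elastic measure from Table 3 differ in practice, the following is a minimal Python sketch rather than the project's implementation. The function names (`euclidean`, `best_subsequence`, `dtw`), the sliding-window search and the absolute-difference local cost used in the warping are assumptions made for this example; real seismic feature series would normally be normalized first.

```python
import math
from typing import Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Whole matching: both series must have the same length."""
    if len(x) != len(y):
        raise ValueError("whole matching requires equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def best_subsequence(query: Sequence[float],
                     stored: Sequence[float]) -> Tuple[int, float]:
    """Subsequence matching: slide the shorter query along a stored series.

    Returns the start index and Euclidean distance of the closest window.
    """
    m = len(query)
    if m == 0 or m > len(stored):
        raise ValueError("query must be non-empty and not longer than stored")
    best_start, best_dist = 0, math.inf
    for start in range(len(stored) - m + 1):
        d = euclidean(query, stored[start:start + m])
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """Dynamic time warping distance with an absolute-difference local cost.

    Plain O(len(x) * len(y)) dynamic programming, no window constraint.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat y[j-1]
                                     cost[i][j - 1],      # repeat x[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

For instance, `best_subsequence([1, 2, 3], [0, 0, 1, 2, 3, 0])` locates the query at index 2 of the stored series, while `dtw([1, 2, 3], [1, 1, 2, 2, 3])` is zero because the second series is only a slowed-down copy of the first, a case in which the Euclidean distance is not even defined.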
In this approach the information is projected to a new representation space where the time series are compared using specific metrics, solving the intrinsic problems associated with different time durations of the Case Studies (Table 4).
|Table 4||Transformation Approaches: Signal Processing, Symbolic and Dimensionality Reduction|
|Wavelets, DFT||Computes the similarity score based on the wavelet/DFT coefficients.||Uses the properties of the FFT and wavelet transforms.||A poor data structure for indexing the time series produces computational bottlenecks.|
|Adaptive Piecewise Constant Approximation (APCA)||Time series are approximated via piece-wise transformations designed to find the lowest reconstruction error.||Fast computation, high pruning power with high accuracy, low query cost, suitable for large datasets.||Needs specialized indexing and complex data structures.|
|PCA||Computes the similarity between the principal components that explain the majority of the variance across time.||Unsupervised, easy-to-compute method that is robust to noise.||Estimating the data covariance matrix can be difficult if there are data gaps.|
|SAX (Symbolic Aggregate Approx.)||Dimensionality reduction technique based on a symbolic representation of a time-series.||Text data mining techniques can be used on the symbolic representation.||Needs to define the symbols, loss of temporal and amplitude resolution.|
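As an example of the transformation approaches in Table 4, the sketch below shows one common way to build a SAX word: z-normalize the series, average it over equal segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split a standard normal distribution into equiprobable regions. The function names, the default four-letter alphabet and the segment count are assumptions chosen for illustration, not parameters fixed by the project.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence


def znorm(series: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit variance (constant series map to zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(v - mu) / sigma for v in series]


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise aggregate approximation: mean of each of `segments` chunks."""
    n = len(series)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    return [mean(series[k * n // segments:(k + 1) * n // segments])
            for k in range(segments)]


def sax(series: Sequence[float], segments: int, alphabet: str = "abcd") -> str:
    """Symbolic word for a series; two series can then be compared as text."""
    a = len(alphabet)
    # Breakpoints cutting the standard normal into `a` equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    return "".join(alphabet[bisect_right(breakpoints, v)]
                   for v in paa(znorm(series), segments))
```

Two series of different durations reduced to the same number of segments yield words of equal length, which is what allows case studies of different lengths to be compared; the price is the loss of temporal and amplitude resolution noted in the table.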
In this sub-task we will measure distances with advanced approximations based on Deep Neural Networks, which parameterize the case-study sequences in a simpler embedding space (Table 5).
|Table 5||Transformation Approaches: Deep Learning|
|Variational autoencoders (VAEs)||A single Neural Network is used to embed the time series in a probabilistic latent space.||Fast, scalable and flexible, with a rich variety of architectural configurations.||Rely heavily on exploiting parametric models in the form of deep neural networks.|
|Siamese Network||Two Deep Neural Networks are used to embed both time series into a common latent space in which distances are computed using Euclidean distances.||The distance metric is learned by extracting relevant features in a purely data-driven approach.||A pre-defined Lp metric has to be computed over the latent representation.|
|NeuralWarp||Single Neural Network architecture that learns a warping function directly from the latent representation.||Works directly on modelling the elastic measure.||Needs labelled pairs of temporally similar (P) and dissimilar (N) series.|
This is a probabilistic framework that estimates the time evolution of the seismic features and provides adaptive uncertainty for a given Case Study. FEMALE will associate a probability of similarity between the new P-cases and the stored N-cases. The system is dynamic, elastic, and learns: if a new P-case does not resemble any of those analysed previously, it will be incorporated into the Inventory of Case Studies as a new M-Case, enlarging the Inventory (where N < M < P). This Inventory will be used in the future to compare any newly occurring volcanic scenario, providing a real-time probability of similarity with some case of our Inventory, i.e. we will forecast volcanic activity by understanding the past.
The working team will be led by Prof. Benítez in direct collaboration with Drs. Mota, Alguacil, Prudencio, Feriche and Ibáñez of the Research Team and several members of the Working Team.
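The match-or-enlarge logic described above can be sketched as follows. This is only an illustration under stated assumptions: the score `exp(-distance / scale)` is a simple heuristic standing in for the probability of similarity, not a calibrated probability, and the names `similarity`, `mean_abs_diff`, `match_or_enlarge`, the `threshold` of 0.5 and the `scale` parameter are all hypothetical choices for this example. Any of the distances from Tables 3 to 5 could be passed in place of `mean_abs_diff`.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def similarity(distance: float, scale: float = 1.0) -> float:
    """Map a non-negative distance to a score in (0, 1]; 1 means identical."""
    return math.exp(-distance / scale)


def mean_abs_diff(x: Sequence[float], y: Sequence[float]) -> float:
    """Toy distance for equal-length series (placeholder for a real metric)."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def match_or_enlarge(inventory: Dict[str, List[float]],
                     name: str,
                     query: Sequence[float],
                     distance: Distance = mean_abs_diff,
                     threshold: float = 0.5,
                     scale: float = 1.0) -> Tuple[Optional[str], float]:
    """Compare a new case with every stored case.

    Returns (best matching case, score). If even the best score is below
    `threshold`, the query is stored under `name` as a new case and the
    returned match is None.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for stored_name, stored in inventory.items():
        score = similarity(distance(query, stored), scale)
        if score > best_score:
            best_name, best_score = stored_name, score
    if best_score < threshold:
        inventory[name] = list(query)  # the inventory grows with the new case
        return None, best_score
    return best_name, best_score
```

With an inventory holding two stored cases, a query close to one of them returns that case and leaves the inventory unchanged, whereas a query far from both is added as a new entry.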
As an additional test we will design an experimental seismic survey at two active volcanoes where we have previous experience in seismic signal analysis. The purpose is to monitor and record the seismic signals continuously, using at least four broadband seismic stations at each volcano. Since we have the technology to send the seismic signal in real time from the field to our laboratories (in Europe, USA and Japan), we can perform in real time both the analyses proposed in WP2 and WP3 and those that will be described in WP5, with the advantage of checking in real time the potential changes in the parameters defined in WP1.
The first selected volcano is “Volcán de Fuego de Colima”, in Mexico. Several reasons justify this choice: a) it is an active volcano with mixed explosive-effusive dynamics, and could integrate the experience derived from the first three scenarios; b) we already have extensive experience in the classification and labelling of seismo-volcanic events at this volcano, reflected in our scientific output, including doctoral and MSc theses; c) the comparison between techniques can be direct and easily performed; d) we have a continuing collaborative agreement with the University of Colima (included in the Working Team) that will facilitate the deployment and maintenance of the instruments. The seismic instruments are available through the pool of seismic stations of the INGV in Rome and Pisa. The double affiliation of Prof. Ibáñez as associated researcher of the INGV and the participation of Drs. Zuccarello, Del Pezzo, Bianco, Saccorotti, Chiodini, Giampiccolo, Musumeci and Tuvé from INGV as members of the Working Team guarantee access to these stations.
The second volcano will be decided when the project starts, on the basis of candidate volcanoes of interest according to their level of activity and hazard. Since the Working Team includes researchers from USA, Mexico, Argentina, Italy and Japan with direct access to different active volcanoes, who could provide support at all levels for the experiment, there are multiple potential deployment sites. In any case we already have two potential scenarios that will be ready to be used. The first is Copahue volcano (Argentina), which at the time of submitting this proposal is at Orange alert level, with potential damage to nearby villages. The second is Popocatépetl volcano in Mexico, which has shown intense volcanic activity in the last decade. In both cases local institutions are already involved in the project.
The deployment will last as long as possible, with a minimum duration of two years.
The deployment and working procedure for both cases will be the same:
The working team will be led by Prof. Ibáñez in direct collaboration with Dr. Bretón for the case of Colima and with all members of the Research Team and of the Working Team. For the second volcanic scenario, the person who will coordinate the work together with Prof. Ibáñez will be designated when the scenario is chosen.