13th International Symposium

DATAMOD 2025

FROM DATA TO MODELS AND BACK

A satellite event of the

23rd International Conference on Software Engineering and Formal Methods - SEFM 2025

DATE 10-11 November

LOCATION Toledo, Spain

About DataMod

DataMod aims at bringing together practitioners and researchers from academia, industry and research institutions interested in the combined application of computational modelling methods with data-driven techniques from the areas of knowledge management, data mining and machine learning.

Modelling methodologies of interest include automata, agents, Petri nets, process algebras and rewriting systems. Application domains include social systems, ecology, biology, medicine, smart cities, governance, security, education, software engineering, and any other field that deals with complex systems and large amounts of data.

Papers can present research results in any of the themes of interest for the symposium as well as application experiences, tools and promising preliminary ideas. Papers dealing with synergistic approaches that integrate modelling and knowledge management/discovery or that exploit knowledge management/discovery to develop/synthesise system models are especially welcome.

Topics of interest

Modelling and analysis methodologies include:

  • Agent-based Methodologies
  • Automata-based Methodologies
  • Big Data Analytics
  • Cellular Automata
  • Classification
  • Clustering, Segmentation and Profiling
  • Conformance Analysis
  • Constraint Programming
  • Data Mining
  • Differential Equations
  • Game Theory
  • Machine Learning
  • Membrane Systems
  • Network Theory and Analysis
  • Ontologies
  • Optimisation Modelling
  • Petri Nets
  • Process Calculi
  • Process Mining
  • Rewriting Systems
  • Spatio-temporal Data Analysis/Mining
  • Statistical Model Checking
  • Text Mining
  • Topological Data Analysis

Application domains include:

  • Biology
  • Brain Data and Simulation
  • Business Process Management
  • Climate Change
  • Cybersecurity
  • Ecology
  • Education
  • Environmental Risk Assessment and Management
  • Enterprise Architectures
  • Epidemiology
  • Genetics and Genomics
  • Governance
  • HCI and Human Behaviour
  • Open Source Software Development and Communities
  • Pharmacology
  • Resilience Engineering
  • Safety and Security Risk Assessment
  • Social Good
  • Social Software Engineering
  • Social Systems
  • Sustainable Development
  • Threat Modelling and Analysis
  • Urban Ecology
  • Smart Cities and Smart Lands

Synergistic approaches include:

  1. Use of modelling methods and notations in a knowledge management/discovery context
  2. Development and use of common modelling and knowledge management/discovery frameworks to explore and understand complex systems from the application domains of interest

Important Dates

10-11 November 2025

RESEARCH PAPERS – SHORT AND FULL

  • Abstract submission deadline (optional): August 22, 2025
  • Paper submission deadline (extended): September 10, 2025 (previously August 29, 2025)
  • Acceptance notification: October 3, 2025 (previously September 30, 2025)
  • Revised version: October 7, 2025
    (note that you will have the opportunity to submit the camera-ready
    paper for the LNCS proceedings, incorporating discussions and remarks,
    after the symposium, i.e., by the end of the year or early January 2026)

PRESENTATION REPORT

Paper Submission And Publication

TYPES

Papers can take one of the following three types:

  • Regular (research, tool or position) paper, up to 16 pages (excluding references)
  • Short (research, tool or position) paper, up to 8 pages (excluding references)
  • Presentation report, up to 4 pages

  Presentation reports concern recent or ongoing work on relevant topics and ideas, for timely discussion and feedback at the symposium. There is no restriction on previous or future publication of the contents of a presentation. Typically, a presentation is based on a paper that recently appeared (or is going to appear) in the proceedings of another recognised conference, or that has not yet been submitted. Presentation reports will receive a lightweight review to establish their relevance for DataMod (see the Call for Presentation Reports).

    SUBMISSION

    All submissions must be original, unpublished, and not submitted concurrently for publication elsewhere.

    Authors are invited to submit their contributions (regular and short papers) via EasyChair.

    Authors are invited to submit their presentation reports by e-mail to datamod2025@easychair.org.

    Papers must be formatted according to the guidelines for Springer LNCS papers, without modifications of margins and other space-saving measures. Authors should therefore consult Springer's authors' instructions and use their proceedings templates, either for LaTeX or for Word, for the preparation of their papers. Springer’s proceedings LaTeX templates are also available in Overleaf. Springer encourages authors to include their ORCIDs in their papers.

    Each paper will be reviewed by three Program Committee members. Notifications and reviews will be communicated by email through the EasyChair platform.

    PUBLICATION

    Accepted papers will be included in the Symposium programme and will appear in the Symposium pre-proceedings, which will be available online before the Symposium. The condition for inclusion in the pre-proceedings is that at least one of the co-authors has registered for the Symposium. Revised versions of accepted papers will be published after the Symposium in an LNCS volume published by Springer. The condition for inclusion in the post-proceedings is that at least one of the co-authors has presented the paper at the Symposium.

    Didn't find what you are looking for?

    datamod2025@easychair.org

    Symposium Schedule

    - Central European Standard Time (GMT+1) -

    DataMod will take place at the University of Castilla-La Mancha (UCLM) in Toledo, Spain.
    For further details, please visit the SEFM webpage.

    10 November

    13:00 - 15:00
    Lunch break (La Fábrica de Harinas)
    15:00 - 15:15


    15:15 - 16:30

    Location: Sala de música (San Pedro Mártir Building)

    Opening

    Invited Talk 1 - Jacopo Soldani

    Title: Explainable Root Cause Analysis for Failing Microservices
    Understanding and preventing cascading failures in microservice architectures is key to ensuring their resilience. This talk presents a novel log-based root cause analysis technique that explains observed failures by identifying possible chains of cascading faults across interacting microservices. The approach operates declaratively, allowing the analysis to either explore all possible cascading failures or focus on those originating from a known root cause. To support practical adoption, we introduce a logging methodology for capturing failure and interaction data, and a prototype tool, yRCA, that automates the analysis process. Through a case study and controlled experiments, we show how our approach can serve as a valuable aid for understanding and possibly preventing failure propagation in microservice-based applications. (Work supported by the research project FREEDA, CUP: I53D23003550006, funded by the frameworks PRIN and Next Generation EU)
    16:30 - 17:00
    Coffee break
    17:00 - 17:30
    Ivan Spajić and Volker Stolz
    Ruleless Digital Twins
    We introduce a transparent digital twin (DT) decision-making approach without the use of explicit decision rules or rule-based models. Our approach utilizes ontological inference and simulation models to explore possible decisions before applying them to the twinning target. As proof of concept, we provide an implementation of the proposed framework purely based on widely known technologies and standards, and subsequently demonstrate the feasibility of our approach. We discuss benefits and drawbacks, and recognize that a ruleless approach to DT decision making ultimately rests on an effective method of choosing from a multitude of possibilities. Lastly, we consider potential future work and exploration in the context of more effective automation and simulation model usage.
    17:30 - 18:00
    Keila Oliveira, Diego Souza, Thais Webber, Elizabeth F. Wanner and Carolina Marcelino
    Low-Cost Embedded PSO for Decentralized, Scalable and Sustainable Hydroelectric Dispatch
    The increasing demand for electrical energy highlights the need for efficient operation of power generation systems. This study presents an embedded optimization device for hydroelectric dispatch developed on low-cost hardware. The proposed device introduces control strategies that conserve water resources while reliably meeting energy demand through a portable and efficient Particle Swarm Optimization (PSO) algorithm, adapted for micro- and resource-constrained hydroelectric applications. To validate its effectiveness, a Hydroelectric Power Plant (HPP) dispatch simulation model and the PSO algorithm were implemented and tested on embedded platforms (Arduino Mega 2560 and Raspberry Pi 3 Model B), with performance benchmarked against a high-performance computing machine. Results show that, despite computational limitations, embedded systems can effectively support hydroelectric dispatch planning, providing a cost-effective, scalable, decentralized, and sustainable alternative to traditional control solutions.
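For readers unfamiliar with Particle Swarm Optimization, the following is a minimal, generic sketch of the standard algorithm in plain Python. It is not the authors' embedded implementation: the cost function `toy_dispatch_cost`, the two-unit setup, the demand of 100, the bounds and all coefficients are hypothetical values chosen only for illustration.

```python
import random

def pso_minimise(cost, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Standard global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # personal bests
    best_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the feasible interval.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = cost(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Hypothetical toy problem: two generating units, quadratic "water use"
# per unit, plus a penalty for missing an assumed demand of 100.
BOUNDS = [(0.0, 80.0), (0.0, 80.0)]

def toy_dispatch_cost(x):
    water = 0.02 * x[0] ** 2 + 0.03 * x[1] ** 2
    penalty = 10.0 * (x[0] + x[1] - 100.0) ** 2
    return water + penalty

if __name__ == "__main__":
    position, value = pso_minimise(toy_dispatch_cost, BOUNDS)
    print(position, value)
```

The sketch uses only the standard library and a handful of lists, which is the kind of footprint that makes PSO a plausible candidate for small boards; memory and timing on real hardware would of course need to be measured.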
    18:00 - 18:30
    Jéssica Richards Nascimento, Carolina Marcelino, Carlos Eduardo Pedreira and Elizabeth Wanner
    On the Relationship Between Neural Gradients and Model Reasoning
    As machine learning increasingly shapes decisions that directly impact people’s lives, the demand for transparency and accountability has grown. Explainable Artificial Intelligence (XAI) has emerged as a field dedicated to making the reasoning of autonomous decision systems clearer and more interpretable. Among the many approaches to XAI, gradient-based explanations use a model’s internal gradients to highlight feature importance. In this work, we investigate whether a model’s gradients reveal consistent patterns that reflect its reasoning process. To keep the problem tractable, given the exponential growth in neural network dimensionality and corresponding gradients, we conducted our experiments using a Multilayer Perceptron as a controlled, toy example. For statistical analysis, we applied the Mantel and Energy tests. Our results indicate that gradient patterns emerge as intrinsic properties of the model’s architecture and the dataset on which it is trained.
    11 November

    09:30 - 11:00

    Location: Sala de música (San Pedro Mártir Building)

    Invited Talk 2 - Jose Ignacio Requeno Jarabo

    Co-author: Maria Elena Gomez Martinez, Universidad Complutense Madrid, Spain
    Title: Analysing Product Lines of Concurrent Systems
    The analysis of software product lines composed of concurrent systems remains a significant challenge due to the combinatorial explosion induced by variability and behavioural complexity. This plenary session presents recent advances in the modelling and analysis of product lines using Petri Net Product Lines (PNPL), an extended formalism in which basic Petri net constructs are annotated with feature-based variability information. The approach enables both structural and behavioural properties—such as reachability, liveness, invariants, and deadlock-freeness—to be analysed collectively for entire product families, employing lifted analysis techniques and algebraic reasoning. Further, the integration with coloured Petri nets and runtime verification frameworks allows a scalable evaluation of time-dependent behaviour across many product configurations. Illustrative industrial case studies, such as workflow performance analysis in medical domains, are used to demonstrate the potential of PNPL for efficient, verifiable, and tool-supported engineering of complex and variable concurrent systems.
    11:00 - 11:30
    Coffee break
    11:30 - 12:00
    Manuel I. Capel and Luis Rodríguez Domingo
    Probabilistic Algorithm with Dynamic Load Balancing for GPU-Accelerated Tumor Growth Simulations
    Efficient simulation of tumor growth using cellular automata (CA) requires high computational power, especially when scaling to large biological systems. In this paper, we present a probabilistic CA model with a GPU-accelerated dynamic load balancing strategy, using CUDA to optimize execution and scalability. Our approach integrates probabilistic modeling of tumor dynamics with adaptive GPU scheduling, bridging the gap between high-level modeling and efficient parallel execution. We compare our method with traditional CPU implementations and static load balancing, and demonstrate significant performance gains. Results show that the proposed strategy reduces execution time by up to 54% for a 1024 x 1024 grid of CUDA thread blocks while maintaining accuracy. These contributions highlight how computational modeling choices and heterogeneous architectures can be co-designed to enable scalable biomedical simulations, making our work relevant not only for computational oncology but also for the broader field of data-driven modeling and analysis of complex systems.
    12:00 - 12:30
    Zhi Zhang, Dalal Alrajeh, Yue Gu, Roberto Metere, Michele Sevegnani, Kangfeng Ye and Poonam Yadav
    Reachability Specification Adaptation for Continuous Dynamical Systems
    Continuous dynamical systems are fundamental in modelling a wide range of real-world phenomena, from cyber-physical systems to industrial automation. Such systems operate in dynamic environments where unanticipated situations can arise. Requirements specified at design time may become partially unachievable when deployed. We propose a novel method for reachability specification degradation tailored to continuous dynamical systems. Our approach involves sampling points within the initial state space region, estimating reachability regions over time using the Lipschitz constant, and augmenting the original specification through disjunction operations. We implement our method to support how a system can adapt to requirements in real-time.
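The sampling idea in this abstract can be illustrated with a standard bound: if the dynamics have Lipschitz constant L, two trajectories that start within distance delta of each other stay within delta * exp(L * t) at time t. The sketch below applies that bound to a one-dimensional toy system; it is not the authors' method or tool. The system dx/dt = -x, the sample spacing, the target interval and the helper names (`reach_overapprox`, `covers`, `adapted_spec`) are all assumptions made for illustration.

```python
import math

def reach_overapprox(flow, samples, delta, lipschitz, t):
    """Intervals that together cover every state reachable at time t.

    flow(x0, t) -- state at time t when starting from x0
    samples     -- initial points; every initial state lies within
                   `delta` of at least one sample
    lipschitz   -- Lipschitz constant of the vector field
    """
    radius = delta * math.exp(lipschitz * t)
    return [(flow(x0, t) - radius, flow(x0, t) + radius) for x0 in samples]

def covers(intervals, x):
    """True if x lies in at least one of the intervals."""
    return any(lo <= x <= hi for lo, hi in intervals)

# Toy system dx/dt = -x: closed-form flow, Lipschitz constant 1.
def flow(x0, t):
    return x0 * math.exp(-t)

# Initial region [0.9, 1.1], covered by 5 samples with radius 0.025.
samples = [0.9 + 0.05 * k for k in range(5)]
boxes = reach_overapprox(flow, samples, delta=0.025, lipschitz=1.0, t=2.0)

# Hypothetical original requirement: "the state at t = 2 lies in [0, 0.05]".
def original_spec(x):
    return 0.0 <= x <= 0.05

# Relaxed requirement: original target OR the estimated reachable region.
def adapted_spec(x):
    return original_spec(x) or covers(boxes, x)
```

For this contracting system the bound is very conservative, since nearby trajectories actually converge; tighter estimates would need more than a global Lipschitz constant. A numerical integrator could replace the closed-form `flow`, but its error would then have to be added to the radius.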
    12:30 - 13:00
    Ramesh Krishnamurthy, Guillermo Perez, Kasper Engelen, Stijn Bellis and Thomas Gueutal
    Taylor's DAgger: Verified Integration for Efficient Imitation-Learning of an MPC
    We focus on the challenge of formally verifying Model Predictive Control (MPC) by learning a surrogate in the form of a neural network (NN). While this approach simplifies questions of latency and allows the application of NN-verification techniques, it crucially relies on effectively training the NN to imitate the MPC, a process known as imitation learning. Recognizing the difficulties in learning with nonholonomic constraints and unstable dynamics, we build upon the Dataset Aggregation (DAgger) approach, which iteratively collects data by querying the expert (MPC) on states reached by the learned model. Our main contribution is our proposal to use verified integration methods, specifically Taylor models, to obtain a conservative overapproximation of reachable states in a dynamical system. This allows for the generation of more data points per simulation along an NN-controlled trajectory and facilitates the consideration of small perturbations to control actions after retraining. Our hypothesis is that these extensions will lead to a more efficient learning of the MPC surrogate, which is empirically evaluated on an autonomous truck-with-trailer navigation problem.
    13:00 - 15:00
    Lunch break (La Fábrica de Harinas)
    15:00 - 15:45
    Daniel Leto, Roberto C. Alamino, Elizabeth F. Wanner, Thais Webber and Diego Souza
    Physics-Informed Neural Networks for SIR Models: Evaluating Data Integration Strategies (ONLINE)
    Physics-informed neural networks (PINNs) represent a promising idea in deep learning, capable of integrating the robust predictive power of neural networks with the foundational principles of physical laws, often expressed as differential equations. This technique is particularly valuable for scientific discovery in situations where observational data are limited or noisy, offering an alternative to obtain robust and data-efficient solutions for complex forward and inverse problems. In mathematical epidemiology, they can be used as a framework to model disease transmission dynamics, such as those described by the classical Susceptible-Infectious-Recovered (SIR) model, defined by ordinary differential equations. Despite their inherent advantages, training PINNs to accurately capture complex dynamic behaviours poses significant challenges. These difficulties often stem from training pathologies inherent to this technology. This study evaluates two distinct methodologies for employing PINNs to solve the SIR model's equations. The first approach relies purely on minimising the residual of the underlying differential equations. The second, building on recent advances, incorporates observational reference data specifically for the Infected (I) compartment into the loss function, alongside the residuals, simulating scenarios where disease dynamics could be extracted from real-world data. Our empirical findings indicate that, while the residual-only approach exhibits variable accuracy, the strategic integration of observational reference data dramatically enhances model accuracy and stability.
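For reference, the classical SIR model and the two kinds of loss described in this abstract can be written as follows. The SIR equations are standard; the exact form of the losses is an assumption for illustration (the collocation times t_k, the observation times t_j, the weight lambda and the network outputs marked with hats are not taken from the paper).

```latex
% Classical SIR model with transmission rate \beta, recovery rate \gamma
% and constant population N = S + I + R.
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}

% Residual-only loss over M collocation times t_k, where
% \hat{S}, \hat{I}, \hat{R} are the network outputs with parameters \theta.
\begin{aligned}
r_S &= \frac{d\hat{S}}{dt} + \beta \frac{\hat{S}\hat{I}}{N}, \quad
r_I = \frac{d\hat{I}}{dt} - \beta \frac{\hat{S}\hat{I}}{N} + \gamma \hat{I}, \quad
r_R = \frac{d\hat{R}}{dt} - \gamma \hat{I}, \\
\mathcal{L}_{\mathrm{res}}(\theta) &= \frac{1}{M} \sum_{k=1}^{M}
  \left[ r_S(t_k)^2 + r_I(t_k)^2 + r_R(t_k)^2 \right].
\end{aligned}

% Data-augmented loss: residuals plus a misfit on K observations
% of the Infected compartment, weighted by an assumed factor \lambda.
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{res}}(\theta)
  + \lambda \, \frac{1}{K} \sum_{j=1}^{K}
  \left( \hat{I}(t_j) - I^{\mathrm{obs}}_j \right)^2 .
```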
    15:45 - 16:30
    Smayan Agarwal, Aslah Ahmad Faizi, Shobhit Singh and Aalok Thakkar
    From Transformers to Weighted Automata: Towards Verification of Large Language Models (ONLINE)
    Large language models (LLMs) are increasingly deployed in safety-critical settings, yet their black-box nature makes it difficult to provide formal guarantees about their behaviour. Existing verification approaches rely primarily on empirical probing and testing, leaving open the question of how to reason rigorously about general-purpose transformer architectures.
    In this work, we establish a principled bridge between transformers and weighted automata, a classical model from formal language theory. This connection enables us to transfer verification tools from automata theory to the analysis of LLMs. Our contributions are twofold: First, we develop a formal correspondence between transformer architectures and weighted automata over reals, showing how distributional properties of LLMs can be captured within this framework. Second, we introduce an identity testing algorithm for weighted automata that provides a statistical method for distinguishing whether two stochastic models define the same distribution up to a tolerance threshold.
    This work provides the first formal bridge between modern neural sequence models and classical automata theory, clarifying both the potential and the computational challenges for rigorous LLM verification.
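As background for this abstract, a weighted automaton over the reals assigns a weight to each string by multiplying an initial vector, one transition matrix per symbol, and a final vector. The sketch below shows that computation on a small stochastic example. It does not reproduce the transformer correspondence or the identity testing algorithm from the talk, and the two-state automaton and its numbers are made up for illustration.

```python
def string_weight(initial, transitions, final, word):
    """Weight of `word`: initial^T * M[w1] * ... * M[wn] * final."""
    vec = list(initial)
    for symbol in word:
        matrix = transitions[symbol]
        # Row vector times matrix.
        vec = [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
               for j in range(len(matrix[0]))]
    return sum(v * f for v, f in zip(vec, final))

# Hypothetical two-state stochastic automaton over the alphabet {a, b}.
# For each state, outgoing transition weights plus the stopping weight
# sum to 1, so string weights can be read as probabilities.
INITIAL = [1.0, 0.0]
TRANSITIONS = {
    "a": [[0.5, 0.2],
          [0.0, 0.4]],
    "b": [[0.1, 0.0],
          [0.3, 0.1]],
}
FINAL = [0.2, 0.2]

if __name__ == "__main__":
    for w in ["", "a", "ab"]:
        print(repr(w), string_weight(INITIAL, TRANSITIONS, FINAL, w))
```

Comparing two such automata string by string does not scale, which is one reason a dedicated identity test up to a tolerance threshold is of interest.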
    20:30 - 22:30
    Social Dinner (Hacienda del Cardenal)

    Organization

    PROGRAM CO-CHAIRS

    • Livia Lestingi, Politecnico di Milano, Italy
    • Gwen Salaün, Université Grenoble Alpes, France

    PUBLICITY CHAIRS

    • Ouadie Khebbeb, Université Grenoble Alpes, France

    STEERING COMMITTEE

    • Oana Andrei, University of Glasgow, UK
    • Antonio Cerone, Nazarbayev University, Kazakhstan
    • Riccardo Guidotti, University of Pisa, Italy
    • Marijn Janssen, Delft University of Technology, the Netherlands
    • Stan Matwin, University of Ottawa, Canada
    • Paolo Milazzo, University of Pisa, Italy
    • Anna Monreale, University of Pisa, Italy

    PROGRAM COMMITTEE

    • Oana Andrei, University of Glasgow
    • Kyungmin Bae, Pohang University of Science and Technology (POSTECH)
    • Juliana Bowles, School of Computer Science, University of St Andrews
    • Giovanna Broccia, ISTI-CNR, FMT Lab
    • Antonio Cerone, Nazarbayev University
    • Robert Clarisó, Universitat Oberta de Catalunya
    • Carla Ferreira, Universidade NOVA de Lisboa
    • Marc Frappier, Université de Sherbrooke
    • Elisa Gonzalez Boix, Vrije Universiteit Brussel
    • Riccardo Guidotti, University of Pisa
    • Alexander Kocian, University of Pisa
    • Ricardo M. Czekster, Aston University
    • José Machado, University of Minho, DI, ALGORITMI/LASI
    • Paolo Milazzo, Dipartimento di Informatica - Università di Pisa
    • Pedro Ribeiro, University of York
    • Arpit Sharma, Indian Institute of Science Education and Research
    • Volker Stolz, Høgskulen på Vestlandet
    • Martin Tappler, TU Wien
    • Thais Webber, Aston University
    • Lina Ye, CentraleSupélec, LMF, University Paris-Saclay, France