My work focuses on world-model construction by integrating multimodal process mining, mixed-reality simulation, and video generation models to capture and model tacit expertise.
Postdoc Search:
I am currently seeking a postdoctoral position in the fields of
multimodal learning and generation.
My goal is to advance AI systems capable of learning, validating, and simulating
complex real-world processes from heterogeneous data streams.
Here is an example of a joint video and digital-system-mimetic world-model generation approach I am currently working on.
Research
I'm interested in conceptual modeling, world-model construction, computer vision, deep learning, generative AI, multimodal representation learning, and differentiable algorithms for discovering, understanding and simulating real-world processes.
Some papers are highlighted.
A unified research framework integrating multimodal process mining with
mixed-reality elicitation.
The thesis models how real-world work unfolds by combining video, audio,
interaction logs, sensor data, and immersive MR simulations to extract tacit expertise and
build next-generation conceptual models.
Proposes an alignment framework that turns multimodal embeddings into
formal conceptual model symbols, enabling explainability, traceability, and hybrid reasoning in
mixed human–AI modeling workflows.
Introduces ViEnNa comics, a process-model notation that combines object-centric event logs and
multimodal evidence into narrative visual diagrams, enabling richer, more intuitive process understanding.
Applied vision–language models to relate UML diagram elements to synthetic visual or acoustic evidence; conducted a user study linking UML fragments to verbal descriptions and observed interactions.
Designed NLP pipelines that translate multimodal observations into stakeholder-specific jargon, enabling domain-adaptive representation and improved explainability.
Built physics-informed models predicting transistor degradation; combined simulation data with ML-supported curve-fitting for accuracy under thermal and stress conditions.
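The ML-supported curve fitting mentioned above can be illustrated with a minimal sketch. Assuming (hypothetically, for illustration only) a power-law aging model for threshold-voltage shift, dVth(t) = A · t^n, the fit becomes ordinary least squares in log-log space; the data, model form, and parameter values here are synthetic, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical degradation model: threshold-voltage shift follows a power law
# in stress time, dVth(t) = A * t**n (a common assumption for aging effects).
t = np.linspace(1, 1000, 50)          # stress time in seconds (synthetic)
A_true, n_true = 0.02, 0.25
dvth = A_true * t**n_true * (1 + 0.02 * rng.standard_normal(t.size))

# In log-log space the model is linear: log(dVth) = log(A) + n * log(t),
# so a degree-1 polynomial fit recovers both parameters.
n_fit, logA_fit = np.polyfit(np.log(t), np.log(dvth), 1)
A_fit = np.exp(logA_fit)
```

A real pipeline would replace the synthetic curve with measured degradation data under varying thermal and stress conditions and fit one model per condition.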
Formulated multimodal process mining as a representation-learning task; developed unified embeddings combining video, audio, and UI interactions for robust discovery under ambiguity.
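As a rough sketch of what a unified embedding over video, audio, and UI interactions might look like, the toy code below projects each modality into a shared space and mean-pools the results. All dimensions, projections, and the fusion rule are illustrative assumptions, not the actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature dimensions (illustrative only).
DIMS = {"video": 512, "audio": 128, "ui": 64}
SHARED_DIM = 256

# One random linear projection per modality into a shared embedding space;
# in practice these would be learned encoders.
projections = {m: rng.standard_normal((d, SHARED_DIM)) / np.sqrt(d)
               for m, d in DIMS.items()}

def embed(features: dict) -> np.ndarray:
    """Project each available modality and mean-pool into one unified embedding."""
    projected = [features[m] @ projections[m] for m in features]
    z = np.mean(projected, axis=0)
    return z / np.linalg.norm(z)  # L2-normalize for cosine comparison

# A synthetic observation with all three modalities present.
obs = {m: rng.standard_normal(d) for m, d in DIMS.items()}
z = embed(obs)
```

Mean-pooling over whichever modalities are present is one simple way to stay robust when a stream is missing or ambiguous; attention-based fusion is a common learned alternative.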
Surveyed ML-ready encodings of BPMN/UML/Petri nets; categorized graph-neural, image-based, and text-based approaches; highlighted open challenges in multimodal interoperability.
Examined health issues associated with XR headset usage and introduced a system for detecting and
monitoring ocular and vestibular complications, including actionable guidance for prevention.
Showed experimentally that spatial and temporal dimensionality reduction of sensor streams can yield more accurate predictive models across several applications.
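A minimal sketch of the two reduction axes on a simulated sensor stream, assuming PCA for the spatial step and window averaging for the temporal step (both are generic stand-ins, not necessarily the methods used in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated sensor stream: 10 correlated channels over 1000 time steps,
# driven by 2 underlying factors plus measurement noise.
latent = rng.standard_normal((1000, 2))
mixing = rng.standard_normal((2, 10))
X = latent @ mixing + 0.1 * rng.standard_normal((1000, 10))

# Spatial reduction: PCA via SVD keeps the top-k principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_reduced = Xc @ Vt[:k].T                               # (1000, 2)

# Temporal reduction: average non-overlapping windows of 10 samples.
X_windowed = X_reduced.reshape(100, 10, k).mean(axis=1)  # (100, 2)
```

With most of the variance concentrated in a few components, the downstream predictive model sees far fewer, less noisy features, which is one plausible mechanism behind the observed accuracy gains.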
Designed and implemented a distributed system that applies machine learning to sensor data streams to estimate dominant pollution sources in real time.