Abstract
Brain tumor classification is a critical task in neuro-oncology, yet achieving both high accuracy and interpretability remains challenging. We propose a hybrid-attention multimodal deep learning model that combines multi-sequence MRI and radiogenomic features for explainable, high-accuracy tumor subtyping. The proposed Explainable Hybrid Attention Multimodal Network (E-HAMNet) employs (i) a spatial attention stream that dynamically highlights salient tumor regions in T1, T1c, T2, and FLAIR sequences, and (ii) a feature-level attention stream that weights genomic and radiomic features to capture molecular heterogeneity. A cross-modal attention fusion layer combines the two streams, enabling dynamic interaction between the imaging and genomic modalities. For robustness, feature extractors are pretrained with self-supervised learning and then fine-tuned on annotated data. For interpretability, Grad-CAM heatmaps, SHAP attributions, and attention score visualizations provide clinicians with transparent decision support. Experiments on the BraTS-2023/2024 and RSNA-MICCAI datasets show that E-HAMNet outperforms recent multimodal CNN, transformer-based, and radiomics pipelines, reaching 99.6% accuracy, 96.4% macro-F1, and 98.2% AUC. The method also achieves better calibration (ECE 1.9%) and remains robust under missing modalities and domain shift.
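
To make the cross-modal fusion step concrete, the sketch below shows one plausible realization in PyTorch: imaging tokens attend to genomic/radiomic tokens and vice versa, and the returned attention weights can feed the attention-score visualizations the abstract mentions. All module names, dimensions, and the number of output classes are illustrative assumptions, not the paper's published implementation.

```python
# Minimal sketch of a cross-modal attention fusion layer (assumed design).
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_classes=4):
        super().__init__()
        # Bidirectional cross-attention: each modality queries the other.
        self.img_to_gen = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gen_to_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_img = nn.LayerNorm(d_model)
        self.norm_gen = nn.LayerNorm(d_model)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, img_tokens, gen_tokens):
        # img_tokens: (B, N_img, d_model) spatial-attention features from MRI.
        # gen_tokens: (B, N_gen, d_model) feature-level-attention genomic features.
        img_ctx, img_attn = self.img_to_gen(img_tokens, gen_tokens, gen_tokens)
        gen_ctx, gen_attn = self.gen_to_img(gen_tokens, img_tokens, img_tokens)
        # Residual connection plus normalization on each fused stream.
        img_fused = self.norm_img(img_tokens + img_ctx)
        gen_fused = self.norm_gen(gen_tokens + gen_ctx)
        # Mean-pool each modality, concatenate, and classify; the attention
        # weights are returned so they can be visualized for interpretability.
        fused = torch.cat([img_fused.mean(dim=1), gen_fused.mean(dim=1)], dim=-1)
        return self.classifier(fused), (img_attn, gen_attn)

# Usage: fuse 64 imaging tokens with 32 genomic tokens for a batch of 2.
fusion = CrossModalAttentionFusion()
logits, attn_maps = fusion(torch.randn(2, 64, 256), torch.randn(2, 32, 256))
```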

