My research focuses on computer vision, neural rendering, and deep learning applications for image reconstruction, geometric understanding, and generative modeling in multi-disciplinary domains.
Entropy-guided label pruning and region-aware uncertainty estimation enable fracture diagnosis models to reason under ambiguity.
Abstract:
Radiographic imaging is crucial for diagnosing bone fractures, but the absence of reliable uncertainty measures in existing models complicates the interpretation of ambiguous cases, especially in complex or noisy datasets. Current deep learning methods for fracture diagnosis such as convolutional neural networks (CNNs) and Transformers have improved detection accuracy, but typically rely on deterministic outputs that fail to estimate predictive confidence or handle mislabeled samples. We introduce BONE-ULR, a Bayesian approach for bone fracture diagnosis that integrates adaptive label refinement and spatially-aware uncertainty estimation to enhance both reliability and interpretability. Unlike existing approaches which provide binary predictions without insight into their confidence or failure cases, BONE-ULR produces a predictive distribution via multiple stochastic forward passes, enabling effective quantification of epistemic uncertainty and identification of ambiguous regions. Additionally, we introduce a dynamic label refinement strategy that ranks training samples by entropy and excludes high-uncertainty bone X-rays from supervision, mitigating the impact of mislabeled samples and ambiguous fracture patterns while improving representation learning. Extensive experimental analysis validates that our approach significantly improves classification accuracy and calibration, achieving an F1-score of 85.91% and an Expected Calibration Error (ECE) of 0.141, surpassing state-of-the-art (SOTA) methods in both performance and reliability.
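To make these two mechanisms concrete, here is a minimal PyTorch sketch (an illustration, not the BONE-ULR implementation): Monte Carlo dropout supplies the multiple stochastic forward passes, and an entropy ranking selects which training samples keep supervision. The function names, pass count, and keep fraction are illustrative assumptions.

import torch
import torch.nn.functional as F

def enable_mc_dropout(model):
    """Put only dropout layers in train mode so inference stays stochastic."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

def mc_dropout_predict(model, x, n_passes=20):
    """Predictive distribution and per-sample entropy from repeated stochastic passes."""
    enable_mc_dropout(model)
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean_probs = probs.mean(dim=0)                      # predictive distribution
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy

def entropy_prune(entropies, keep_fraction=0.9):
    """Indices of the lowest-entropy (most confident) samples to keep for supervision."""
    k = int(keep_fraction * len(entropies))
    return torch.argsort(entropies)[:k]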
Fusing spatial detail with frequency-guided attention cues enables perceptual underwater image enhancement across color-distorted environments.
Abstract:
Underwater images suffer from severe degradations, including color distortions, reduced visibility, and loss of structural details due to wavelength-dependent attenuation and scattering. Existing enhancement methods primarily focus on spatial-domain processing, neglecting the frequency domain's potential to capture global color distributions and long-range dependencies. To address these limitations, we propose FUSION, a dual-domain deep learning framework that jointly leverages spatial and frequency domain information. FUSION independently processes each RGB channel through multi-scale convolutional kernels and adaptive attention mechanisms in the spatial domain, while simultaneously extracting global structural information via FFT-based frequency attention. A Frequency Guided Fusion module integrates complementary features from both domains, followed by inter-channel fusion and adaptive channel recalibration to ensure balanced color distributions. Extensive experiments on benchmark datasets (UIEB, EUVP, SUIM-E) demonstrate that FUSION achieves state-of-the-art performance, consistently outperforming existing methods in reconstruction fidelity (highest PSNR of 23.717 dB and SSIM of 0.883 on UIEB), perceptual quality (lowest LPIPS of 0.112 on UIEB), and visual enhancement metrics (best UIQM of 3.414 on UIEB), while requiring significantly fewer parameters (0.28M) and lower computational complexity, underscoring its suitability for real-time underwater imaging applications.
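A hedged PyTorch sketch of the FFT-based frequency attention idea: the amplitude spectrum is reweighted by a learned gate while the phase is preserved, one simple way to modulate global color and structure statistics. The module name and 1x1-conv gate are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class FrequencyAttention(nn.Module):
    """Sketch: reweight the amplitude spectrum per channel, keep the phase."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        freq = torch.fft.rfft2(x, norm="ortho")          # to the frequency domain
        amp, phase = freq.abs(), freq.angle()
        amp = amp * self.gate(amp)                       # learned frequency weighting
        freq = torch.polar(amp, phase)                   # recombine amplitude and phase
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")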
Enforcing physical surface properties through PDE constraints yields geometrically accurate neural scene representations from sparse views.
Abstract:
Neural Radiance Fields (NeRFs) have transformed novel view synthesis, but achieving accurate surface geometry remains challenging, especially with sparse views. Current approaches either require dense viewpoint sampling or produce inconsistent geometry due to their reliance on purely photometric supervision. We introduce PDE-NeRF, a physics-informed optimization framework that enforces geometric consistency through Partial Differential Equation (PDE)-constrained density gradients. Unlike existing methods that use auxiliary losses, our approach directly shapes the underlying density field by aligning spatial derivatives with ground-truth surface normals. We combine this with an efficient hash-based encoding scheme to enable high-fidelity reconstruction from sparse views. Our model achieves up to 8.51 dB higher PSNR and a 0.062 reduction in LPIPS over NeRF-based baselines, demonstrating superior perceptual quality. PDE-NeRF maintains consistent surface details, even in areas visible from only 30–40 views in a 360-degree capture, effectively addressing a key challenge in neural scene reconstruction. Experiments on diverse synthetic scenes demonstrate that our method achieves superior geometric accuracy and visual quality, outperforming existing approaches across both 360° inward-facing and LLFF datasets.
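As a minimal illustration of PDE-constrained density gradients, the PyTorch sketch below aligns the negative normalized gradient of the density field with ground-truth surface normals via a cosine penalty; density_fn and the exact loss form are illustrative assumptions, not the precise PDE-NeRF objective.

import torch

def normal_alignment_loss(density_fn, pts, gt_normals):
    """Align the negative normalized density gradient with ground-truth normals.

    density_fn: maps (N, 3) points to (N,) densities sigma.
    pts: (N, 3) surface points; gt_normals: (N, 3) unit normals.
    """
    pts = pts.requires_grad_(True)
    sigma = density_fn(pts)
    grad = torch.autograd.grad(sigma.sum(), pts, create_graph=True)[0]
    # Density rises entering a surface, so its gradient points inward;
    # the negated, normalized gradient approximates the outward normal.
    pred_normals = -torch.nn.functional.normalize(grad, dim=-1)
    return (1.0 - (pred_normals * gt_normals).sum(dim=-1)).mean()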
Structuring attention through multi-scale graphs enables transformers to reason across visual hierarchies.
Abstract:
Vision Transformers (ViTs) have redefined image classification by leveraging self-attention to capture complex patterns and long-range dependencies between image patches. However, a key challenge for ViTs is efficiently incorporating multi-scale feature representations, which is inherent in convolutional neural networks (CNNs) through their hierarchical structure. Graph transformers have made strides in addressing this by leveraging graph-based modeling, but they often lose or insufficiently represent spatial hierarchies, especially since redundant or less relevant areas dilute the image's contextual representation. To bridge this gap, we propose SAG-ViT, a Scale-Aware Graph Attention ViT that integrates multi-scale feature capabilities of CNNs, representational power of ViTs, and graph-attended patching to enable richer contextual representation. Using EfficientNetV2 as a backbone, the model extracts multi-scale feature maps, dividing them into patches to preserve richer semantic information compared to directly patching the input images. The patches are structured into a graph using spatial and feature similarities, where a Graph Attention Network (GAT) refines the node embeddings. This refined graph representation is then processed by a Transformer encoder, capturing long-range dependencies and complex interactions. We evaluate SAG-ViT on benchmark datasets across various domains, validating its effectiveness in advancing image classification tasks. Our code and weights are available at https://github.com/shravan-18/SAG-ViT.
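A rough sketch of the graph-attended patching stage, assuming PyTorch Geometric is available: patch embeddings from CNN feature maps become graph nodes, k-nearest-neighbour feature similarity supplies edges (one simple choice; the paper also uses spatial similarity), and a GAT layer refines the node embeddings before the Transformer encoder.

import torch
from torch_geometric.nn import GATConv

class PatchGraphAttention(torch.nn.Module):
    """Refine patch embeddings with graph attention before the Transformer."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.gat = GATConv(dim, dim // heads, heads=heads)

    def forward(self, patch_embeddings, edge_index):
        # patch_embeddings: (num_patches, dim) taken from CNN feature-map patches
        # edge_index: (2, num_edges) built from spatial/feature similarity
        return self.gat(patch_embeddings, edge_index)

def knn_edges(features, k=8):
    """Build edges by k-nearest-neighbour feature similarity (one simple choice)."""
    dists = torch.cdist(features, features)
    idx = dists.topk(k + 1, largest=False).indices[:, 1:]  # drop self-edges
    src = torch.arange(features.size(0)).repeat_interleave(k)
    return torch.stack([src, idx.reshape(-1)])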
BibTeX:
@misc{SAGViT,
title={SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers},
author={Shravan Venkatraman and Jaskaran Singh Walia and Joe Dhanith P R},
year={2025},
eprint={2411.09420},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.09420},
}
Bridging local-global brain patterns and transforming disconnected MRI patches into spatially-coherent disease markers through residual graphs.
Abstract:
Neurodegenerative (ND) diseases affect the central nervous system, including the brain and spinal cord. In recent years, deep learning has demonstrated its potential in medical imaging for diagnostic purposes. However, for these techniques to be fully accepted in clinical settings, they must achieve high performance and gain the confidence of medical professionals regarding their interpretability. Therefore, an interpretable model should make decisions based on clinically relevant information like a domain expert. To achieve this, we present an interpretable classifier dedicated to the most common ND diseases. The lesions associated with ND diseases exhibit irregular distributions and spatial dependencies in different regions of the brain, challenging traditional models to effectively capture both local and global relationships. To address this issue, we present a Residual Graph Neural Network enhanced Vision Transformer (RG-ViT) that represents MRI data as a graph of interconnected patches. By integrating residual connections into the GNN framework, we preserve critical features while promoting effective message passing. This approach overcomes the problem of spatial disconnection prevalent in standard patch-based methods and provides a cohesive and context-aware analysis of MRI data. Experimental results in detecting multiple sclerosis (MS), Parkinson's (PD), and Alzheimer's disease (AD) demonstrated our approach's consistent accuracy scores of 98.7%, 99.6%, and 99.1%, respectively. On the combined dataset for the global classification of ND diseases, it achieved an F1 score of 99.2%, supporting its generalizability.
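To illustrate the residual message-passing idea, here is a minimal PyTorch Geometric sketch (the tooling and GCN layer are stand-in assumptions; the paper's exact layer types and dimensions are not reproduced): each block computes x' = x + GNN(norm(x)), so patch features survive even when the passed messages are uninformative.

import torch
from torch_geometric.nn import GCNConv

class ResidualGNNBlock(torch.nn.Module):
    """One residual message-passing block over the MRI patch graph."""
    def __init__(self, dim):
        super().__init__()
        self.norm = torch.nn.LayerNorm(dim)
        self.conv = GCNConv(dim, dim)

    def forward(self, x, edge_index):
        # The residual connection preserves patch features through message passing.
        return x + torch.relu(self.conv(self.norm(x), edge_index))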
Augmenting encoder-decoder architectures with attention-guided feature extraction enables highly effective localization, segmentation, and classification of brain tumors.
Abstract:
Brain tumors are abnormal growths arising from different types of brain cells. If left undiagnosed, they can cause severe neurological deficits, including cognitive impairment, motor dysfunction, and sensory loss. As a tumor grows, intracranial pressure rises, which can lead to life-threatening complications such as brain herniation. Early diagnosis and treatment are therefore essential to slow tumor growth and limit its complications. A growing body of deep learning (DL) and artificial intelligence (AI) research aims to help clinicians diagnose these tumors early from Magnetic Resonance Imaging (MRI) scans. Our research proposes targeted neural architectures within multi-objective frameworks that can localize, segment, and classify the grade of these gliomas from multimodal MRI images to solve this critical issue. Our localization framework utilizes a targeted architecture that enhances the LinkNet framework with an encoder inspired by VGG19 for better multimodal feature extraction from the tumor along with spatial and graph attention mechanisms that sharpen feature focus and inter-feature relationships. For the segmentation objective, we deployed a specialized framework using the SeResNet101 CNN model as the encoder backbone integrated into the LinkNet architecture, achieving an IoU Score of 96%. The classification objective is addressed through a distinct framework implemented by combining the SeResNet152 feature extractor with an Adaptive Boosting classifier, reaching an accuracy of 98.53%. Our multi-objective approach with targeted neural architectures demonstrated promising results for complete glioma characterization, with the potential to advance medical AI by enabling early diagnosis and providing more accurate treatment options for patients.
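As a hedged illustration of attention-guided feature extraction in an encoder-decoder, the PyTorch sketch below gates a skip connection with a learned spatial attention map; this is a generic pattern, not the paper's exact spatial/graph attention design, and the kernel size is an illustrative choice.

import torch
import torch.nn as nn

class SpatialAttentionGate(nn.Module):
    """Gate encoder skip features by a learned spatial attention map."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, skip):
        # Emphasize tumor-relevant regions before the decoder consumes the skip.
        return skip * self.attn(skip)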
BibTeX:
@misc{v2024integrateddeeplearningframework,
title={An Integrated Deep Learning Framework for Effective Brain Tumor Localization, Segmentation, and Classification from Magnetic Resonance Images},
author={Pandiyaraju V and Shravan Venkatraman and Abeshek A and Aravintakshan S A and Pavan Kumar S and Madhan S},
year={2024},
eprint={2409.17273},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2409.17273},
}
Dynamic graph construction based on tissue-specific nuclear spatial distributions helps neural networks better understand heterogeneous histopathological structures.
Abstract:
Histopathological image analysis plays a critical role in disease diagnosis by identifying tissue structures and cellular abnormalities. Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) have shown promise in automating this process but face significant limitations. CNNs struggle to capture long-range spatial relationships due to their local receptive fields, while existing GNN solutions rely on static graph construction methods that do not adapt to tissue heterogeneity. To address these limitations, we first introduce an Adaptive Percentile-Oriented Graph Construction Framework that dynamically adapts edge formation to nuclear spatial distributions, representing biologically meaningful tissue morphologies. Second, we provide a thorough statistical analysis supporting the extracted nuclear morphological features. We then introduce UCGNN-H, a unified convolutional-graph neural network for histopathology, which combines CNN-learned spatial features with graph-based morphological representations to classify tissues more accurately and contextually. Lastly, we provide a complete morphological feature analysis contrasting and highlighting the attributes essential for accurate and interpretable tissue characterization.
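A minimal NumPy sketch of percentile-oriented graph construction, under stated assumptions (nuclei centroids are given; the 10th-percentile default and pure distance criterion are illustrative, not the framework's full rule set): the edge threshold is computed from each tissue's own distance distribution, so dense and sparse regions get comparable connectivity.

import numpy as np

def percentile_graph(centroids, pct=10.0):
    """Connect nuclei whose pairwise distance falls below the pct-th percentile."""
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    iu = np.triu_indices(len(centroids), k=1)
    thresh = np.percentile(d[iu], pct)          # data-driven, not a fixed radius
    src, dst = np.where((d < thresh) & (d > 0))
    return np.stack([src, dst])                 # (2, num_edges), symmetric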
Bi-focal perspectives guide neural networks toward subtle brain abnormalities, while granular feature extraction at multiple scales identifies neurofibrillary tangles and amyloid plaques in MRI scans for accurate Alzheimer's detection.
Abstract:
Alzheimer's Disease (AD), the most common neurodegenerative disorder, is diagnosed in millions of patients each year. Accurate diagnosis and classification of AD from neuroimaging data remain challenging. Traditional CNNs extract abundant low-level information from an image but often fail to capture subtle, fine-grained pathological structures, a significant limitation when detecting AD from MRI scans. To overcome this, we propose a novel Granular Feature Integration method to combine information extraction at different scales along with an efficient information flow, enabling the model to capture both broad and fine-grained features simultaneously. We also propose a Bi-Focal Perspective mechanism to highlight the subtle neurofibrillary tangles and amyloid plaques in the MRI scans, ensuring that critical pathological markers are accurately identified. Our model achieved an F1-Score of 99.31%, precision of 99.24%, and recall of 99.51%. These scores show that our model significantly outperforms existing state-of-the-art (SOTA) CNNs.
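To show the multi-scale intuition concretely, here is a minimal PyTorch sketch of a granular block: parallel kernels at several scales run side by side and are fused, so broad context and fine detail coexist in one feature map. The kernel sizes and fusion layer are illustrative assumptions, not the paper's exact Granular Feature Integration module.

import torch
import torch.nn as nn

class GranularBlock(nn.Module):
    """Parallel kernels at several scales, fused to keep broad and fine detail."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 3
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5)
        ])
        self.fuse = nn.Conv2d(3 * branch_ch, out_ch, 1)

    def forward(self, x):
        # Concatenate per-scale responses, then mix them with a 1x1 conv.
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], dim=1))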
BibTeX:
@article{Venkatraman_2025,
title={Leveraging Bi-Focal Perspectives and Granular Feature Integration for Accurate Reliable Early Alzheimer’s Detection},
volume={13},
ISSN={2169-3536},
url={http://dx.doi.org/10.1109/ACCESS.2025.3540567},
DOI={10.1109/access.2025.3540567},
journal={IEEE Access},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Venkatraman, Shravan and Pandiyaraju, V. and Abeshek, A. and Kumar, S. Pavan and Aravintakshan, S. A.},
year={2025},
pages={28678–28692}
}
Dynamic spatial mapping and component-specific feature enhancement overcome boundary delineation challenges in breast ultrasound imaging.
Abstract:
Breast cancer is one of the leading causes of death globally, and thus there is an urgent need for early and accurate diagnostic techniques. Although ultrasound imaging is a widely used technique for breast cancer screening, it faces challenges such as poor boundary delineation caused by variations in tumor morphology and reduced diagnostic accuracy due to inconsistent image quality. To address these challenges, we propose novel Deep Learning (DL) frameworks for breast lesion segmentation and classification. We introduce a precision mapping mechanism (PMM) for a precision mapping and attention-driven LinkNet (PMAD-LinkNet) segmentation framework that dynamically adapts spatial mappings through morphological variation analysis, enabling precise pixel-level refinement of tumor boundaries. Subsequently, we introduce a component-specific feature enhancement module (CSFEM) for a component-specific feature-enhanced classifier (CSFEC-Net). Through a multi-level attention approach, the CSFEM magnifies distinguishing features of benign, malignant, and normal tissues. The proposed frameworks are evaluated against existing literature and a diverse set of state-of-the-art Convolutional Neural Network (CNN) architectures. The obtained results show that our segmentation model achieves an accuracy of 98.1%, an IoU of 96.9%, and a Dice Coefficient of 97.2%. For the classification model, an accuracy of 99.2% is achieved with F1-score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively.
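As one plausible reading of component-specific feature enhancement, the PyTorch sketch below applies squeeze-and-excitation style channel recalibration so tissue-specific channels are re-weighted before classification; the module and reduction ratio are assumptions for illustration, not the CSFEM itself.

import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Squeeze-and-excitation style reweighting of tissue-specific channels."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # globally pooled channel descriptor
        return x * w[:, :, None, None]           # per-channel emphasis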
BibTeX:
@misc{v2025exploitingprecisionmappingcomponentspecific,
title={Exploiting Precision Mapping and Component-Specific Feature Enhancement for Breast Cancer Segmentation and Identification},
author={Pandiyaraju V and Shravan Venkatraman and Pavan Kumar S and Santhosh Malarvannan and Kannan A},
year={2025},
eprint={2407.02844},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2407.02844},
}
Cross-modal attention enables synchronized audio-visual feature extraction through Transformer fusion for emotion recognition.
Abstract:
Understanding emotions is a fundamental aspect of human communication. Integrating audio and video signals offers a more comprehensive understanding of emotional states compared to traditional methods that rely on a single data source, such as speech or facial expressions. Despite its potential, multimodal emotion recognition faces significant challenges, particularly in synchronization, feature extraction, and fusion of diverse data sources. To address these issues, this paper introduces a novel transformer-based model named Audio-Video Transformer Fusion with Cross Attention (AVT-CA). The AVT-CA model employs a transformer fusion approach to effectively capture and synchronize interlinked features from both audio and video inputs, thereby resolving synchronization problems. Additionally, the Cross Attention mechanism within AVT-CA selectively extracts and emphasizes critical features while discarding irrelevant ones from both modalities, addressing feature extraction and fusion challenges. Extensive experimental analysis conducted on the CMU-MOSEI, RAVDESS and CREMA-D datasets demonstrates the efficacy of the proposed model. The results underscore the importance of AVT-CA in developing precise and reliable multimodal emotion recognition systems for practical applications.
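A minimal PyTorch sketch of the cross-attention pattern described above (illustrative, not the released AVT-CA code): video tokens query audio keys and values, and a residual connection keeps the video stream intact; mirroring the module gives the audio-attends-video direction.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Video queries attend over audio keys/values; mirror for the other direction."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_feats, audio_feats):
        # video_feats: (B, Tv, dim); audio_feats: (B, Ta, dim), same embedding dim assumed
        fused, _ = self.attn(query=video_feats, key=audio_feats, value=audio_feats)
        return self.norm(video_feats + fused)    # residual keeps the video stream intact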
BibTeX:
@misc{r2025multimodalemotionrecognitionusing,
title={Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention},
author={Joe Dhanith P R and Shravan Venkatraman and Vigya Sharma and Santhosh Malarvannan and Modigari Narendra},
year={2025},
eprint={2407.18552},
archivePrefix={arXiv},
primaryClass={cs.MM},
url={https://arxiv.org/abs/2407.18552},
}
Attention-fused deep convolutional neural networks improve the ability to classify diverse traffic signs through parallel hierarchical and multi-scale feature emphasis.
Abstract:
Autonomous vehicular technology, also known as self-driving or driverless technology, refers to the innovation that enables vehicles to operate without human intervention. Traffic sign classification (TSC) is a critical component in autonomous vehicular technology, as it allows vehicles to recognize and interpret traffic signs, which is essential for safe and rule-compliant navigation. This work proposes a novel attention-fused deep convolutional neural network (AFDCNN) for TSC. The proposed AFDCNN incorporates the capabilities of ResNet50 and EfficientNetV2 by combining their outputs through a self-attention mechanism, which enhances its ability to classify traffic signs. Analysis of the GTSRB, LISA, and MASTIF datasets revealed that the proposed model exhibited superior performance compared to state-of-the-art models, as evidenced by higher scores in recall, precision, F1-score, and accuracy metrics.
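A hedged PyTorch sketch of one way to fuse two backbone outputs with self-attention (an illustration, not the AFDCNN implementation): each backbone's pooled embedding, assumed projected to a common dimension, becomes a token, and attention mixes the two before pooling to a single descriptor.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Treat two backbone embeddings as tokens and fuse them with self-attention."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, resnet_feat, effnet_feat):
        # Each feat: (B, dim), assumed already projected to a shared dimension.
        tokens = torch.stack([resnet_feat, effnet_feat], dim=1)  # (B, 2, dim)
        fused, _ = self.attn(tokens, tokens, tokens)
        return fused.mean(dim=1)                 # single fused descriptor for the classifier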
BibTeX:
@INPROCEEDINGS{10654469,
author={Venkatraman, Shravan and Abeshek, A and Malarvannan, Santhosh and Shriyans, A and Jashwanth, R and Joe Dhanith, P R},
booktitle={2024 8th International Conference on Robotics and Automation Sciences (ICRAS)},
title={Traffic Sign Classification Using Attention Fused Deep Convolutional Neural Network},
year={2024},
pages={90-94},
doi={10.1109/ICRAS62427.2024.10654469}
}
Physically-grounded weather conditioning through GANs combined with adaptive Transformer tokenization preserves high-frequency sign details under adverse conditions.
Abstract:
Advancements in Traffic Sign Classification (TSC) systems remain critical for autonomous vehicle navigation, particularly under environmental perturbations that degrade visual perception. While existing Vision Transformers (ViTs) demonstrate strong global context modeling, their effectiveness diminishes under (1) insufficient localized feature resolution for low-resolution traffic sign recognition under weather degradation, and (2) dataset bias in real-world collections where adverse weather samples exhibit limited parametric diversity (such as fixed fog density levels in CCTSDB2021). To address these challenges, we present: (1) a Weather Conditioning Module-Enhanced DCGAN (WCM-DCGAN) that generates physically grounded weather transformations through parametric operators (σ-scaled fog density, γ-adjusted light diffusion), and (2) an Xception-ViT hybrid (X-ViT) architecture combining dilated convolutions with patch-adaptive tokenization for localization precision by preserving high-frequency sign details and minimizing information loss during patching and tokenization. Evaluated across five augmented datasets (GTSRB, MASTIF, TSRD, LISA, STSD) with 200K samples each, X-ViT achieves consistent validation F1 scores of 0.996-0.999 compared to 0.771-0.982 for conventional models, demonstrating 2.8-14.7% accuracy improvements in adversarial weather conditions over vanilla ViT and ResNet50 baselines. Our work offers a more resilient solution for cross-environment generalization without requiring expanded real-world data collection for autonomous vehicles. The code and produced augmented data are available at this URL.
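For intuition about a σ-scaled fog operator, here is a minimal PyTorch sketch of a parametric fog transform; the uniform-depth transmittance model and fog color are simplifying assumptions for illustration, not the WCM-DCGAN formulation.

import torch

def apply_fog(img, sigma=0.5, fog_color=0.9):
    """Parametric fog: blend toward a fog color with sigma-scaled density.

    img: (C, H, W) tensor in [0, 1]; sigma >= 0 controls fog density.
    A depth-aware variant would scale transmittance by per-pixel depth.
    """
    transmittance = torch.exp(torch.tensor(-sigma))    # uniform-depth assumption
    return img * transmittance + fog_color * (1.0 - transmittance)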
Ensemble deep learning with optimized weighted gradient techniques enables early and accurate detection of tomato leaf diseases.
Abstract:
Tomatoes are one of the most widely consumed and economically significant food crops globally, with their yield quality and quantity significantly impacted by various diseases. Early disease detection is crucial to reduce their effects and improve crop yields, supporting farmers. While previous research has applied machine learning techniques to segment and classify tomato leaf images, existing classifiers often struggle with accurately detecting new disease types. This study proposes a novel approach to tomato leaf disease classification by harnessing the power of deep learning and swarm intelligence-based optimization techniques. An ensemble model is developed, integrating an exponential moving average function with temporal constraints and an enhanced weighted gradient optimizer into fine-tuned Visual Geometry Group-16 (VGG-16) and Neural Architecture Search Network (NASNet) mobile training methods. The model is trained and validated on a dataset of 10,000 tomato leaf images categorized into nine disease classes, with an additional 1,000 images reserved for testing. The results show superior performance across key metrics: accuracy (98.7%), loss (4%), precision (97.9%), recall (98.6%), receiver operating characteristic curve (99.97%), and F1-score (98.7%), outperforming existing methods and enhancing disease detection accuracy.
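As a rough illustration of exponential-moving-average fusion, the NumPy sketch below smooths a sequence of softmax outputs into one prediction; the alpha value and renormalization are illustrative choices, not the paper's exact temporal-constraint scheme.

import numpy as np

def ema_fuse(prob_history, alpha=0.3):
    """Exponential moving average over a temporal sequence of softmax outputs.

    prob_history: list of (num_classes,) arrays, e.g. per-epoch or per-model
    predictions; alpha weights the most recent estimate.
    """
    fused = prob_history[0]
    for p in prob_history[1:]:
        fused = alpha * p + (1.0 - alpha) * fused
    return fused / fused.sum()                  # renormalize to a distribution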
BibTeX:
@ARTICLE{10.3389/fpls.2024.1382416,
AUTHOR={V., Pandiyaraju and Kumar, A. M. Senthil and Praveen, Joe I. R. and Venkatraman, Shravan and Kumar, S. Pavan and Aravintakshan, S. A. and Abeshek, A. and Kannan, A.},
TITLE={Improved tomato leaf disease classification through adaptive ensemble models with exponential moving average fusion and enhanced weighted gradient optimization},
JOURNAL={Frontiers in Plant Science},
VOLUME={15},
YEAR={2024},
URL={https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1382416},
DOI={10.3389/fpls.2024.1382416},
ISSN={1664-462X},
}