My research focuses on computer vision, neural rendering, and deep learning applications for image reconstruction, geometric understanding, and generative modeling in multi-disciplinary domains.
Entropy-guided label pruning and region-aware uncertainty estimation enable fracture diagnosis models to reason under ambiguity.
Abstract:
Radiographic imaging is crucial for diagnosing bone fractures, but the absence of reliable uncertainty measures in existing models complicates the interpretation of ambiguous cases, especially in complex or noisy datasets. Current deep learning methods for fracture diagnosis such as convolutional neural networks (CNNs) and Transformers have improved detection accuracy, but typically rely on deterministic outputs that fail to estimate predictive confidence or handle mislabeled samples. We introduce BONE-ULR, a Bayesian approach for bone fracture diagnosis that integrates adaptive label refinement and spatially-aware uncertainty estimation to enhance both reliability and interpretability. Unlike existing approaches which provide binary predictions without insight into their confidence or failure cases, BONE-ULR produces a predictive distribution via multiple stochastic forward passes, enabling effective quantification of epistemic uncertainty and identification of ambiguous regions. Additionally, we introduce a dynamic label refinement strategy that ranks training samples by entropy and excludes high-uncertainty bone X-rays from supervision, mitigating the impact of mislabeled samples and ambiguous fracture patterns while improving representation learning. Extensive experimental analysis validates that our approach significantly improves classification accuracy and calibration, achieving an F1-score of 85.91% and an Expected Calibration Error (ECE) of 0.141, surpassing state-of-the-art (SOTA) methods in both performance and reliability.
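To make these two mechanisms concrete, here is a minimal PyTorch sketch (an illustration, not the BONE-ULR implementation): Monte Carlo dropout supplies the multiple stochastic forward passes, and an entropy ranking selects which training samples keep supervision. The function names, pass count, and keep fraction are illustrative assumptions.

import torch
import torch.nn.functional as F

def enable_mc_dropout(model):
    """Put only dropout layers in train mode so inference stays stochastic."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

def mc_dropout_predict(model, x, n_passes=20):
    """Predictive distribution and per-sample entropy from repeated stochastic passes."""
    enable_mc_dropout(model)
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean_probs = probs.mean(dim=0)                      # predictive distribution
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy

def entropy_prune(entropies, keep_fraction=0.9):
    """Indices of the lowest-entropy (most confident) samples to keep for supervision."""
    k = int(keep_fraction * len(entropies))
    return torch.argsort(entropies)[:k]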
Fusing spatial detail with frequency-guided attention cues enables perceptual underwater image enhancement across color-distorted environments.
Abstract:
Underwater images suffer from severe degradations, including color distortions, reduced visibility, and loss of structural details due to wavelength-dependent attenuation and scattering. Existing enhancement methods primarily focus on spatial-domain processing, neglecting the frequency domain's potential to capture global color distributions and long-range dependencies. To address these limitations, we propose FUSION, a dual-domain deep learning framework that jointly leverages spatial and frequency domain information. FUSION independently processes each RGB channel through multi-scale convolutional kernels and adaptive attention mechanisms in the spatial domain, while simultaneously extracting global structural information via FFT-based frequency attention. A Frequency Guided Fusion module integrates complementary features from both domains, followed by inter-channel fusion and adaptive channel recalibration to ensure balanced color distributions. Extensive experiments on benchmark datasets (UIEB, EUVP, SUIM-E) demonstrate that FUSION achieves state-of-the-art performance, consistently outperforming existing methods in reconstruction fidelity (highest PSNR of 23.717 dB and SSIM of 0.883 on UIEB), perceptual quality (lowest LPIPS of 0.112 on UIEB), and visual enhancement metrics (best UIQM of 3.414 on UIEB), while requiring significantly fewer parameters (0.28M) and lower computational complexity, underscoring its suitability for real-time underwater imaging applications.
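A hedged PyTorch sketch of the FFT-based frequency attention idea: the amplitude spectrum is reweighted by a learned gate while the phase is preserved, one simple way to modulate global color and structure statistics. The module name and 1x1-conv gate are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class FrequencyAttention(nn.Module):
    """Sketch: reweight the amplitude spectrum per channel, keep the phase."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        freq = torch.fft.rfft2(x, norm="ortho")          # to the frequency domain
        amp, phase = freq.abs(), freq.angle()
        amp = amp * self.gate(amp)                       # learned frequency weighting
        freq = torch.polar(amp, phase)                   # recombine amplitude and phase
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")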
Enforcing physical surface properties through PDE constraints yields geometrically accurate neural scene representations from sparse views.
Abstract:
Neural Radiance Fields (NeRFs) have transformed novel view synthesis, but achieving accurate surface geometry remains challenging, especially with sparse views. Current approaches either require dense viewpoint sampling or produce inconsistent geometry due to their reliance on purely photometric supervision. We introduce PDE-NeRF, a physics-informed optimization framework that enforces geometric consistency through Partial Differential Equation (PDE)-constrained density gradients. Unlike existing methods that use auxiliary losses, our approach directly shapes the underlying density field by aligning spatial derivatives with ground-truth surface normals. We combine this with an efficient hash-based encoding scheme to enable high-fidelity reconstruction from sparse views. Our model achieves up to 8.51 dB higher PSNR and a 0.062 reduction in LPIPS over NeRF-based baselines, demonstrating superior perceptual quality. PDE-NeRF maintains consistent surface details, even in areas visible from only 30–40 views in a 360-degree capture, effectively addressing a key challenge in neural scene reconstruction. Experiments on diverse synthetic scenes demonstrate that our method achieves superior geometric accuracy and visual quality, outperforming existing approaches across both 360° inward-facing and LLFF datasets.
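As a minimal illustration of PDE-constrained density gradients, the PyTorch sketch below aligns the negative normalized gradient of the density field with ground-truth surface normals via a cosine penalty; density_fn and the exact loss form are illustrative assumptions, not the precise PDE-NeRF objective.

import torch

def normal_alignment_loss(density_fn, pts, gt_normals):
    """Align the negative normalized density gradient with ground-truth normals.

    density_fn: maps (N, 3) points to (N,) densities sigma.
    pts: (N, 3) surface points; gt_normals: (N, 3) unit normals.
    """
    pts = pts.requires_grad_(True)
    sigma = density_fn(pts)
    grad = torch.autograd.grad(sigma.sum(), pts, create_graph=True)[0]
    # Density rises entering a surface, so its gradient points inward;
    # the negated, normalized gradient approximates the outward normal.
    pred_normals = -torch.nn.functional.normalize(grad, dim=-1)
    return (1.0 - (pred_normals * gt_normals).sum(dim=-1)).mean()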
Structuring attention through multi-scale graphs enables transformers to reason across visual hierarchies.
Abstract:
Vision Transformers (ViTs) have redefined image classification by leveraging self-attention to capture complex patterns and long-range dependencies between image patches. However, a key challenge for ViTs is efficiently incorporating multi-scale feature representations, which is inherent in convolutional neural networks (CNNs) through their hierarchical structure. Graph transformers have made strides in addressing this by leveraging graph-based modeling, but they often lose or insufficiently represent spatial hierarchies, especially since redundant or less relevant areas dilute the image's contextual representation. To bridge this gap, we propose SAG-ViT, a Scale-Aware Graph Attention ViT that integrates multi-scale feature capabilities of CNNs, representational power of ViTs, and graph-attended patching to enable richer contextual representation. Using EfficientNetV2 as a backbone, the model extracts multi-scale feature maps, dividing them into patches to preserve richer semantic information compared to directly patching the input images. The patches are structured into a graph using spatial and feature similarities, where a Graph Attention Network (GAT) refines the node embeddings. This refined graph representation is then processed by a Transformer encoder, capturing long-range dependencies and complex interactions. We evaluate SAG-ViT on benchmark datasets across various domains, validating its effectiveness in advancing image classification tasks. Our code and weights are available at https://github.com/shravan-18/SAG-ViT.
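A rough sketch of the graph-attended patching stage, assuming PyTorch Geometric is available: patch embeddings from CNN feature maps become graph nodes, k-nearest-neighbour feature similarity supplies edges (one simple choice; the paper also uses spatial similarity), and a GAT layer refines the node embeddings before the Transformer encoder.

import torch
from torch_geometric.nn import GATConv

class PatchGraphAttention(torch.nn.Module):
    """Refine patch embeddings with graph attention before the Transformer."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.gat = GATConv(dim, dim // heads, heads=heads)

    def forward(self, patch_embeddings, edge_index):
        # patch_embeddings: (num_patches, dim) taken from CNN feature-map patches
        # edge_index: (2, num_edges) built from spatial/feature similarity
        return self.gat(patch_embeddings, edge_index)

def knn_edges(features, k=8):
    """Build edges by k-nearest-neighbour feature similarity (one simple choice)."""
    dists = torch.cdist(features, features)
    idx = dists.topk(k + 1, largest=False).indices[:, 1:]  # drop self-edges
    src = torch.arange(features.size(0)).repeat_interleave(k)
    return torch.stack([src, idx.reshape(-1)])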
BibTeX:
@misc{SAGViT,
title={SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers},
author={Shravan Venkatraman and Jaskaran Singh Walia and Joe Dhanith P R},
year={2025},
eprint={2411.09420},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.09420},
}
Bridging local-global brain patterns and transforming disconnected MRI patches into spatially-coherent disease markers through residual graphs.
Abstract:
Neurodegenerative (ND) diseases affect the central nervous system, including the brain and spinal cord. In recent years, deep learning has demonstrated its potential in medical imaging for diagnostic purposes. However, for these techniques to be fully accepted in clinical settings, they must achieve high performance and gain the confidence of medical professionals regarding their interpretability. Therefore, an interpretable model should make decisions based on clinically relevant information like a domain expert. To achieve this, we present an interpretable classifier dedicated to the most common ND diseases. The lesions associated with ND diseases exhibit irregular distributions and spatial dependencies in different regions of the brain, challenging traditional models to effectively capture both local and global relationships. To address this issue, we present a Residual Graph Neural Network enhanced Vision Transformer (RG-ViT) that represents MRI data as a graph of interconnected patches. By integrating residual connections into the GNN framework, we preserve critical features while promoting effective message passing. This approach overcomes the problem of spatial disconnection prevalent in standard patch-based methods and provides a cohesive and context-aware analysis of MRI data. Experimental results in detecting multiple sclerosis (MS), Parkinson's (PD), and Alzheimer's disease (AD) demonstrated our approach's consistent accuracy scores of 98.7%, 99.6%, and 99.1%, respectively. On the combined dataset for the global classification of ND diseases, it achieved an F1 score of 99.2%, supporting its generalizability.
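To illustrate the residual message-passing idea, here is a minimal PyTorch Geometric sketch (the tooling and GCN layer are stand-in assumptions; the paper's exact layer types and dimensions are not reproduced): each block computes x' = x + GNN(norm(x)), so patch features survive even when the passed messages are uninformative.

import torch
from torch_geometric.nn import GCNConv

class ResidualGNNBlock(torch.nn.Module):
    """One residual message-passing block over the MRI patch graph."""
    def __init__(self, dim):
        super().__init__()
        self.norm = torch.nn.LayerNorm(dim)
        self.conv = GCNConv(dim, dim)

    def forward(self, x, edge_index):
        # The residual connection preserves patch features through message passing.
        return x + torch.relu(self.conv(self.norm(x), edge_index))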
Augmenting encoder-decoder architectures with attention-guided feature extraction enables highly effective localization, segmentation, and classification of brain tumors.
Abstract:
Brain tumors are abnormal growths arising from different types of brain cells. If left undiagnosed, they can cause severe neurological deficits, including cognitive impairment, motor dysfunction, and sensory loss. As a tumor grows, intracranial pressure rises, which can lead to life-threatening complications such as brain herniation. Early diagnosis and treatment are therefore essential to slow tumor growth and limit its complications. A growing body of deep learning (DL) and artificial intelligence (AI) research aims to help clinicians diagnose these tumors early from Magnetic Resonance Imaging (MRI) scans. Our research proposes targeted neural architectures within multi-objective frameworks that can localize, segment, and classify the grade of these gliomas from multimodal MRI images to solve this critical issue. Our localization framework utilizes a targeted architecture that enhances the LinkNet framework with an encoder inspired by VGG19 for better multimodal feature extraction from the tumor along with spatial and graph attention mechanisms that sharpen feature focus and inter-feature relationships. For the segmentation objective, we deployed a specialized framework using the SeResNet101 CNN model as the encoder backbone integrated into the LinkNet architecture, achieving an IoU Score of 96%. The classification objective is addressed through a distinct framework implemented by combining the SeResNet152 feature extractor with an Adaptive Boosting classifier, reaching an accuracy of 98.53%. Our multi-objective approach with targeted neural architectures demonstrated promising results for complete glioma characterization, with the potential to advance medical AI by enabling early diagnosis and providing more accurate treatment options for patients.
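As a hedged illustration of attention-guided feature extraction in an encoder-decoder, the PyTorch sketch below gates a skip connection with a learned spatial attention map; this is a generic pattern, not the paper's exact spatial/graph attention design, and the kernel size is an illustrative choice.

import torch
import torch.nn as nn

class SpatialAttentionGate(nn.Module):
    """Gate encoder skip features by a learned spatial attention map."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, skip):
        # Emphasize tumor-relevant regions before the decoder consumes the skip.
        return skip * self.attn(skip)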
BibTeX:
@misc{v2024integrateddeeplearningframework,
title={An Integrated Deep Learning Framework for Effective Brain Tumor Localization, Segmentation, and Classification from Magnetic Resonance Images},
author={Pandiyaraju V and Shravan Venkatraman and Abeshek A and Aravintakshan S A and Pavan Kumar S and Madhan S},
year={2024},
eprint={2409.17273},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2409.17273},
}
Dynamic graph construction based on tissue-specific nuclear spatial distributions helps neural networks better understand heterogeneous histopathological structures.
Abstract:
Histopathological image analysis plays a critical role in disease diagnosis by identifying tissue structures and cellular abnormalities. Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) have shown promise in automating this process but face significant limitations. CNNs struggle to capture long-range spatial relationships due to their local receptive fields, while existing GNN solutions rely on static graph construction methods that do not adapt to tissue heterogeneity. To address these limitations, we first introduce an Adaptive Percentile-Oriented Graph Construction Framework that dynamically adapts edge formation to nuclear spatial distributions, representing biologically meaningful tissue morphologies. Second, we provide a thorough statistical analysis supporting the extracted nuclear morphological features. We then introduce UCGNN-H, a unified convolutional-graph neural network for histopathology, which combines CNN-learned spatial features with graph-based morphological representations to classify tissues more accurately and contextually. Lastly, we provide a complete morphological feature analysis contrasting and highlighting the attributes essential for accurate and interpretable tissue characterization.
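A minimal NumPy sketch of percentile-oriented graph construction, under stated assumptions (nuclei centroids are given; the 10th-percentile default and pure distance criterion are illustrative, not the framework's full rule set): the edge threshold is computed from each tissue's own distance distribution, so dense and sparse regions get comparable connectivity.

import numpy as np

def percentile_graph(centroids, pct=10.0):
    """Connect nuclei whose pairwise distance falls below the pct-th percentile."""
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    iu = np.triu_indices(len(centroids), k=1)
    thresh = np.percentile(d[iu], pct)          # data-driven, not a fixed radius
    src, dst = np.where((d < thresh) & (d > 0))
    return np.stack([src, dst])                 # (2, num_edges), symmetric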
Bi-focal perspectives guide neural networks toward subtle brain abnormalities, while granular feature extraction at multiple scales identifies neurofibrillary tangles and amyloid plaques in MRI scans for accurate Alzheimer's detection.
Abstract:
Alzheimer's Disease (AD), the most common neurodegenerative disorder, is diagnosed in millions of patients each year. Accurate diagnosis and classification of AD from neuroimaging data remain challenging. Traditional CNNs extract abundant low-level information from an image but often fail to capture subtle, fine-grained pathological structures, a significant limitation when detecting AD from MRI scans. To overcome this, we propose a novel Granular Feature Integration method to combine information extraction at different scales along with an efficient information flow, enabling the model to capture both broad and fine-grained features simultaneously. We also propose a Bi-Focal Perspective mechanism to highlight the subtle neurofibrillary tangles and amyloid plaques in the MRI scans, ensuring that critical pathological markers are accurately identified. Our model achieved an F1-Score of 99.31%, precision of 99.24%, and recall of 99.51%. These scores show that our model significantly outperforms existing state-of-the-art (SOTA) CNNs.
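To show the multi-scale intuition concretely, here is a minimal PyTorch sketch of a granular block: parallel kernels at several scales run side by side and are fused, so broad context and fine detail coexist in one feature map. The kernel sizes and fusion layer are illustrative assumptions, not the paper's exact Granular Feature Integration module.

import torch
import torch.nn as nn

class GranularBlock(nn.Module):
    """Parallel kernels at several scales, fused to keep broad and fine detail."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 3
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5)
        ])
        self.fuse = nn.Conv2d(3 * branch_ch, out_ch, 1)

    def forward(self, x):
        # Concatenate per-scale responses, then mix them with a 1x1 conv.
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], dim=1))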
BibTeX:
@article{Venkatraman_2025,
title={Leveraging Bi-Focal Perspectives and Granular Feature Integration for Accurate Reliable Early Alzheimer’s Detection},
volume={13},
ISSN={2169-3536},
url={http://dx.doi.org/10.1109/ACCESS.2025.3540567},
DOI={10.1109/access.2025.3540567},
journal={IEEE Access},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Venkatraman, Shravan and Pandiyaraju, V. and Abeshek, A. and Kumar, S. Pavan and Aravintakshan, S. A.},
year={2025},
pages={28678–28692}
}
Dynamic spatial mapping and component-specific feature enhancement overcome boundary delineation challenges in breast ultrasound imaging.
Abstract:
Breast cancer is one of the leading causes of death globally, and thus there is an urgent need for early and accurate diagnostic techniques. Although ultrasound imaging is a widely used technique for breast cancer screening, it faces challenges such as poor boundary delineation caused by variations in tumor morphology and reduced diagnostic accuracy due to inconsistent image quality. To address these challenges, we propose novel Deep Learning (DL) frameworks for breast lesion segmentation and classification. We introduce a precision mapping mechanism (PMM) for a precision mapping and attention-driven LinkNet (PMAD-LinkNet) segmentation framework that dynamically adapts spatial mappings through morphological variation analysis, enabling precise pixel-level refinement of tumor boundaries. Subsequently, we introduce a component-specific feature enhancement module (CSFEM) for a component-specific feature-enhanced classifier (CSFEC-Net). Through a multi-level attention approach, the CSFEM magnifies distinguishing features of benign, malignant, and normal tissues. The proposed frameworks are evaluated against existing literature and a diverse set of state-of-the-art Convolutional Neural Network (CNN) architectures. The obtained results show that our segmentation model achieves an accuracy of 98.1%, an IoU of 96.9%, and a Dice Coefficient of 97.2%. For the classification model, an accuracy of 99.2% is achieved with F1-score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively.
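As one plausible reading of component-specific feature enhancement, the PyTorch sketch below applies squeeze-and-excitation style channel recalibration so tissue-specific channels are re-weighted before classification; the module and reduction ratio are assumptions for illustration, not the CSFEM itself.

import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Squeeze-and-excitation style reweighting of tissue-specific channels."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # globally pooled channel descriptor
        return x * w[:, :, None, None]           # per-channel emphasis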
BibTeX:
@misc{v2025exploitingprecisionmappingcomponentspecific,
title={Exploiting Precision Mapping and Component-Specific Feature Enhancement for Breast Cancer Segmentation and Identification},
author={Pandiyaraju V and Shravan Venkatraman and Pavan Kumar S and Santhosh Malarvannan and Kannan A},
year={2025},
eprint={2407.02844},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2407.02844},
}
Cross-modal attention enables synchronized audio-visual feature extraction through Transformer fusion for emotion recognition.
Abstract:
Understanding emotions is a fundamental aspect of human communication. Integrating audio and video signals offers a more comprehensive understanding of emotional states compared to traditional methods that rely on a single data source, such as speech or facial expressions. Despite its potential, multimodal emotion recognition faces significant challenges, particularly in synchronization, feature extraction, and fusion of diverse data sources. To address these issues, this paper introduces a novel transformer-based model named Audio-Video Transformer Fusion with Cross Attention (AVT-CA). The AVT-CA model employs a transformer fusion approach to effectively capture and synchronize interlinked features from both audio and video inputs, thereby resolving synchronization problems. Additionally, the Cross Attention mechanism within AVT-CA selectively extracts and emphasizes critical features while discarding irrelevant ones from both modalities, addressing feature extraction and fusion challenges. Extensive experimental analysis conducted on the CMU-MOSEI, RAVDESS and CREMA-D datasets demonstrates the efficacy of the proposed model. The results underscore the importance of AVT-CA in developing precise and reliable multimodal emotion recognition systems for practical applications.
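A minimal PyTorch sketch of the cross-attention pattern described above (illustrative, not the released AVT-CA code): video tokens query audio keys and values, and a residual connection keeps the video stream intact; mirroring the module gives the audio-attends-video direction.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Video queries attend over audio keys/values; mirror for the other direction."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_feats, audio_feats):
        # video_feats: (B, Tv, dim); audio_feats: (B, Ta, dim), same embedding dim assumed
        fused, _ = self.attn(query=video_feats, key=audio_feats, value=audio_feats)
        return self.norm(video_feats + fused)    # residual keeps the video stream intact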
BibTeX:
@misc{r2025multimodalemotionrecognitionusing,
title={Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention},
author={Joe Dhanith P R and Shravan Venkatraman and Vigya Sharma and Santhosh Malarvannan and Modigari Narendra},
year={2025},
eprint={2407.18552},
archivePrefix={arXiv},
primaryClass={cs.MM},
url={https://arxiv.org/abs/2407.18552},
}
Attention-fused deep convolutional neural networks improve the ability to classify diverse traffic signs through parallel hierarchical and multi-scale feature emphasis.
Abstract:
Autonomous vehicular technology, also known as self-driving or driverless technology, refers to the innovation that enables vehicles to operate without human intervention. Traffic sign classification (TSC) is a critical component in autonomous vehicular technology, as it allows vehicles to recognize and interpret traffic signs, which is essential for safe and rule-compliant navigation. This work proposes a novel attention-fused deep convolutional neural network (AFDCNN) for TSC. The proposed AFDCNN incorporates the capabilities of ResNet50 and EfficientNetV2 by combining their outputs through a self-attention mechanism, which enhances its ability to classify traffic signs. Analysis of the GTSRB, LISA, and MASTIF datasets revealed that the proposed model exhibited superior performance compared to state-of-the-art models, as evidenced by higher scores in recall, precision, F1-score, and accuracy metrics.
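A hedged PyTorch sketch of one way to fuse two backbone outputs with self-attention (an illustration, not the AFDCNN implementation): each backbone's pooled embedding, assumed projected to a common dimension, becomes a token, and attention mixes the two before pooling to a single descriptor.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Treat two backbone embeddings as tokens and fuse them with self-attention."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, resnet_feat, effnet_feat):
        # Each feat: (B, dim), assumed already projected to a shared dimension.
        tokens = torch.stack([resnet_feat, effnet_feat], dim=1)  # (B, 2, dim)
        fused, _ = self.attn(tokens, tokens, tokens)
        return fused.mean(dim=1)                 # single fused descriptor for the classifier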
BibTeX:
@INPROCEEDINGS{10654469,
author={Venkatraman, Shravan and Abeshek, A and Malarvannan, Santhosh and Shriyans, A and Jashwanth, R and Joe Dhanith, P R},
booktitle={2024 8th International Conference on Robotics and Automation Sciences (ICRAS)},
title={Traffic Sign Classification Using Attention Fused Deep Convolutional Neural Network},
year={2024},
pages={90-94},
doi={10.1109/ICRAS62427.2024.10654469}
}
Physically-grounded weather conditioning through GANs combined with adaptive Transformer tokenization preserves high-frequency sign details under adverse conditions.
Abstract:
Advancements in Traffic Sign Classification (TSC) systems remain critical for autonomous vehicle navigation, particularly under environmental perturbations that degrade visual perception. While existing Vision Transformers (ViTs) demonstrate strong global context modeling, their effectiveness diminishes under (1) insufficient localized feature resolution for low-resolution traffic sign recognition under weather degradation, and (2) dataset bias in real-world collections where adverse weather samples exhibit limited parametric diversity (such as fixed fog density levels in CCTSDB2021). To address these challenges, we present: (1) a Weather Conditioning Module-Enhanced DCGAN (WCM-DCGAN) that generates physically grounded weather transformations through parametric operators (σ-scaled fog density, γ-adjusted light diffusion), and (2) an Xception-ViT hybrid (X-ViT) architecture combining dilated convolutions with patch-adaptive tokenization for localization precision by preserving high-frequency sign details and minimizing information loss during patching and tokenization. Evaluated across five augmented datasets (GTSRB, MASTIF, TSRD, LISA, STSD) with 200K samples each, X-ViT achieves consistent validation F1 scores of 0.996-0.999 compared to 0.771-0.982 for conventional models, demonstrating 2.8-14.7% accuracy improvements in adversarial weather conditions over vanilla ViT and ResNet50 baselines. Our work offers a more resilient solution for cross-environment generalization without requiring expanded real-world data collection for autonomous vehicles. The code and produced augmented data are available at this URL.
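For intuition about a σ-scaled fog operator, here is a minimal PyTorch sketch of a parametric fog transform; the uniform-depth transmittance model and fog color are simplifying assumptions for illustration, not the WCM-DCGAN formulation.

import torch

def apply_fog(img, sigma=0.5, fog_color=0.9):
    """Parametric fog: blend toward a fog color with sigma-scaled density.

    img: (C, H, W) tensor in [0, 1]; sigma >= 0 controls fog density.
    A depth-aware variant would scale transmittance by per-pixel depth.
    """
    transmittance = torch.exp(torch.tensor(-sigma))    # uniform-depth assumption
    return img * transmittance + fog_color * (1.0 - transmittance)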
Ensemble deep learning with optimized weighted gradient techniques enables early and accurate detection of tomato leaf diseases.
Abstract:
Tomatoes are one of the most widely consumed and economically significant food crops globally, with their yield quality and quantity significantly impacted by various diseases. Early disease detection is crucial to reduce their effects and improve crop yields, supporting farmers. While previous research has applied machine learning techniques to segment and classify tomato leaf images, existing classifiers often struggle with accurately detecting new disease types. This study proposes a novel approach to tomato leaf disease classification by harnessing the power of deep learning and swarm intelligence-based optimization techniques. An ensemble model is developed, integrating an exponential moving average function with temporal constraints and an enhanced weighted gradient optimizer into fine-tuned Visual Geometry Group-16 (VGG-16) and Neural Architecture Search Network (NASNet) mobile training methods. The model is trained and validated on a dataset of 10,000 tomato leaf images categorized into nine disease classes, with an additional 1,000 images reserved for testing. The results show superior performance across key metrics: accuracy (98.7%), loss (4%), precision (97.9%), recall (98.6%), receiver operating characteristic curve (99.97%), and F1-score (98.7%), outperforming existing methods and enhancing disease detection accuracy.
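As a rough illustration of exponential-moving-average fusion, the NumPy sketch below smooths a sequence of softmax outputs into one prediction; the alpha value and renormalization are illustrative choices, not the paper's exact temporal-constraint scheme.

import numpy as np

def ema_fuse(prob_history, alpha=0.3):
    """Exponential moving average over a temporal sequence of softmax outputs.

    prob_history: list of (num_classes,) arrays, e.g. per-epoch or per-model
    predictions; alpha weights the most recent estimate.
    """
    fused = prob_history[0]
    for p in prob_history[1:]:
        fused = alpha * p + (1.0 - alpha) * fused
    return fused / fused.sum()                  # renormalize to a distribution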
BibTeX:
@ARTICLE{10.3389/fpls.2024.1382416,
AUTHOR={V., Pandiyaraju and Kumar, A. M. Senthil and Praveen, Joe I. R. and Venkatraman, Shravan and Kumar, S. Pavan and Aravintakshan, S. A. and Abeshek, A. and Kannan, A.},
TITLE={Improved tomato leaf disease classification through adaptive ensemble models with exponential moving average fusion and enhanced weighted gradient optimization},
JOURNAL={Frontiers in Plant Science},
VOLUME={15},
YEAR={2024},
URL={https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1382416},
DOI={10.3389/fpls.2024.1382416},
ISSN={1664-462X},
}