Family name:
Given name:
My full name comes from the Chinese idiom 一馬平川, which describes flat ground that one can ride straight across, and thus implies a smooth life.
眾峰來自天目山,勢若駿馬奔平川。
The many peaks come from Tianmu Mountain, with the might of fine horses galloping across a flat plain.

Working on any-to-any multimodal generation with diffusion alignment, advised by Shengzhi Li and Dr. Prayag Tiwari. Added VLM support in the transformers library for training pipelines. Modified the NExT-GPT architecture to use FLUX instead of SDXL as the image decoder.

Built AI agents for automated UI navigation and testing. Developed a framework to convert traditional scripts into robust AI-based UI navigation. Constructed a pipeline that uses AI to heal failed test scripts.

Designed a plug-and-play visual grounding framework. Outperformed SOTA models and GPT-4V on RefCOCOg by 0.11 and 0.48 mIoU, respectively. Extended the grounding framework to accept multiple modalities (image, video).

Built a pipeline to retrieve visually and semantically relevant videos from a collection of 100K+ videos with sub-0.5s latency. Designed an AI module for cross-platform ad resizing. Fine-tuned inpainting models for text-based background generation.
@inproceedings{sharma2025think,
title={Think to Ground: Improving Spatial Reasoning in LLMs for better Visual Grounding},
author={Sharma, Karun and Vats, Vidushee},
booktitle={ICLR 2025 Workshop on Reasoning and Planning for Large Language Models},
year={2025}
}
@inproceedings{10651096,
author={Sharma, Karun and Vats, Vidushee and Singh, Abhinendra and Sahani, Rahul and Rai, Deepak and Sharma, Ashok},
booktitle={2024 International Joint Conference on Neural Networks (IJCNN)},
title={LLaVA-PlantDiag: Integrating Large-scale Vision-Language Abilities for Conversational Plant Pathology Diagnosis},
year={2024},
pages={1-7},
doi={10.1109/IJCNN60899.2024.10651096}
}
@inproceedings{10330945,
author={Gupta, Ujjwal and Golash, Roshan and Vats, Vidushee and Sharma, Karun},
booktitle={2023 International Conference on Emerging Techniques in Computational Intelligence (ICETCI)},
title={An Improved Hybrid Model for Target Detection},
year={2023},
pages={265-270},
doi={10.1109/ICETCI58599.2023.10330945}
}