Optimizing Large Language Models for Resource-Constrained Environments: A Parameter-Efficient Approach Using QLoRA and Prompt Tuning
Keywords:
management, parameter-efficient fine-tuning, large language models, QLoRA, prompt tuning, resource-constrained environments, NLP, memory optimization, deployment cost reduction, text classification, quantization, low-rank adaptation

Abstract
As the deployment of AI solutions continues to grow, particularly in resource-constrained environments, the need for efficient and cost-effective methods becomes increasingly critical. Large Language Models (LLMs) pose significant computational challenges that often make their deployment impractical for many real-world applications. This study evaluates parameter-efficient fine-tuning methods, specifically QLoRA and Prompt Tuning, in combination with DistilBERT, to address these challenges. Our combined approach achieved a 36.2% reduction in memory usage and a 50% reduction in inference costs while maintaining 87.75% accuracy, comparable to baseline models. The results demonstrate that stacking these techniques can yield multiplicative benefits in resource reduction without significant performance degradation, offering practical solutions for resource-constrained deployments.