Optimizing Large Language Models for Resource-Constrained Environments: A Parameter-Efficient Approach Using QLoRA and Prompt Tuning

Keywords: management, parameter-efficient fine-tuning, large language models, QLoRA, prompt tuning, resource-constrained environments, NLP, memory optimization, deployment cost reduction, text classification, quantization, low-rank adaptation

Abstract

As the deployment of AI solutions continues to grow, particularly in resource-constrained environments, the need for efficient and cost-effective methods becomes increasingly critical. Large Language Models (LLMs) present significant computational challenges that often make their deployment impractical for many real-world applications. This study evaluates parameter-efficient fine-tuning methods, specifically QLoRA and Prompt Tuning, in combination with DistilBERT, to address these challenges. Our combined approach achieved a 36.2% reduction in memory usage and a 50% reduction in inference costs while maintaining 87.75% accuracy compared to baseline models. The results demonstrate that stacking these techniques can provide multiplicative benefits in resource reduction without significant performance degradation, offering practical solutions for resource-constrained deployments.
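To make the parameter-efficiency claim concrete, the sketch below illustrates the low-rank adaptation idea underlying QLoRA: the original weight matrix of a linear layer is frozen (and, in QLoRA, quantized to 4 bits), while only two small low-rank factors are trained. The dimensions, rank, and function name here are illustrative assumptions for a DistilBERT-sized layer, not details taken from the paper.

```python
# Minimal sketch of the low-rank adaptation (LoRA) idea behind QLoRA.
# A frozen weight W (d x k) is augmented with a trainable update B @ A,
# where B is (d x r) and A is (r x k) with rank r << min(d, k).
# All names and sizes below are illustrative, not from the study.

def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (frozen, trainable) parameter counts for one linear layer."""
    frozen = d * k             # original weight, kept frozen (quantized in QLoRA)
    trainable = d * r + r * k  # low-rank factors B (d x r) and A (r x k)
    return frozen, trainable

# Example: a 768 x 768 projection (DistilBERT's hidden size) at rank 8.
frozen, trainable = lora_param_counts(768, 768, 8)
print(frozen, trainable)            # 589824 12288
print(f"{trainable / frozen:.1%}")  # trainable params are ~2.1% of the frozen weight
```

Because only the small factors receive gradients, optimizer state and gradient memory shrink proportionally, which is one source of the memory savings the abstract reports when QLoRA is combined with prompt tuning.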


Published: 2025-10-05

Section: Articles

How to Cite

Optimizing Large Language Models for Resource-Constrained Environments: A Parameter-Efficient Approach Using QLoRA and Prompt Tuning. (2025). American Journal of Management, 25(4). https://articlearchives.co/index.php/AJM/article/view/7266