Lifelong Learning with Task-Specific Adaptation: Addressing the Stability-Plasticity Dilemma


¹University of Toronto, ²University of Alberta, ³Concordia University, ⁴MBZUAI

Abstract

Lifelong learning (LL) aims to continuously acquire new knowledge while retaining previously learned knowledge. A central challenge in LL is the stability-plasticity dilemma, which requires models to balance the preservation of previous knowledge (stability) with the ability to learn new tasks (plasticity). While parameter-efficient fine-tuning (PEFT) has been widely adopted in large language models, its application to lifelong learning remains underexplored. To bridge this gap, this paper proposes AdaLL, an adapter-based framework designed to address the dilemma through a simple, universal, and effective strategy. AdaLL co-trains the backbone network and adapters under regularization constraints, enabling the backbone to capture task-invariant features while allowing the adapters to specialize in task-specific information. Unlike methods that freeze the backbone network, AdaLL incrementally enhances the backbone's capabilities across tasks while minimizing interference through backbone regularization. This architectural design significantly improves both stability and plasticity, effectively eliminating the stability-plasticity dilemma. Extensive experiments demonstrate that AdaLL consistently outperforms existing methods across various configurations, including dataset choices, task sequences, and task scales.

Methodology


Method Overview

Existing methods contribute to incremental learning in various ways: (a) classic regularization-based methods, such as LwF and EWC, introduce regularization constraints to preserve knowledge from previous tasks; (b) submodule-based approaches, for instance InfLoRA and SideTuning, integrate additional components such as MLPs and LoRA into the backbone network; and (c) prompt-tuning methods (DualPrompt, CodaPrompt, etc.) introduce task-specific prefixes to the keys and values of attention modules for task-specific performance. We anticipate a novel framework (d) that uses submodules in a way that benefits from regularization and other backbone-specific algorithms, yielding a better response to the stability-plasticity dilemma, i.e., a universal solution.
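To make this composition concrete, the sketch below shows one way a task-specific submodule could be co-trained with a regularized backbone. The EWC-style quadratic penalty and the names backbone, adapter, head, fisher, and old_params are illustrative assumptions for the example, not the exact implementation in the paper.

```python
# Minimal sketch: co-training a task-specific submodule (adapter) with a
# regularized backbone. The EWC-style penalty and all names (backbone,
# adapter, head, fisher, old_params) are illustrative assumptions.
import torch.nn.functional as F

def co_training_loss(backbone, adapter, head, x, y, fisher, old_params, lam=1.0):
    feats = backbone(x)                    # shared, task-invariant features
    logits = head(feats + adapter(feats))  # adapter adds a task-specific residual
    task_loss = F.cross_entropy(logits, y)

    # The regularizer constrains only the backbone, encouraging it to retain
    # knowledge from previous tasks while the adapter remains free to adapt.
    reg = sum(
        (fisher[name] * (param - old_params[name]) ** 2).sum()
        for name, param in backbone.named_parameters()
    )
    return task_loss + lam * reg
```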

AdaLL Introduction

We deploy an adapter that consists of a down-projection, a nonlinear transformation, an up-projection, and a skip connection between the feature extractor and the classifier head. The key difference between the traditional use of adapters and ours is that we co-train the adapter with the entire network when learning a new task, adding constraints so that the backbone retains task-invariant information while the adapters learn task-specific information.
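As a concrete illustration, here is a minimal PyTorch sketch of such an adapter; the bottleneck width and the GELU nonlinearity are assumptions made for the example, not choices taken from the paper.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter sketch: down-projection, nonlinear transformation,
    up-projection, and a skip connection that passes backbone features
    through to the classifier head unchanged."""

    def __init__(self, feat_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(feat_dim, bottleneck_dim)  # down-projection
        self.act = nn.GELU()                             # nonlinear transformation
        self.up = nn.Linear(bottleneck_dim, feat_dim)    # up-projection

    def forward(self, feats):
        # Skip connection: the adapter contributes a task-specific residual
        # on top of the task-invariant backbone features.
        return feats + self.up(self.act(self.down(feats)))
```

During training on a new task, both the backbone and the adapter receive gradients, with the regularization constraints applied to the backbone parameters only.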

Results on Efficacy


Method          Average Task Accuracy↑   Final Task Accuracy↑

ResNet backbones
EWC             54.8                     50.8
EWC-A           55.6                     52.7
LwF             70.9                     71.5
LwF-A           73.8                     72.3
iTAML           79.0                     80.5
iTAML-A         84.1                     80.6

ViT backbones
DualPrompt      88.2                     86.3
DualPrompt-A    89.3                     87.9
InfLoRA         91.6                     86.7
LwF-A           90.9                     86.4
Our framework improves the absolute performance of incremental learning methods, especially the more recent ones (see Section 4.4 of our paper for a discussion of AdaLL versus InfLoRA).

Results on Universality


Our framework improves the performance of different regularization methods across task orderings, task scales, and datasets, demonstrating its universality.

The figure below shows model performance on CIFAR-100 with different regularization methods and task orderings (alphabetical, iCaRL, and a coarse ordering based on task metadata).

Model performance on CIFAR-100 with different regularization methods and task scales (5, 10, or 20 classes per task).
Model performance with different regularization methods on the ImageNet subset (the first 100 classes).