Karen Hambardzumyan

ACL-IJCNLP 2021 Accepted Paper

Karen Hambardzumyan, YerevaNN, Yerevan State University

Hrant Khachatrian, YerevaNN, Yerevan State University

Jonathan May, Information Sciences Institute, USC

WARP: Word-level Adversarial ReProgramming

@ ACL Anthology

WARP: Word-level Adversarial ReProgramming

The latest version of the paper

YerevaNN/WARP

Codebase

WARP_ACL2021_Slides.pdf

WARP adds a few trainable embeddings around the input, which causes the masked language model to predict the sentiment of the sentence in the SST-2 task.

Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model. In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task. Using up to 25K trainable parameters per task, this approach outperforms all existing methods with up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks with just 32 training samples.
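The data flow described in the abstract can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the code from YerevaNN/WARP: `frozen_encoder` is a hypothetical stand-in for the frozen masked language model, the prefix/suffix layout around a single `[MASK]` slot is an assumed template, and scoring classes by a dot product with one trainable vector per label is likewise an assumption made for the example. All function and variable names are invented here.

```python
import random

HIDDEN = 8  # toy width; RoBERTa-large uses 1024


def make_vectors(n, dim, rng):
    """Return n small random vectors of length dim (stand-ins for embeddings)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]


def frozen_encoder(sequence):
    """Hypothetical stand-in for the frozen masked LM.

    A real model would return the hidden state at the [MASK] position;
    this toy just averages the input vectors so the example runs anywhere.
    """
    dim = len(sequence[0])
    return [sum(vec[i] for vec in sequence) / len(sequence) for i in range(dim)]


def warp_forward(input_emb, mask_emb, prefix, suffix, class_emb):
    """Score each class for one input.

    Assumed layout: [prefix prompts] + input tokens + [MASK] + [suffix prompts].
    """
    sequence = prefix + input_emb + [mask_emb] + suffix
    hidden = frozen_encoder(sequence)
    # One logit per class: dot product with that class's trainable vector.
    return [sum(h * c for h, c in zip(hidden, cls)) for cls in class_emb]


def count_trainable(prefix, suffix, class_emb):
    """Number of scalars that would receive gradients; the encoder has none."""
    return sum(len(vec) for vec in prefix + suffix + class_emb)


rng = random.Random(0)
input_emb = make_vectors(5, HIDDEN, rng)     # frozen token embeddings of the sentence
mask_emb = make_vectors(1, HIDDEN, rng)[0]   # frozen [MASK] embedding
prefix = make_vectors(2, HIDDEN, rng)        # trainable prompt vectors
suffix = make_vectors(2, HIDDEN, rng)        # trainable prompt vectors
class_emb = make_vectors(2, HIDDEN, rng)     # trainable, one per label (e.g. SST-2)

logits = warp_forward(input_emb, mask_emb, prefix, suffix, class_emb)
n_params = count_trainable(prefix, suffix, class_emb)
```

Only the prompt and class vectors are counted as trainable, which is why the per-task footprint stays small: at a hidden width of 1024, 20 prompt vectors alone amount to 20 × 1024 = 20,480 parameters, the same order of magnitude as the 25K figure quoted above.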

Results on the SuperGLUE benchmark. The results for the test set are obtained from the SuperGLUE evaluation server. We only show systems evaluated in a similar few-shot training setup using 32 examples.

Accuracy on SST-2 development and test sets. The last column shows the number of trainable parameters only. All methods use RoBERTa-large if not stated otherwise. WARP_K corresponds to a prompt consisting of K tokens.

Citation

@inproceedings{hambardzumyan-etal-2021-warp,
    title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
    author = "Hambardzumyan, Karen  and
      Khachatrian, Hrant  and
      May, Jonathan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.381",
    doi = "10.18653/v1/2021.acl-long.381",
    pages = "4921--4933"
}