RESEARCH OF PRE-TRAINED NEURAL NETWORK MODELS FOR TEXT GENERATION
Keywords:
text generation, neural networks, Transformer architecture, pre-trained model, token
Abstract
Text generation is becoming one of the most popular machine learning technologies. Pre-trained models can generate text in answer to a question, predict the next word based on an existing text, produce meaningful text for communication channels, and so on. The main information on the architecture of neural network models for text generation is presented mostly in English-language sources; Russian-language literature contains no review articles on this topic, so the theoretical material remains scattered. This article reviews existing modern neural network models for text generation. The Transformer architecture is considered: its principle of operation is explained, the categories of Transformer models are described, and examples of the tasks they solve are given. Text generation (decoding) methods are considered: greedy search, beam search, temperature sampling, and sampling with low-probability token filtering. A comparison of pre-trained neural network models for text generation is carried out. Studying pre-trained models for text generation helps determine which of them are preferable for a particular text generation task and, going forward, will inform the choice of the most appropriate pre-trained model for generating text for communication channels.
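As a brief illustration of the decoding methods listed above, the following minimal sketch shows how greedy search, beam search, temperature sampling, and low-probability token filtering can be invoked through the Hugging Face transformers library; the model name "gpt2", the prompt, and the parameter values are placeholders chosen for illustration, not settings taken from the article.

```python
# Sketch of the decoding methods compared in the review, using the
# Hugging Face transformers generate() API. Model, prompt, and
# hyperparameter values are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder pre-trained causal language model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Text generation is", return_tensors="pt")

# Greedy search: at each step take the single most probable next token.
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False)

# Beam search: keep several candidate sequences (beams) at every step.
beams = model.generate(**inputs, max_new_tokens=30, num_beams=5, do_sample=False)

# Temperature sampling: sample from the distribution rescaled by a temperature.
temp = model.generate(**inputs, max_new_tokens=30, do_sample=True, temperature=0.7)

# Sampling with low-probability token filtering (top-k / top-p truncation).
filtered = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                          top_k=50, top_p=0.9)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
```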