Kusumoto Laboratory: K. Hanayama, Humpback: Language Models-Based Code Completion System for Dockerfiles, February 2021.

K. Hanayama, "Humpback: Language Models-Based Code Completion System for Dockerfiles," Master's thesis, Osaka University, 2021.
ID	679
Category	Thesis
Tags	code completion dockerfiles humpback language models-based system
Title	Humpback: Language Models-Based Code Completion System for Dockerfiles
Title (English)
Author	Kaisei Hanayama
Author (English)	Kaisei Hanayama
Key	Kaisei Hanayama
School	Osaka University
Publisher address
Month	2
Year	2021
URL
Note
Annotation
Abstract	Server virtualization is widely used to reduce costs and utilize resources efficiently. Containerization, a type of virtualization technology, has become mainstream. Containerization creates multiple virtual servers (i.e., containers) on a single physical server. Each container provides an independent environment, enabling faster application delivery, improved portability, and efficient resource utilization. The subject of this study is Docker, the de facto standard containerization platform. Docker containers are built by writing configuration files called Dockerfiles. Managing infrastructure configuration through machine-readable definition files is called infrastructure as code (IaC). IaC helps prevent human error, enables automated scaling, and allows knowledge from conventional software engineering to be applied to infrastructure configuration. However, IaC is a relatively new field, and several of its areas, such as development support, static analysis, and the establishment of best practices, have not yet been thoroughly researched. Among these unexplored areas, this study focuses on code completion, a widely used feature in software development. The goal of this study is to construct a code completion system that supports the development of Dockerfiles. The proposed system trains long short-term memory (LSTM) networks on a pre-collected dataset to create language models, which compute probability distributions over sequences of tokens. The system also employs model switching, a solution to a Docker-specific code completion problem. In Docker, containers are created from image files called base images. The Linux distribution differs depending on the base image, and the contents of the Dockerfile differ accordingly. Model switching is therefore introduced to reflect these differences: the language models used for prediction are selected based on the base image.
However, the Linux distribution cannot always be identified from the base image name. A base image detector was therefore created to determine the Linux distribution in such cases. Humpback, a code completion system for Dockerfiles, was implemented to realize the proposed approach. A dataset of 21,190 Dockerfiles was collected for training and testing. Evaluation experiments were conducted to measure Humpback's accuracy and to verify that model switching improves code completion accuracy. The results show that Humpback achieves a high average recommendation accuracy of 89.4% and confirm that model switching contributes to this accuracy.
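The model-switching idea described in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation: a bigram counter stands in for the LSTM language models, and the image-name table (`KNOWN_IMAGES`), the package-manager heuristic used here as a stand-in base image detector (`PACKAGE_MANAGER_HINTS`), and all function names are hypothetical assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical lookup tables (assumptions, not taken from the thesis).
KNOWN_IMAGES = {"ubuntu": "debian", "debian": "debian", "alpine": "alpine", "centos": "centos"}
PACKAGE_MANAGER_HINTS = {"apt-get": "debian", "apk": "alpine", "yum": "centos"}


class BigramModel:
    """Toy next-token model; a stand-in for the thesis's LSTM language models."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sequences):
        # Count how often each token follows each previous token.
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1

    def suggest(self, prev, k=3):
        # Up to k most frequent followers of `prev`, best first.
        return [tok for tok, _ in self.counts.get(prev, Counter()).most_common(k)]


def image_name(base_image):
    # "docker.io/library/ubuntu:20.04" -> "ubuntu"
    last = base_image.split("/")[-1]
    return last.split("@")[0].split(":")[0]


def detect_distribution(base_image, dockerfile_text):
    """Hypothetical detector: try the image name, then package-manager hints."""
    name = image_name(base_image)
    if name in KNOWN_IMAGES:
        return KNOWN_IMAGES[name]
    # The name alone may not reveal the distribution (e.g. "python:3.9").
    for token in dockerfile_text.split():
        if token in PACKAGE_MANAGER_HINTS:
            return PACKAGE_MANAGER_HINTS[token]
    return None


def train_models(corpus):
    """corpus: list of (distribution, token list). Returns per-distribution models and a fallback."""
    by_dist = defaultdict(list)
    for dist, seq in corpus:
        by_dist[dist].append(seq)
    models = {}
    for dist, seqs in by_dist.items():
        models[dist] = BigramModel()
        models[dist].train(seqs)
    # Assumed fallback: one model trained on every Dockerfile.
    fallback = BigramModel()
    fallback.train([seq for _, seq in corpus])
    return models, fallback


def complete(models, fallback, base_image, dockerfile_text, prev_token, k=3):
    # Model switching: use the model matching the detected distribution.
    dist = detect_distribution(base_image, dockerfile_text)
    return models.get(dist, fallback).suggest(prev_token, k)


corpus = [
    ("debian", ["RUN", "apt-get", "update"]),
    ("debian", ["RUN", "apt-get", "install"]),
    ("alpine", ["RUN", "apk", "add"]),
]
models, fallback = train_models(corpus)
print(complete(models, fallback, "ubuntu:20.04", "", "RUN"))               # ['apt-get']
print(complete(models, fallback, "python:3.9", "RUN apk add git", "RUN"))  # ['apk']
```

Falling back to a model trained on all Dockerfiles when no distribution is detected is also an assumption of this sketch; it keeps completion available for unrecognized base images at the cost of less distribution-specific suggestions.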
Thesis PDF	k-hanaym_202102_mthesis.pdf (application/pdf) [publicly accessible]
BibTeX entry	@mastersthesis{id679, title = {Humpback: Language Models-Based Code Completion System for Dockerfiles}, author = {Kaisei Hanayama}, school = {Osaka University}, month = {2}, year = {2021}, }