T. Nakagawa, Y. Higo, and S. Kusumoto, "NIL: Large-Scale Detection of Large-Variance Clones," In Proceedings of The 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021), pp. 830-841, August 2021. | |
ID | 708 |
Category | International conference |
Tags | Clone Detection, Large-Variance Clone, Scalability |
Title (title) |
NIL: Large-Scale Detection of Large-Variance Clones |
Title (English) |
|
Authors (author) |
Tasuku Nakagawa, Yoshiki Higo, Shinji Kusumoto |
Authors (English) (author) |
Tasuku Nakagawa, Yoshiki Higo, Shinji Kusumoto |
Editors (editor) |
|
Editors (English) |
|
Key (key) |
Tasuku Nakagawa,Yoshiki Higo,Shinji Kusumoto |
Book/proceedings title (booktitle) |
Proceedings of The 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021) |
Book/proceedings title (English) |
|
Volume (volume) |
|
Number (number) |
|
Page range (pages) |
830-841 |
Organization (organization) |
|
Publisher (publisher) |
|
Publisher (English) |
|
Publisher address (address) |
|
Month of publication (month) |
8 |
Year of publication (year) |
2021 |
Acceptance rate (acceptance) |
|
URL |
|
Additional information (note) |
|
Annotation (annote) |
|
Abstract (abstract) |
A code clone (in short, clone) is a code fragment that is identical or similar to other code fragments in source code. Clones generated by a large number of changes to copy-and-pasted code fragments are called large-variance (modifications are scattered) or large-gap (modifications are in one place) clones. It is difficult for general clone detection techniques to detect such clones and thus specialized techniques are necessary. In addition, with the rapid growth of software development, scalable clone detectors that can detect clones in large codebases are required. However, there are no existing techniques for quickly detecting large-variance or large-gap clones in large codebases. In this paper, we propose a scalable clone detection technique that can detect large-variance clones from large codebases and describe its implementation, called NIL. NIL is a token-based clone detector that efficiently identifies clone candidates using an N-gram representation of token sequences and an inverted index. Then, NIL verifies the clone candidates by measuring their similarity based on the longest common subsequence between their token sequences. We evaluate NIL in terms of large-variance clone detection accuracy, general Type-1, Type-2, and Type-3 clone detection accuracy, and scalability. Our experimental results show that NIL has higher accuracy in terms of large-variance clone detection, equivalent accuracy in terms of general clone detection, and the shortest execution time for inputs of various sizes (1–250 MLOC) compared to existing state-of-the-art tools. |
Electronic paper file | preprint (application/pdf) [publicly available] |
BibTeX entry |
@inproceedings{id708,
  title = {{NIL}: Large-Scale Detection of Large-Variance Clones},
  author = {Tasuku Nakagawa and Yoshiki Higo and Shinji Kusumoto},
  booktitle = {Proceedings of The 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021)},
  pages = {830-841},
  month = {8},
  year = {2021},
} |
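The abstract describes a two-stage pipeline: clone candidates are found through an N-gram representation of token sequences and an inverted index, and each candidate is then verified with a similarity based on the longest common subsequence (LCS). The following is a minimal Python sketch of that general idea only, not NIL's actual implementation. The function names, the N-gram size `n`, both thresholds, and the choice to normalize by the shorter fragment are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def detect_clones(fragments, n=3, filter_threshold=0.1, verify_threshold=0.7):
    """Report similar fragment pairs.

    fragments: dict mapping an orderable fragment id to its list of tokens.
    n, filter_threshold, verify_threshold: illustrative values (assumptions).
    """
    grams = {fid: ngrams(tokens, n) for fid, tokens in fragments.items()}

    # Inverted index: n-gram -> ids of the fragments that contain it.
    index = defaultdict(set)
    for fid, gram_set in grams.items():
        for gram in gram_set:
            index[gram].add(fid)

    clones = []
    for fid, gram_set in grams.items():
        # Candidate search: count n-grams shared with every later fragment.
        shared = defaultdict(int)
        for gram in gram_set:
            for other in index[gram]:
                if other > fid:  # visit each unordered pair once
                    shared[other] += 1

        for other, count in shared.items():
            # Cheap filter on the fraction of shared n-grams.
            smaller = min(len(gram_set), len(grams[other]))
            if count / smaller < filter_threshold:
                continue
            # Verification: LCS-based similarity on the token sequences.
            a, b = fragments[fid], fragments[other]
            similarity = lcs_length(a, b) / min(len(a), len(b))
            if similarity >= verify_threshold:
                clones.append((fid, other, similarity))
    return sorted(clones)


if __name__ == "__main__":
    fragments = {
        "a": "int x = a + b ; return x ;".split(),
        "b": "int x = a + b ; log ( x ) ; return x ;".split(),
        "c": "while ( i < n ) { i ++ ; }".split(),
    }
    for pair in detect_clones(fragments):
        print(pair)
```

In this sketch the inverted index means only fragments that share at least one N-gram are ever compared, so the quadratic-time LCS step runs on a small subset of pairs. Fragment "b" is fragment "a" with an inserted statement, so every token of "a" still appears in order within "b", which an LCS-based measure tolerates.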