
Optimizing Information Retrieval in Dark Web Academic Literature: A Study Using KeyBERT for Keyword Extraction and Clustering





AUTHORS

Yosua Setyawan Soekamto (1,4), Leonard Christopher Limanjaya (1), Yoshua Kaleb Purwanto (1), Bongjun Choi (2), Seung-Keun Song (3), Dae-Ki Kang (1,*)


(1) Department of Computer Engineering, Dongseo University, Busan, South Korea

(2) Department of Software, Dongseo University, Busan, South Korea

(3) Department of Visual Contents, Graduate School, Dongseo University, Busan, South Korea

(4) Department of Information Systems, Universitas Ciputra Surabaya

yosua.soekamto@ciputra.ac.id, leonardchristopher002@gmail.com, yoshuakaleb049@gmail.com, bjchoi@gdsu.dongseo.ac.kr, songsk@gdsu.dongseo.ac.kr, *dkkang@dongseo.ac.kr



Abstract

The exponential increase in publications and the interconnected nature of sub-domains make traditional methods of information extraction and organization inadequate. This inefficiency can impede scientific progress and innovation. To address these challenges, this research uses KeyBERT, a keyword extraction method built on Bidirectional Encoder Representations from Transformers (BERT), and integrates it with K-Means clustering to organize topics from large datasets effectively. We analyze a dataset of 47,627 articles from SCOPUS in the domains of Reinforcement Learning and Computer Vision. An ablation study demonstrates the generalizability of the approach across these fields, with the optimal number of clusters determined to be three using the Elbow Method. The results demonstrate that KeyBERT is effective in extracting and organizing topics within these domains, with a particular focus on applications such as medical imaging, autonomous driving, and real-time detection systems. This methodology offers a scalable solution for organizing vast academic datasets, enabling researchers to extract meaningful insights efficiently and to apply the approach to other domains.


Keywords: K-Means | KeyBERT | Keyword Extraction | Text Mining | Topic Clustering
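
The clustering step named in the abstract (K-Means with the number of clusters chosen by the Elbow Method) can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D toy points stand in for keyword embeddings, the farthest-point initialization and the second-difference elbow rule are assumptions made here so the example is deterministic, and all function names (`sq_dist`, `init_farthest`, `kmeans`, `elbow`) are hypothetical.

```python
# Toy illustration only: 2-D points stand in for keyword embeddings.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_farthest(points, k):
    """Deterministic init (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, inertia)."""
    centroids = init_farthest(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    # Inertia: sum of squared distances to the nearest centroid.
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

def elbow(inertias):
    """Pick the k where the drop in inertia flattens most sharply
    (largest second difference); one simple way to automate the Elbow Method."""
    ks = sorted(inertias)
    return max(
        ks[1:-1],
        key=lambda k: (inertias[k - 1] - inertias[k]) - (inertias[k] - inertias[k + 1]),
    )

# Three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 9), (5, 10), (6, 9), (6, 10),
]

inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}
elbow_k = elbow(inertias)
print(elbow_k)  # 3 for this toy data
```

In a real pipeline the points would be high-dimensional embeddings of the extracted keywords, and the elbow is often read from a plot of inertia against k rather than computed by a fixed rule.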

