Record Detail

Journal Article

Development of Focused Crawlers for Building Large Punjabi News Corpus

Mahi Gurjot Singh - Personal Name

Abstract

Web crawlers are as old as the Internet and are most commonly used by search engines to visit websites and index them into repositories. They are not limited to search engines but are also widely utilized to build corpora in different domains and languages. This study developed a focused set of web crawlers for three Punjabi news websites. The web crawlers were developed to extract quality text articles and add them to a local repository to be used in further research. The crawlers were implemented using the Python programming language and were utilized to construct a corpus of more than 134,000 news articles in nine different news genres. The crawler code and extracted corpora were made publicly available to the scientific community for research purposes.
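The abstract does not describe the crawlers' internals, so the following is only a minimal sketch of what a focused, single-site crawler might look like in Python. It assumes article text lives in `<p>` elements and that pages with little paragraph text are not articles. The names `PageParser`, `crawl`, `fetch`, `min_chars`, and `max_pages` are hypothetical and are not taken from the authors' published code.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects <p> text and href links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def crawl(seed, fetch, min_chars=200, max_pages=100):
    """Breadth-first crawl restricted to the seed's host (the 'focus').

    fetch(url) is a caller-supplied function returning HTML or None.
    Returns {url: text} for pages with at least min_chars of <p> text
    (an assumed heuristic for 'this page is an article').
    """
    host = urlparse(seed).netloc
    queue, seen, corpus = deque([seed]), {seed}, {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        text = "\n".join(parser.paragraphs)
        if len(text) >= min_chars:
            corpus[url] = text
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # drop fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return corpus
```

Passing `fetch` in as a parameter keeps the sketch testable without network access. In practice it would presumably wrap an HTTP client, honor `robots.txt`, and pause between requests.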

Availability

JICTRA6-001 JICTRA V15N3 December 2021 Perpustakaan FT UPI YAI Available

Detailed Information

Series Title	Journal of ICT Research and Application
Call Number	JICTRA V15N3 December 2021
Publisher	ITB Journal Publisher: Bandung, 2021
Physical Description	pp. 205-215
Language	English
ISBN/ISSN	2337-5787
Classification	JICTRA
Content Type	-
Media Type	-
Carrier Type	-
Edition	Volume 15 Number 3 December 2021
Subject	Engineering
Specific Detail Info	-
Statement of Responsibility	-

Other/Related Versions

No other versions available
