Record Detail
Journal Article
Performance of Methods in Identifying Similar Languages Based on String to Word Vector
Indonesia has a large number of local languages that share cognate words, and some of these languages closely resemble one another. Automatic identification within a family of languages is difficult, so it is necessary to determine which language identification method performs best on this task. This study attempts to identify Indonesian local languages using a String to Word Vector approach. A string vector is an ordered collection of words: in a string vector each word is an element or value, whereas in a numeric vector each word becomes an attribute or feature. Among the Naïve Bayes, SMO, J48, and ZeroR classifiers, SMO is the most accurate, with an accuracy of 95.7% under 10-fold cross-validation and 94.4% under a 60%:40% split. The best tokenizer for this classification is the Character N-Gram tokenizer. All classifiers except ZeroR show increased accuracy with the Character N-Gram tokenizer compared to the Word tokenizer. The best features for this system are character trigrams and four-grams. The trigram is preferred because it requires less training data. The highest accuracy in the combination experiment is 0.965, obtained with IDF = FALSE and WC = TRUE, regardless of the TF setting.
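As an illustration of the character n-gram features described in the abstract, the following is a minimal Python sketch of turning strings into numeric vectors over a shared n-gram vocabulary. It is not the study's implementation: the function names `char_ngrams` and `to_vectors`, the `word_counts` flag (which only loosely mirrors the WC option), and the two sample words are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def to_vectors(docs, n=3, word_counts=True):
    """Map each document to a numeric vector over a shared n-gram vocabulary.

    word_counts=True stores how often each n-gram occurs;
    word_counts=False stores only presence (1) or absence (0).
    """
    counted = [Counter(char_ngrams(d.lower(), n)) for d in docs]
    # Every distinct n-gram seen in any document becomes one attribute.
    vocab = sorted(set().union(*counted))
    vectors = []
    for counts in counted:
        if word_counts:
            vectors.append([counts[g] for g in vocab])
        else:
            vectors.append([1 if counts[g] else 0 for g in vocab])
    return vocab, vectors


# Hypothetical sample input: two related word forms, used only for illustration.
vocab, vectors = to_vectors(["makan", "makanan"], n=3)
print(vocab)
print(vectors)
```

Related word forms share most of their trigrams, which may help explain why character n-grams can separate closely related languages better than whole-word tokens. A classifier would then be trained on the resulting vectors.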
Availability
JKI4-002 | JKI V6N1 April 2020 | Perpustakaan FT UPI YAI | Available |
Detailed Information
Series Title | Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika |
---|---|
Call Number | JKI V6N1 April 2020 |
Publisher | Universitas Muhammadiyah Surakarta : Surakarta, 2020 |
Physical Description | pp. 9-14 |
Language | English |
ISBN/ISSN | 2621-038X |
Classification | JKI |
Content Type | - |
Media Type | - |
Carrier Type | - |
Edition | Volume 6 Number 1 April 2020 |
Subject | |
Specific Detail Info | - |
Statement of Responsibility | - |
Other/Related Versions
No other version available