<?xml version="1.0" encoding="UTF-8" ?>
<modsCollection xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" xmlns:slims="http://slims.web.id" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods version="3.3" id="59059">
 <titleInfo>
  <title>Development of Focused Crawlers for Building Large Punjabi News Corpus</title>
 </titleInfo>
 <name type="Personal Name" authority="">
  <namePart>Mahi Gurjot Singh</namePart>
  <role>
   <roleTerm type="text">Primary Author</roleTerm>
  </role>
 </name>
 <typeOfResource manuscript="no" collection="yes">mixed material</typeOfResource>
 <genre authority="marcgt">bibliography</genre>
 <originInfo>
  <place>
   <placeTerm type="text">Bandung</placeTerm>
   <publisher>ITB Journal Publisher</publisher>
   <dateIssued>2021</dateIssued>
  </place>
 </originInfo>
 <language>
  <languageTerm type="code">e</languageTerm>
  <languageTerm type="text">English</languageTerm>
 </language>
 <physicalDescription>
  <form authority="gmd">Journal Article</form>
  <extent>pp. 205-215</extent>
 </physicalDescription>
 <relatedItem type="series">
  <titleInfo/>
  <title>Journal of ICT Research and Applications</title>
 </relatedItem>
</mods>
<note>
Abstract

Web crawlers are as old as the Internet and are most commonly used by search engines to visit websites and index them into repositories. They are not limited to search engines but are also widely utilized to build corpora in different domains and languages. This study developed a focused set of web crawlers for three Punjabi news websites. The web crawlers were developed to extract quality text articles and add them to a local repository to be used in further research. The crawlers were implemented using the Python programming language and were utilized to construct a corpus of more than 134,000 news articles in nine different news genres. The crawler code and extracted corpora were made publicly available to the scientific community for research purposes.
</note>
<note type="statement of responsibility"></note>
<subject authority="">
 <topic>Engineering</topic>
</subject>
<classification>JICTRA</classification>
<identifier type="issn">23375787</identifier>
<location>
 <physicalLocation>Perpustakaan Teknik UPI YAI </physicalLocation>
 <shelfLocator>JICTRA  V15N3 December 2021</shelfLocator>
 <holdingSimple>
  <copyInformation>
   <numerationAndChronology type="1">JICTRA6-001</numerationAndChronology>
   <sublocation>Perpustakaan FT UPI YAI</sublocation>
   <shelfLocator>JICTRA  V15N3 December 2021</shelfLocator>
  </copyInformation>
 </holdingSimple>
</location>
<slims:image>JICTRA_ITB_small.png.png</slims:image>
<recordInfo>
 <recordIdentifier>59059</recordIdentifier>
 <recordCreationDate encoding="w3cdtf">2023-02-15 15:56:27</recordCreationDate>
 <recordChangeDate encoding="w3cdtf">2023-02-15 15:56:27</recordChangeDate>
 <recordOrigin>machine generated</recordOrigin>
</recordInfo>
</modsCollection>