Research Outcomes

HIA: a genome mapper using hybrid index-based sequence alignment
  • Date posted: 2018-02-13
  • Last modified: 2018-02-13
  • Department: Research Planning Division
  • Contact: 043-719-8033
Algorithms for Molecular Biology, 2015, 01, 1-9

Jongpill Choi, Kiejung Park, Seong Beom Cho, Myungguen Chung

Abstract

    Background
    A number of alignment tools have been developed to align sequencing reads to the human reference genome. The scale of information from next-generation sequencing (NGS) experiments, however, is increasing rapidly. Recent studies based on NGS technology have routinely produced exome or whole-genome sequences from several hundreds or thousands of samples. To accommodate the increasing need to analyze very large NGS data sets, it is necessary to develop faster, more sensitive, and more accurate mapping tools.
    Results
    HIA uses two indices: a hash table index and a suffix array index. The hash table performs direct lookup of a q-gram, and the suffix array performs very fast lookup of variable-length strings by binary search. We observed that combining the hash table and the suffix array (a hybrid index) is much faster than the suffix array alone for finding a substring in the reference sequence. Here, we define a matching region (MR) as a longest common substring between the reference and a read, and a candidate alignment region (CAR) as a list of MRs that are close to each other. The hybrid index is used to find CARs between the reference and a read. We found that aligning only the unmatched regions in a CAR is much faster than aligning the whole CAR. In benchmark analysis, HIA outperformed the other aligners in mapping speed, without significant loss of mapping accuracy.
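
The hybrid lookup and the grouping of MRs into CARs described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not HIA's actual implementation: the function names, the naive suffix-array construction, the (read_pos, ref_pos, length) tuple layout for an MR, and the `max_gap` threshold are all hypothetical. The idea shown is that the hash table maps a q-gram to the slice of the suffix array whose suffixes start with it, so the binary search for a longer string only has to run inside that slice.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; fine for a toy example,
    # far too slow for a real genome.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_qgram_table(text, sa, q):
    # Map each q-gram to the half-open interval [lo, hi) of the suffix array
    # whose suffixes start with that q-gram. Such suffixes are contiguous
    # because the suffix array is sorted.
    table = {}
    for rank, pos in enumerate(sa):
        gram = text[pos:pos + q]
        if len(gram) < q:
            continue  # suffix shorter than q cannot start with a q-gram
        lo, _ = table.get(gram, (rank, rank))
        table[gram] = (lo, rank + 1)
    return table


def hybrid_find(text, sa, table, q, pattern):
    # Return all start positions of `pattern` in `text` (len(pattern) >= q).
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    interval = table.get(pattern[:q])  # direct hash lookup of the q-gram
    if interval is None:
        return []
    lo, hi = interval
    m = len(pattern)

    # Lower bound: first suffix in [lo, hi) whose m-prefix is >= pattern.
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            left = mid + 1
        else:
            right = mid
    start = left

    # Upper bound: first suffix in [start, hi) whose m-prefix is > pattern.
    right = hi
    while left < right:
        mid = (left + right) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            left = mid + 1
        else:
            right = mid
    return sorted(sa[start:left])


def group_into_cars(mrs, max_gap):
    # Group MRs, given as (read_pos, ref_pos, length), into CARs: an MR joins
    # the current CAR if it starts within max_gap reference bases of the end
    # of the previous MR. The threshold rule is an assumption for illustration.
    cars = []
    for mr in sorted(mrs, key=lambda m: m[1]):
        if cars:
            prev = cars[-1][-1]
            if mr[1] - (prev[1] + prev[2]) <= max_gap:
                cars[-1].append(mr)
                continue
        cars.append([mr])
    return cars


reference = "ACGTACGTTACG"
q = 3
sa = build_suffix_array(reference)
table = build_qgram_table(reference, sa, q)
hits = hybrid_find(reference, sa, table, q, "ACGT")
cars = group_into_cars([(0, 100, 20), (25, 126, 10), (0, 5000, 15)], max_gap=50)
```

In this sketch the q-gram lookup narrows the search to the suffixes that share the first `q` characters of the query, so the binary search runs over a much smaller interval than the full suffix array. Within a CAR, only the gaps between consecutive MRs would then need to be aligned.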
    Conclusions
    Our experiments show that the hybrid of a hash table and a suffix array is useful in terms of speed for mapping NGS reads to the human reference genome sequence. In conclusion, our tool is appropriate for aligning massive data sets generated by NGS.


  • ISBN or ISSN: 1748-7188

  • This research was supported by a research and development grant (project no. 2013-NG72001-00) from the Korea Centers for Disease Control and Prevention.


This public work may be used under the terms of the Korea Open Government License (KOGL): attribution, non-commercial use, no modification.