User Guide

To help you get a feel for what's possible with Jillion, this User Guide provides an overview of its major modules, along with descriptions and code examples that use the key classes and interfaces in those modules.


Overview

Focused on Abstraction

Jillion code relies heavily on abstraction. Most non-value objects should be referred to only by their interface or abstract class. This allows several implementations to exist and be passed around without breaking any user code.

Immutable

Most Jillion objects are immutable: they can't be modified once they have been built. This makes coding simpler since most of the code doesn't have to worry about variables changing state. Multi-threading is usually easier and more performant since there will be fewer places in the code that need to be synchronized.

Builders

Most Jillion objects are built using the Builder pattern, which lets objects be built over a series of separate "steps". The constructor of a Jillion Builder usually takes only the required parameters. Optional parameters are set by calling additional configuration methods on the Builder. Once all the parameters have been set, calling the Builder's build() method returns a new instance of the type.

[Figure: Builder sequence diagram]

The sequence diagram above shows the general usage of a Builder. This example creates a new XBuilder instance (a made-up class) that will build an immutable instance of type X. After setting some optional parameters, the user invokes the build() method, which returns a new X instance.
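The Builder usage described above can be sketched in plain Java. This is only a minimal illustration: X and XBuilder are the made-up names from the diagram, and the fields (id, minLength, trimmed) are hypothetical and chosen purely for the example. None of this is Jillion API.

```java
// Illustrative only: X and XBuilder are made-up names, not Jillion classes.
final class X {
    private final String id;        // required parameter
    private final int minLength;    // optional parameter (hypothetical)
    private final boolean trimmed;  // optional parameter (hypothetical)

    // Package-private: instances are meant to be created through XBuilder.
    X(String id, int minLength, boolean trimmed) {
        this.id = id;
        this.minLength = minLength;
        this.trimmed = trimmed;
    }

    // Only getters, no setters: an X cannot change after it is built.
    String getId() { return id; }
    int getMinLength() { return minLength; }
    boolean isTrimmed() { return trimmed; }
}

final class XBuilder {
    private final String id;
    private int minLength = 0;       // default used when not configured
    private boolean trimmed = false; // default used when not configured

    // The constructor takes only the required parameter.
    XBuilder(String id) {
        if (id == null) {
            throw new NullPointerException("id is required");
        }
        this.id = id;
    }

    // Optional configuration methods return the builder so calls can be chained.
    XBuilder minLength(int minLength) {
        if (minLength < 0) {
            throw new IllegalArgumentException("minLength must be >= 0");
        }
        this.minLength = minLength;
        return this;
    }

    XBuilder trimmed(boolean trimmed) {
        this.trimmed = trimmed;
        return this;
    }

    // Returns a new immutable X built from the current settings.
    X build() {
        return new X(id, minLength, trimmed);
    }
}
```

With these classes, `new XBuilder("read1").minLength(50).build()` returns an X whose optional trimmed flag keeps its default value.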

Some Jillion Builders may even use the configuration parameters to pick which implementation will get built (kind of combining the Builder pattern with the AbstractFactory pattern).

Core Module

The org.jcvi.jillion.core module contains common genomic objects that are useful to all genomic and bioinformatic investigations. This is the primary module of this library. All other modules are dependent on the core.

Important interfaces/classes:

  • Range - Range is an object representing an immutable pair of coordinates which describes a contiguous subset of values. Ranges are used throughout the code base to represent everything from trim points, sub sequence ranges, alignment coordinates, read and/or contig locations in a scaffold.
  • Rangeable - An interface indicating that an object can be represented as a Range.
  • Sequence - A Sequence abstracts how a list of objects is stored, which allows a variety of implementations to store a sequence using various encoding and compression methods. The most common Sequence classes are for representing Nucleotide and Quality sequences.
  • DataStores - A DataStore is an abstraction for a repository of multiple genomic objects that can be fetched by ID. This allows various DataStore implementations to keep these objects all in memory, store byte offsets into files, or look up values in a database or on a website via URLs.
  • StreamingIterator - An Iterator that is also closable. Closing the iterator allows it to clean up its resources and provides a clean way to break iteration without leaking resources.
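The idea behind a closable iterator, as described for StreamingIterator in the list above, can be sketched with standard Java only. ClosableIterator and LineIterator below are hypothetical names invented for this illustration and are not Jillion's interfaces. The sketch assumes the underlying resource is a reader over text lines.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical interface for illustration; not Jillion's StreamingIterator.
interface ClosableIterator<T> extends Iterator<T>, Closeable {
    @Override
    void close(); // narrowed so callers need not handle a checked exception
}

// Iterates over the lines of a reader and releases the reader when closed.
final class LineIterator implements ClosableIterator<String> {
    private final BufferedReader reader;
    private String next;
    private boolean closed;

    LineIterator(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    // Convenience factory used by the example: iterate over in-memory text.
    static LineIterator of(String text) {
        return new LineIterator(new BufferedReader(new StringReader(text)));
    }

    // Reads one line ahead so hasNext() can answer without doing I/O.
    private void advance() {
        try {
            next = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return !closed && next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String current = next;
        advance();
        return current;
    }

    @Override
    public void close() {
        closed = true;
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isClosed() {
        return closed;
    }
}

final class ClosableIteratorDemo {
    // Stops at the first match; try-with-resources still closes the iterator.
    static String firstLineStartingWith(LineIterator lines, String prefix) {
        try (LineIterator it = lines) {
            while (it.hasNext()) {
                String line = it.next();
                if (line.startsWith(prefix)) {
                    return line; // early exit without leaking the reader
                }
            }
            return null;
        }
    }
}
```

Because the iterator is Closeable, breaking out of the loop early does not leave the underlying reader open.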

Parsing Files

Parsing Files - If DataStores that wrap genomic data files are not sufficient for your particular use case, the data files may be parsed directly. Jillion can parse many different genomic file formats, including binary encoded files. Each parser uses a “push approach” event notification system similar to the Visitor pattern.
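A push-style parser of the kind described above calls back into user code as it encounters each piece of the file, rather than returning objects for the caller to pull. The following is a minimal sketch of that idea using a FASTA-like text format. RecordVisitor, PushParserSketch and CountingVisitor are hypothetical names made up for this example; Jillion's real parser and visitor interfaces are documented on their own pages and may look quite different.

```java
// Hypothetical callback interface for illustration; not Jillion's visitor API.
interface RecordVisitor {
    // Called once per definition line; comment is null when absent.
    void visitDefline(String id, String comment);

    // Called once per line of sequence data belonging to the current record.
    void visitSequenceLine(String bases);

    // Called once after the whole input has been processed.
    void visitEnd();
}

final class PushParserSketch {
    // Walks the text line by line and pushes events to the visitor.
    static void parse(String text, RecordVisitor visitor) {
        for (String rawLine : text.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.charAt(0) == '>') {
                // Split ">id optional comment" into at most two parts.
                String[] parts = line.substring(1).split("\\s+", 2);
                visitor.visitDefline(parts[0], parts.length > 1 ? parts[1] : null);
            } else {
                visitor.visitSequenceLine(line);
            }
        }
        visitor.visitEnd();
    }
}

// Example visitor: counts records and bases without storing any sequence.
final class CountingVisitor implements RecordVisitor {
    int records;
    long bases;
    boolean ended;

    @Override
    public void visitDefline(String id, String comment) {
        records++;
    }

    @Override
    public void visitSequenceLine(String line) {
        bases += line.length();
    }

    @Override
    public void visitEnd() {
        ended = true;
    }
}
```

The visitor decides what to keep, so a caller that only needs counts never has to hold whole records in memory.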

Fasta Module

The Fasta Module org.jcvi.jillion.fasta contains classes for reading and writing FASTA encoded files for nucleotide, protein, quality and Sanger position sequences. DataStores can be used to represent multi-fasta files.

FastaRecord - An individual read from a fasta file is referred to as a FastaRecord which contains an ID, a sequence and an optional comment. There are FastaRecord implementations for nucleotide, protein, quality or sanger position sequences.

FastaDataStore - A DataStore used to represent a multi-fasta file.

FastaWriter - The FastaWriter interface is used to write out Fasta encoded files. There are implementations for nucleotide, protein, quality or sanger position FastaRecords.
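To make the record structure described above concrete (an ID, a sequence and an optional comment), here is a small sketch of an immutable record that formats itself as FASTA text. SimpleFastaRecord and its toFasta method are hypothetical and exist only for this illustration. They are not Jillion's FastaRecord or FastaWriter, and the line width is simply a parameter of the sketch.

```java
// Hypothetical class for illustration; not Jillion's FastaRecord.
final class SimpleFastaRecord {
    private final String id;
    private final String sequence;
    private final String comment; // optional, may be null

    SimpleFastaRecord(String id, String sequence, String comment) {
        if (id == null || sequence == null) {
            throw new NullPointerException("id and sequence are required");
        }
        this.id = id;
        this.sequence = sequence;
        this.comment = comment;
    }

    // Formats the record as a definition line followed by wrapped sequence lines.
    String toFasta(int lineWidth) {
        if (lineWidth < 1) {
            throw new IllegalArgumentException("lineWidth must be >= 1");
        }
        StringBuilder sb = new StringBuilder(">").append(id);
        if (comment != null) {
            sb.append(' ').append(comment);
        }
        sb.append('\n');
        // Emit the sequence in chunks of at most lineWidth characters.
        for (int i = 0; i < sequence.length(); i += lineWidth) {
            int end = Math.min(i + lineWidth, sequence.length());
            sb.append(sequence, i, end).append('\n');
        }
        return sb.toString();
    }
}
```

For example, a record with ID r1, comment demo and sequence ACGTACGTAC written with a line width of 4 produces a definition line followed by three sequence lines.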

Trace Module

Jillion considers a trace to be a genomic object that has an ID as well as nucleotide and quality sequences. This makes the output of most sequencing machines "traces". The org.jcvi.jillion.trace module and its file format specific subpackages support many different trace file formats for both Sanger and next-gen sequencers.

Next-Gen Trace Files Supported

Fastq

The Fastq page explains all the classes and capabilities of Jillion's fastq package. Jillion can read and write Fastq files encoded in SANGER, ILLUMINA or SOLEXA quality formats. It is also possible to convert from any of these formats into any other format.
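As background for the format conversion mentioned above: the Sanger encoding stores each Phred score as the ASCII character with code score + 33, while the Illumina 1.3+ encoding uses score + 64, so converting between the two is a fixed shift of each character. The sketch below shows only that case. The class and method names are hypothetical and are not part of Jillion. The Solexa encoding is deliberately left out because it uses a different score scale and needs more than an offset change.

```java
// Hypothetical helper for illustration; not part of Jillion's fastq package.
final class PhredOffsetConverter {
    static final int SANGER_OFFSET = 33;   // Sanger: ASCII = Phred + 33
    static final int ILLUMINA_OFFSET = 64; // Illumina 1.3+: ASCII = Phred + 64

    // Decodes an encoded quality string into Phred scores.
    static int[] decode(String encoded, int offset) {
        int[] scores = new int[encoded.length()];
        for (int i = 0; i < encoded.length(); i++) {
            int score = encoded.charAt(i) - offset;
            if (score < 0) {
                throw new IllegalArgumentException(
                        "character below offset at position " + i);
            }
            scores[i] = score;
        }
        return scores;
    }

    // Encodes Phred scores as characters using the given ASCII offset.
    static String encode(int[] scores, int offset) {
        StringBuilder sb = new StringBuilder(scores.length);
        for (int score : scores) {
            sb.append((char) (score + offset));
        }
        return sb.toString();
    }

    static String illuminaToSanger(String illumina) {
        return encode(decode(illumina, ILLUMINA_OFFSET), SANGER_OFFSET);
    }

    static String sangerToIllumina(String sanger) {
        return encode(decode(sanger, SANGER_OFFSET), ILLUMINA_OFFSET);
    }
}
```

For instance, the Illumina character h (ASCII 104) represents a Phred score of 40, which in Sanger encoding is I (ASCII 73).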

SFF

The SFF page explains Jillion's support for reading and writing sff encoded files produced by 454 Life Sciences or Ion Torrent.


Sanger Trace Files Supported

Chromatograms

The Chromatogram page explains how Jillion can read and write chromatogram objects encoded in ztr and scf formats, as well as read abi formatted chromatogram files. It is also possible to convert from any of these formats into any other writable format.

phd

The phd page explains how Jillion can read and write phd files and phd.ball files that are used by phred/phrap and consed.

Frg

The Frg page explains Jillion's support for Frg files, which are used by the Celera Assembler.

Assembly Module

The org.jcvi.jillion.assembly module contains classes for working with output from genome assemblers. Jillion can handle contigs created by de-novo assemblers or by reference assemblers. It is even possible to create contig objects from "scratch".

  • Contig Objects - A Jillion Contig object is the base class that all contigs derive from. Contig objects have the consensus sequence and all the underlying read alignments and gapped sequences that provide coverage for the consensus.
  • CoverageMap - A CoverageMap is an Object that contains coverage information for a contiguous range of offset values. CoverageMaps can be created from contigs to get the depth of coverage at each point in the contig or from any collection of objects that implement the Rangeable interface.
  • SliceMap - Get the Slice representation of a Contig. Slices can be used for variant detection and consensus recalling.
  • Consensus Recalling - Jillion supports many different consensus calling algorithms which can be used to change a contig's consensus.
  • Contig Builders - Contig objects are immutable. Use ContigBuilders to modify already existing objects or to create new contigs from "scratch".
  • Contig DataStores - DataStore implementations that wrap assembly files for many common assemblers.
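The depth-of-coverage idea behind the CoverageMap entry in the list above can be sketched independently of Jillion. Given a collection of ranges (for example, read placements on a contig), the depth at each offset is the number of ranges that cover it. Span and DepthOfCoverage below are hypothetical names for this illustration. The sketch assumes 0-based, inclusive coordinates and returns a plain array rather than anything resembling Jillion's CoverageMap type.

```java
import java.util.List;

// Hypothetical inclusive, 0-based interval; stands in for anything Rangeable.
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid span " + begin + ".." + end);
        }
        this.begin = begin;
        this.end = end;
    }
}

final class DepthOfCoverage {
    // Returns the number of spans covering each offset in [0, length).
    static int[] compute(List<Span> spans, int length) {
        // Difference array: +1 where a span starts, -1 just past where it ends.
        int[] delta = new int[length + 1];
        for (Span span : spans) {
            if (span.end >= length) {
                throw new IllegalArgumentException("span extends past length");
            }
            delta[span.begin]++;
            delta[span.end + 1]--;
        }
        // A running sum of the deltas gives the depth at each offset.
        int[] depth = new int[length];
        int running = 0;
        for (int i = 0; i < length; i++) {
            running += delta[i];
            depth[i] = running;
        }
        return depth;
    }
}
```

The difference-array approach touches each span once and each offset once, which is usually preferable to incrementing every covered offset for every span.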

MAQ Module

The org.jcvi.jillion.maq module contains classes for working with binary encoded MAQ formats such as .bfq and .bfa files.

Sam Module

The org.jcvi.jillion.sam module contains classes for reading and writing SAM and BAM files. Jillion can also re-sort SAM and BAM files as well as read and write BAM indexes (.bai files).

  • SamHeader - class for reading, writing and modifying information stored in a SAM or BAM header.
  • Cigar - package for working with CIGAR data.
  • SamRecord - class that represents a single line in a SAM file.
  • Parsing SAM and BAM files
    • SamVisitor - Visitor interface for visiting SAM and BAM files
    • SamParser - interface for classes that take SamVisitors and walk over SAM and BAM files, calling the appropriate visit methods.
  • SamFileDataStore - A special DataStore implementation that represents a single SAM or BAM file.
  • SamWriter - classes and interfaces for writing SAM and BAM files.
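As background for the CIGAR entry in the list above: a CIGAR string is a sequence of length/operation pairs such as 5S10M2I3D20M. In the SAM format, the operations M, D, N, = and X consume reference bases, while M, I, S, = and X consume read bases. The sketch below computes both lengths from a CIGAR string. CigarSketch is a hypothetical helper written for this example and does not reflect the classes in Jillion's Cigar package.

```java
// Hypothetical helper for illustration; not Jillion's Cigar API.
final class CigarSketch {
    private static final String ALL_OPS = "MIDNSHP=X";
    private static final String REFERENCE_OPS = "MDN=X"; // consume the reference
    private static final String READ_OPS = "MIS=X";      // consume the read

    // Number of reference bases spanned by the alignment.
    static int referenceLength(String cigar) {
        return sum(cigar, REFERENCE_OPS);
    }

    // Number of read bases described by the CIGAR string.
    static int readLength(String cigar) {
        return sum(cigar, READ_OPS);
    }

    // Adds up the lengths of every element whose operation is in countedOps.
    private static int sum(String cigar, String countedOps) {
        int total = 0;
        int length = 0;
        boolean sawDigit = false;
        for (int i = 0; i < cigar.length(); i++) {
            char c = cigar.charAt(i);
            if (c >= '0' && c <= '9') {
                length = length * 10 + (c - '0');
                sawDigit = true;
            } else {
                // Every operation must be known and preceded by a length.
                if (!sawDigit || ALL_OPS.indexOf(c) < 0) {
                    throw new IllegalArgumentException("invalid CIGAR: " + cigar);
                }
                if (countedOps.indexOf(c) >= 0) {
                    total += length;
                }
                length = 0;
                sawDigit = false;
            }
        }
        if (sawDigit) {
            throw new IllegalArgumentException("CIGAR ends with a length: " + cigar);
        }
        return total;
    }
}
```

For 5S10M2I3D20M, the reference length is 10 + 3 + 20 = 33 and the read length is 5 + 10 + 2 + 20 = 37.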