Genome alignment

A genome alignment is a pairwise alignment in which a query genome is aligned to a target genome.

Genome alignments can be produced in many ways, most notably with blastz. In the blastz output, each part of the query genome is mapped to its most similar part of the target genome, producing a gapped alignment. The UCSC Genome Browser maintains and distributes blastz alignment results between most of the genomes it hosts. The results are distributed as AXT format files.

The AXT files can be converted to a genomealign track and displayed on the WashU browser with the following command:

$ python subtleKnife/script/hammock/axt_dirfiles.py query_genome_chr_len query2target

The example above generates two files:

query2target.gz
query2target.gz.tbi

These two files can be submitted to the TARGET genome on the browser for display. They cannot be displayed on the query genome.

Genomealign data format

The format is based on the hammock track format. Each line of the file has four columns:

  1. target chromosome name
  2. target alignment start (0-based)
  3. target alignment stop (exclusive: the last base is not included)
  4. JSON string

The JSON string is an object written without its outermost braces. An example follows (line breaks are added for readability):

id:223582,
genomealign:{
  chr:"chr6",
  start:121480228,
  stop:121480456,
  strand:"-",
  targetseq:"ctggagattctta-ttagtgatttgggctggggcc-tggccatgtgtattttttta-aatttccactgatgattttgctgcatggccggtgttgagaatgactgCG-CAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCAC----------------------------------------------CATTTTTCTTTTCGTT",
  queryseq:"cTAGGGAGTCTTAGTCAAAGGTTTGGACCAAGTCCCTGGCCATGCAGATCTTTGTAGAATCTCCACTCGTGACTTTCCTGCATAACCAGAGTTGAGCATCTTTGAGTCAAGTGTGCCA-ACTTTCTTTGCTGTT-------------TAAATAAGGATGCCAACACCGCATGTCATTAACAGTCTCGTAGGTTGATTGATTTGTTGGCTGGCTCAAAAATGAGAGTTATTTTTCATTTTGTT"
}

The genomealign object defines the part of the query genome that is aligned to this piece of the target genome. chr, start, and stop are all query genome coordinates. start is 0-based, and stop is exclusive (the last base is not included).
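As an illustration of the layout described above, here is a minimal Python sketch that assembles one genomealign line. The helper name format_genomealign_line is hypothetical, and the use of tab characters between the four columns is an assumption (the page only says there are four columns); this is not the code used by axt_dirfiles.py.

```python
def format_genomealign_line(target_chr, target_start, target_stop, align_id,
                            query_chr, query_start, query_stop, strand,
                            targetseq, queryseq):
    """Return one genomealign line (hypothetical helper, tab-separated by assumption)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # The fourth column is an object body written without its outermost braces.
    payload = (
        f'id:{align_id},genomealign:{{'
        f'chr:"{query_chr}",start:{query_start},stop:{query_stop},'
        f'strand:"{strand}",targetseq:"{targetseq}",queryseq:"{queryseq}"}}'
    )
    return "\t".join([target_chr, str(target_start), str(target_stop), payload])


# Toy values, not real alignment data.
line = format_genomealign_line("chr1", 100, 104, 1,
                               "chr6", 200, 203, "-", "ACGT", "A-GT")
print(line)
```

The target columns come first so that the file can be indexed by target position; all coordinates inside the genomealign object refer to the query genome.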

Unlike in the AXT format, when the strand is - (reverse), the start and stop coordinates are still counted from the beginning of the forward strand.
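The difference can be sketched in Python. This assumes the AXT convention documented by UCSC: positions are 1-based and inclusive, and minus-strand positions of the aligning (query) sequence are counted on the reverse-complemented chromosome. The function name axt_to_forward_interval is hypothetical, and this is not necessarily how axt_dirfiles.py performs the conversion, although it would explain why chromosome lengths of the query genome are needed.

```python
def axt_to_forward_interval(axt_start, axt_end, strand, chrom_length):
    """Convert a 1-based, inclusive AXT interval to a 0-based, half-open
    interval on the forward strand (hypothetical helper)."""
    if strand == "+":
        return axt_start - 1, axt_end
    if strand == "-":
        # Minus-strand AXT positions count from the start of the
        # reverse-complemented chromosome, so mirror them around its length.
        return chrom_length - axt_end, chrom_length - axt_start + 1
    raise ValueError("strand must be '+' or '-'")


print(axt_to_forward_interval(1, 10, "+", 100))  # (0, 10)
print(axt_to_forward_interval(1, 10, "-", 100))  # (90, 100)
```

In both cases stop minus start equals the number of aligned bases, as the half-open convention requires.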

targetseq and queryseq contain the gapped alignment.
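A small Python sketch of how the two gapped strings can be read column by column. It assumes that "-" marks a gap (as in the example above) and that both strings have the same length, which is the usual convention for a pairwise gapped alignment; the function name alignment_summary is hypothetical.

```python
def alignment_summary(targetseq, queryseq):
    """Summarize a gapped pairwise alignment (hypothetical helper)."""
    if len(targetseq) != len(queryseq):
        raise ValueError("gapped sequences must have the same length")
    aligned = 0   # columns where neither sequence has a gap
    matches = 0   # aligned columns with the same base, ignoring case
    for t, q in zip(targetseq, queryseq):
        if t == "-" or q == "-":
            continue
        aligned += 1
        if t.upper() == q.upper():
            matches += 1
    return {
        "target_bases": len(targetseq) - targetseq.count("-"),
        "query_bases": len(queryseq) - queryseq.count("-"),
        "aligned_columns": aligned,
        "matches": matches,
    }


print(alignment_summary("ACGT-A", "A-CTTA"))
```

With half-open coordinates, target_bases would be expected to equal the target stop minus start, and query_bases the query stop minus start.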

Genomealign track definition in datahub

The following example defines the mouse mm9 genome mapped to the human hg19 genome. It is part of a datahub that can be submitted to the hg19 genome on the browser.

The tracks attribute supplies a few tracks from the mouse mm9 genome to be displayed in the hg19 genome as well. Mouse data show up only at aligned regions.

{
type:'genomealign',
querygenome:'mm9',
url:'http://vizhub.wustl.edu/public/hg19/weaver/hg19_mm9_axt.gz',
mode:'show',
color:'#99004D',
tracks:[
    {type:'native_track',list:[
        {name:'refGene',mode:'full'},
        {name:'SINE',mode:'full'},
        {name:'rmsk_ensemble',mode:'full'},
        {name:'gc5Base',mode:'full'},
        ],
    },
    {type:'bam', 
     url:'http://vizhub.wustl.edu/hubSample/mm9/wgEncodeCaltechTfbsC2c12CebpbFCntrl50bE2p60hPcr1xAlnRep1.bam', 
     name:'Reads Alignment1', 
     mode:'full', 
    },
],
},
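A track definition like the one above can also be generated programmatically. The Python sketch below builds an equivalent dictionary and serializes it with the standard json module. It assumes the browser also accepts strictly quoted JSON and that a datahub is a list of track objects; the helper name genomealign_track is hypothetical.

```python
import json


def genomealign_track(query_genome, align_url, query_tracks, color="#99004D"):
    """Build a genomealign track definition as a dict (hypothetical helper)."""
    return {
        "type": "genomealign",
        "querygenome": query_genome,
        "url": align_url,
        "mode": "show",
        "color": color,
        # Tracks from the query genome, shown only at aligned regions.
        "tracks": query_tracks,
    }


hub = [
    genomealign_track(
        "mm9",
        "http://vizhub.wustl.edu/public/hg19/weaver/hg19_mm9_axt.gz",
        [{"type": "native_track",
          "list": [{"name": "refGene", "mode": "full"}]}],
    )
]
print(json.dumps(hub, indent=2))
```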

Prebuilt Genomealign tracks

The following tracks were prepared from axtNet files downloaded from the UCSC Genome Browser. (Last update: 9/26/14)

  1. hg18 as target, queries:
    1. mm9 http://vizhub.wustl.edu/public/hg18/weaver/hg18_mm9_axt.gz
    2. mm10 no data
    3. rn4 http://vizhub.wustl.edu/public/hg18/weaver/hg18_rn4_axt.gz
    4. rn5 no data
    5. cavPor3 http://vizhub.wustl.edu/public/hg18/weaver/hg18_cavPor3_axt.gz
    6. danRer7 no data
    7. rheMac3 no data
  2. hg19 as target, queries:
    1. mm9 http://vizhub.wustl.edu/public/hg19/weaver/hg19_mm9_axt.gz
    2. mm10 http://vizhub.wustl.edu/public/hg19/weaver/hg19_mm10_axt.gz
    3. rn4 http://vizhub.wustl.edu/public/hg19/weaver/hg19_rn4_axt.gz
    4. rn5 http://vizhub.wustl.edu/public/hg19/weaver/hg19_rn5_axt.gz
    5. rheMac3 http://vizhub.wustl.edu/public/hg19/weaver/hg19_rheMac3_axt.gz
    6. cavPor3 http://vizhub.wustl.edu/public/hg19/weaver/hg19_cavPor3_axt.gz
    7. danRer7 http://vizhub.wustl.edu/public/hg19/weaver/hg19_danRer7_axt.gz
  3. hg38 as target, queries:
    1. mm9 no data
    2. mm10 http://vizhub.wustl.edu/public/hg38/weaver/hg38_mm10_axt.gz
    3. rn4 no data
    4. rn5 http://vizhub.wustl.edu/public/hg38/weaver/hg38_rn5_axt.gz
    5. rheMac3 http://vizhub.wustl.edu/public/hg38/weaver/hg38_rheMac3_axt.gz
    6. cavPor3 no data
    7. danRer7 no data
  4. mm9 as target, queries:
    1. hg18 http://vizhub.wustl.edu/public/mm9/weaver/mm9_hg18_axt.gz
    2. hg19 http://vizhub.wustl.edu/public/mm9/weaver/mm9_hg19_axt.gz
    3. hg38 no data
    4. rn4 http://vizhub.wustl.edu/public/mm9/weaver/mm9_rn4_axt.gz
    5. rn5 http://vizhub.wustl.edu/public/mm9/weaver/mm9_rn5_axt.gz
    6. rheMac3 no data
    7. cavPor3 http://vizhub.wustl.edu/public/mm9/weaver/mm9_cavPor3_axt.gz
    8. danRer7 http://vizhub.wustl.edu/public/mm9/weaver/mm9_danRer7_axt.gz
  5. mm10 as target, queries:
    1. hg19 http://vizhub.wustl.edu/public/mm10/weaver/mm10_hg19_axt.gz
    2. hg38 http://vizhub.wustl.edu/public/mm10/weaver/mm10_hg38_axt.gz
    3. rn4 no data
    4. rn5 http://vizhub.wustl.edu/public/mm10/weaver/mm10_rn5_axt.gz
    5. rheMac3 http://vizhub.wustl.edu/public/mm10/weaver/mm10_rheMac3_axt.gz
    6. cavPor3 http://vizhub.wustl.edu/public/mm10/weaver/mm10_cavPor3_axt.gz
    7. danRer7 http://vizhub.wustl.edu/public/mm10/weaver/mm10_danRer7_axt.gz
  6. rn4 as target, queries:
    1. hg18 http://vizhub.wustl.edu/public/rn4/weaver/rn4_hg18_axt.gz
    2. hg19 http://vizhub.wustl.edu/public/rn4/weaver/rn4_hg19_axt.gz
    3. hg38 no data
    4. mm9 http://vizhub.wustl.edu/public/rn4/weaver/rn4_mm9_axt.gz
    5. mm10 no data
    6. rheMac3 no data
    7. cavPor3 http://vizhub.wustl.edu/public/rn4/weaver/rn4_cavPor3_axt.gz
    8. danRer7 no data
  7. rn5 as target, queries:
    1. hg19 http://vizhub.wustl.edu/public/rn5/weaver/rn5_hg19_axt.gz
    2. hg38 no data
    3. mm9 http://vizhub.wustl.edu/public/rn5/weaver/rn5_mm9_axt.gz
    4. mm10 http://vizhub.wustl.edu/public/rn5/weaver/rn5_mm10_axt.gz
    5. rheMac3 http://vizhub.wustl.edu/public/rn5/weaver/rn5_rheMac3_axt.gz
    6. cavPor3 no data
    7. danRer7 no data
  8. rheMac3 as target, queries:
    1. hg19 http://vizhub.wustl.edu/public/rheMac3/weaver/rheMac3_hg19_axt.gz
    2. hg38 http://vizhub.wustl.edu/public/rheMac3/weaver/rheMac3_hg38_axt.gz
    3. mm9 no data
    4. mm10 http://vizhub.wustl.edu/public/rheMac3/weaver/rheMac3_mm10_axt.gz
    5. rn4 no data
    6. rn5 http://vizhub.wustl.edu/public/rheMac3/weaver/rheMac3_rn5_axt.gz
    7. cavPor3 no data
    8. danRer7 no data
  9. cavPor3 as target, queries:
    1. hg18 http://vizhub.wustl.edu/public/cavPor3/weaver/cavPor3_hg18_axt.gz
    2. hg19 no data
    3. hg38 no data
    4. mm9 http://vizhub.wustl.edu/public/cavPor3/weaver/cavPor3_mm9_axt.gz
    5. mm10 http://vizhub.wustl.edu/public/cavPor3/weaver/cavPor3_mm10_axt.gz
    6. rn4 http://vizhub.wustl.edu/public/cavPor3/weaver/cavPor3_rn4_axt.gz
    7. rn5 no data
    8. rheMac3 no data
    9. danRer7 no data
  10. danRer7 as target, queries:
    1. hg19 http://vizhub.wustl.edu/public/danRer7/weaver/danRer7_hg19_axt.gz
    2. hg38 no data
    3. mm9 http://vizhub.wustl.edu/public/danRer7/weaver/danRer7_mm9_axt.gz
    4. mm10 http://vizhub.wustl.edu/public/danRer7/weaver/danRer7_mm10_axt.gz
    5. rn4 no data
    6. rn5 no data
    7. cavPor3 no data
    8. rheMac3 no data
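The URLs listed above all follow the same naming pattern, which can be sketched in Python as follows. The function name prebuilt_genomealign_url is hypothetical, and the function only builds the string: it does not check that a file exists, so pairs marked "no data" still need to be excluded by hand.

```python
def prebuilt_genomealign_url(target, query):
    """Build the URL of a prebuilt genomealign track from the pattern
    seen in the list above (hypothetical helper, no availability check)."""
    return (f"http://vizhub.wustl.edu/public/{target}/weaver/"
            f"{target}_{query}_axt.gz")


print(prebuilt_genomealign_url("hg19", "mm9"))
```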

Using custom genome

A custom genome can be shown when it is compared to an existing genome in the browser. Currently it is not possible to show a custom genome on its own, or to show the alignment between two custom genomes.

To define the genomealign track for a custom genome against an existing genome, add the "newgenome" attribute to the genomealign track to provide information on the custom genome, including chromosome names and lengths, the default position, and the sequence file URL. The name of the custom genome must not be the name of any existing genome:

[
{
type:'genomealign',
querygenome:'customgenome',
url:'http://vizhub.wustl.edu/public/hg19/weaver/hg19_mm9_axt.gz',
reciprocal:{
    hg19:'http://vizhub.wustl.edu/public/mm9/weaver/mm9_hg19_axt.gz',
    },
mode:'show',
color:'#99004D',
tracks:[
    {
        type:'bedgraph',
        url:'http://vizhub.wustl.edu/hubSample/mm9/wgEncodeLicrHistoneBatInputMAdult24wksC57bl6StdSig.gz',
        name:'signal',
        mode:'show',
        height:50,
    },
],
newgenome:{
    scaffoldlength:{
        chr1:197195432,
        chr2:181748087,
        chr3:159599783,
        chr4:155630120,
        chr5:152537259,
        chr6:149517037,
        chr7:152524553,
        chr8:131738871,
        chr9:124076172,
        chr10:129993255,
        chr11:121843856,
        chr12:121257530,
        chr13:120284312,
        chr14:125194864,
        chr15:103494974,
        chr16:98319150,
        chr17:95272651,
        chr18:90772031,
        chr19:61342430,
        chrX:166650296,
        chrY:15902555,
        chrM:16299
    },
    defaultposition:'chr6,51999773,chr6,52368420',
    sequencefile:'http://epgg-test.wustl.edu/installdata/mm9.gz',
}
},
]
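For a genome with many scaffolds, typing the scaffoldlength object by hand is error-prone. The Python sketch below builds the newgenome attribute from lines in the common two-column "name length" layout (as in a chrom.sizes file). The input layout and the helper names read_scaffold_lengths and newgenome_block are assumptions made for illustration, not part of the browser.

```python
import json


def read_scaffold_lengths(lines):
    """Parse 'name length' lines into a dict (hypothetical helper)."""
    lengths = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, length = line.split()[:2]
        lengths[name] = int(length)
    return lengths


def newgenome_block(scaffold_lengths, default_position, sequence_url):
    """Assemble the newgenome attribute of a genomealign track."""
    return {
        "scaffoldlength": scaffold_lengths,
        "defaultposition": default_position,
        "sequencefile": sequence_url,
    }


# Two of the scaffold lengths from the example above.
sizes = ["chr6\t149517037", "chrM\t16299"]
block = newgenome_block(
    read_scaffold_lengths(sizes),
    "chr6,51999773,chr6,52368420",
    "http://epgg-test.wustl.edu/installdata/mm9.gz",
)
print(json.dumps(block, indent=2))
```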