Hammock

From wubrowse wiki
Revision as of 07:45, 10 April 2014 by Xzhou (Talk | contribs)


"Hammock" is a new file format for encoding richly annotated genomic features. It was developed by Xin Zhou and is used throughout the WashU EpiGenome Browser.

See hammock tracks at work: http://epigenomegateway.wustl.edu/browser/?genome=hg19&datahub=http://vizhub.wustl.edu/hubSample/hg19/hub.experiment


A hammock-format file is a line-oriented, tabular text file. Each line describes one genomic feature and must have 4 fields:

  1. chromosome name
  2. start coordinate (0 offset)
  3. stop coordinate
  4. JSON string


The 4th field encodes annotation information as a JSON string. This string is a JSON object without the outermost curly brackets, written as "key":"value" pairs. An example:

id:1,name:"NM_001005224",strand:"-",struct:{thick:[[621095,622034],],}

An attribute name that is a single word (no spaces) can be written without double quotes. All other strings must be quoted.
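The quoting rule above can be sketched in Python. This is only an illustration, not browser code: the helper names (encode_key, encode_value, hammock_line) are hypothetical, the fields are assumed to be separated by tabs, and a "single word" is assumed to mean letters, digits, and underscores only.

```python
import json
import re

# Assumption: a "single word" key consists of letters, digits, and underscores.
_BARE_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def encode_key(key):
    """Leave single-word keys bare; double-quote everything else."""
    key = str(key)
    return key if _BARE_KEY.match(key) else json.dumps(key)


def encode_value(value):
    """Encode a value in the relaxed JSON style of the 4th field."""
    if isinstance(value, dict):
        inner = ",".join(
            f"{encode_key(k)}:{encode_value(v)}" for k, v in value.items()
        )
        return "{" + inner + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(encode_value(v) for v in value) + "]"
    # Strings get double quotes, numbers stay bare.
    return json.dumps(value)


def hammock_line(chrom, start, stop, attrs):
    """Build one line: three coordinate fields plus the annotation string."""
    annotation = ",".join(
        f"{encode_key(k)}:{encode_value(v)}" for k, v in attrs.items()
    )
    # Assumption: fields are tab-separated.
    return "\t".join([chrom, str(start), str(stop), annotation])


if __name__ == "__main__":
    print(hammock_line(
        "chr1", 621095, 622034,
        {"id": 1, "name": "NM_001005224", "strand": "-",
         "struct": {"thick": [[621095, 622034]]}},
    ))
```

Note that the annotation string is written without the outermost curly brackets, while nested objects such as "struct" keep theirs.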


A hammock file must be compressed and indexed before it can be displayed in the browser. Learn more at Submit_bed_tracks#Large_bed_files.


Simple attributes

id

The value of "id" is a non-negative integer, and each line in a hammock file must have a unique id value.

id:1

This is the only mandatory attribute. All the rest are optional.
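A quick way to check the uniqueness requirement before uploading could look like the sketch below. It assumes tab-separated fields and uses a deliberately naive regular expression to find the id, so it can be fooled by an "id" key nested inside another attribute or by text inside a quoted string. The function name duplicate_ids is hypothetical.

```python
import re
from collections import Counter

# Naive pattern: "id" at the start of the annotation string or after a comma.
ID_PATTERN = re.compile(r'(?:^|,)\s*"?id"?\s*:\s*(\d+)')


def duplicate_ids(lines):
    """Return the sorted list of id values that occur on more than one line."""
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            raise ValueError(f"line {number}: expected 4 fields")
        match = ID_PATTERN.search(fields[3])
        if match is None:
            raise ValueError(f"line {number}: no id attribute found")
        counts[int(match.group(1))] += 1
    return sorted(item_id for item_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    example = [
        'chr1\t0\t10\tid:1,name:"a"',
        'chr1\t20\t30\tname:"b",id:2',
        'chr1\t40\t50\tid:1',
    ]
    print(duplicate_ids(example))
```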

name

Name of the item. The value is a string.

name:"WDR52"

desc

Free text describing this item. Typically a short sentence. Will be shown upon clicking this item.

desc:"Homo sapiens WD repeat domain 52 (WDR52), transcript variant 1, mRNA."

strand

+ for forward strand, and - for reverse.

strand:"+"

details

Provides itemized descriptions for this item. The value is an object of key:value pairs (all of which are free text).

details:{
    Divergence:"14%",
    Deletion:"6%",
    Insertion:"0%",
    "Repbase class":"DNA",
    "Repbase family":"hAT-Charlie"
}

This attribute is for display purposes only (it is shown in the tooltip bubble upon clicking the item) and does not alter the appearance of the item.

struct

Highlights regions in the item as boxes. The keywords "thin" and "thick" select predefined box sizes. When the "struct" attribute is given, the item is drawn as an array of boxes joined by a line, the typical "UTR-exon-intron" appearance of a gene model.

struct:{
    thin:[[934343,934438],[935353,935552],],
    thick:[[934438,934812],[934905,934993],[935071,935353],]
}

This example defines 5 boxes: 2 thin and 3 thick. Please make sure the coordinates are within the item's range.
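The range requirement can be checked with a few lines of Python. This sketch assumes that box coordinates are absolute genomic positions in the same 0-based system as the item's start and stop fields, as the example values suggest. The function name struct_out_of_range is hypothetical.

```python
def struct_out_of_range(start, stop, struct):
    """Return (kind, (box_start, box_stop)) for every box outside [start, stop]."""
    bad = []
    for kind in ("thin", "thick"):
        for box_start, box_stop in struct.get(kind, []):
            # A valid box is non-empty and lies fully inside the item.
            if not (start <= box_start < box_stop <= stop):
                bad.append((kind, (box_start, box_stop)))
    return bad


if __name__ == "__main__":
    struct = {
        "thin": [[934343, 934438], [935353, 935552]],
        "thick": [[934438, 934812], [934905, 934993], [935071, 935353]],
    }
    # The item range 934343-935552 is assumed here for illustration.
    print(struct_out_of_range(934343, 935552, struct))
```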

sbstroke

Short for "single-base stroke". Its value is a list of basepair positions. The browser will draw a vertical line on the item to highlight each of the bases.

sbstroke:[200]

Note that each basepair position is relative to the start coordinate of the item. This is an experimental feature designed to show ENCODE narrowPeak data, and the browser rendering does not allow customization.
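Because the positions are relative, converting between genomic coordinates and sbstroke values is a simple offset. A small sketch, with hypothetical function names:

```python
def sbstroke_to_genomic(item_start, sbstroke):
    """Convert relative sbstroke positions to genomic coordinates."""
    return [item_start + offset for offset in sbstroke]


def genomic_to_sbstroke(item_start, positions):
    """Convert genomic coordinates to positions relative to the item start."""
    return [position - item_start for position in positions]


if __name__ == "__main__":
    # An item starting at 1000 with sbstroke:[200] is highlighted at 1200.
    print(sbstroke_to_genomic(1000, [200]))
```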


Compound attributes

Unlike simple attributes above, attributes in this section must have corresponding auxiliary information in order to function. Usually such information is provided in a datahub.

category

Defines which category this item belongs to by giving the integer category index. This attribute can be used to separate items into groups and assign a different color to each group.

category:1

The "category" attribute must be accompanied by the "categories" entry in the section of the datahub that defines this track:

categories:{
    1:['SINE','#006600'],
    2:['DNA','#858585']
}

Here "SINE" is the category's name, and "#006600" is the category's color.
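Since every category index used in the file needs a matching entry in "categories", a consistency check could be sketched as follows. The function name unknown_categories is hypothetical, and the "categories" entry is represented here as a plain Python dict.

```python
def unknown_categories(categories, item_categories):
    """Return category indexes used by items but missing from the datahub."""
    return sorted(set(item_categories) - set(categories))


if __name__ == "__main__":
    categories = {
        1: ["SINE", "#006600"],
        2: ["DNA", "#858585"],
    }
    # Index 3 is used by an item but not defined in the datahub.
    print(unknown_categories(categories, [1, 2, 2, 3]))
```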

scorelst

Provides a list of numerical values to annotate this item, such as the expression value of a transcript or the signal intensity of a probe. Scores can be used to alter the appearance of items, either by applying different shades of darkness or by drawing a barplot on top. When viewing the track, the user can choose which score to use, or opt not to use scores at all.

scorelst:[4.678,2.072300e+01,6.542344e-20]

The "scorelst" attribute must be accompanied by the "scorenamelst" entry in the section of the datahub that defines this track. The value of "scorenamelst" is a list of strings giving the name of each score:

scorenamelst:["enrichment value", "P value (-log10)","Q value (-log10)"]

Each item must have the same number of values in its "scorelst" array, and that number must equal the length of the "scorenamelst" array.
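This length constraint is easy to verify before submitting a track. A minimal sketch, assuming the score lists have already been extracted from the file into Python lists; the function name mismatched_score_items is hypothetical.

```python
def mismatched_score_items(scorenamelst, scorelsts):
    """Return indexes of items whose scorelst length differs from scorenamelst."""
    expected = len(scorenamelst)
    return [
        index for index, scores in enumerate(scorelsts)
        if len(scores) != expected
    ]


if __name__ == "__main__":
    names = ["enrichment value", "P value (-log10)", "Q value (-log10)"]
    items = [
        [4.678, 2.072300e+01, 6.542344e-20],
        [1.5, 3.0],  # one value short
    ]
    print(mismatched_score_items(names, items))
```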

The attribute "showscoreidx" can be used in the datahub to define which score is used by default when the track is displayed. Its value is the array index of that score (starting from 0).

Optionally, "scorescalelst" can be used along with "scorenamelst" in the datahub to define the Y scale for each score. "type:0" selects an automatically determined Y scale, and "type:1" selects a scale with a fixed score range. The range is set by two other keywords, "min" and "max":

scorescalelst:[
    {type:1,min:0,max:10},
    {type:0},
    {type:0},
]

Things to avoid

Do not use "start" or "stop" as attribute names in the 4th column of the hammock file. They will confuse the program.

Submitting hammock track to browser

The first three methods require the hammock track to be hosted on a web server. To do this, follow the same procedure as for bedGraph tracks: http://washugb.blogspot.com/2012/09/generate-tabix-files-from-bigwig-files.html


From submission panel

Through the submission panel, you can submit one hammock track at a time.

At the top right of the browser panel, click "Tracks" > "Custom tracks" > "+ Add new" to open the custom track submission panel. Click the "Hammock" button to show the interface. Enter the file URL and a name to submit.

A JSON description can be entered but is not required. If it is omitted, the hammock track can still be submitted, but some features will be disabled.

From URL parameter

This is a restrictive method that does not allow submitting JSON descriptions.

Use the hammock parameter to submit one or more hammock tracks to be displayed in the browser. Example:

http://epigenomegateway.wustl.edu/browser/?genome=hg19&hammock=my+hammock+track,http://vizhub.wustl.edu/hubSample/hg19/refGene.gz,full

As the value of the "hammock" parameter, use 3 fields to describe each track (join fields by comma):

  • name
  • URL of hammock file
  • mode (full/thin/density)
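Building such a URL for a single track can be sketched in Python. The function name hammock_url is hypothetical, and the sketch deliberately handles one track only, because the text above does not spell out how several tracks are combined. Spaces in the track name are encoded as "+" to match the example.

```python
from urllib.parse import quote_plus

BROWSER_URL = "http://epigenomegateway.wustl.edu/browser/"
MODES = ("full", "thin", "density")


def hammock_url(genome, name, file_url, mode):
    """Build a browser URL that submits one hammock track."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    # Join the three fields by comma; only the name is URL-encoded.
    value = ",".join([quote_plus(name), file_url, mode])
    return f"{BROWSER_URL}?genome={genome}&hammock={value}"


if __name__ == "__main__":
    print(hammock_url(
        "hg19",
        "my hammock track",
        "http://vizhub.wustl.edu/hubSample/hg19/refGene.gz",
        "full",
    ))
```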

From datahub

The following content defines a hammock track in a datahub. Identical content can also be used in embedded browsers.

To indicate the type of the track, use "hammock" as the value of the "type" keyword. "annotation" can also be used for backward compatibility.

{
    type:'hammock',
    url:'http://vizhub.wustl.edu/hubSample/hg19/SINE-DNA.gz',
    mode:'barplot',
    name:'SINE and DNA transposons',
    categories:{
        1:['SINE','#006600'],
        2:['DNA','#a300a3']
    },
    showscoreidx:0,
    scorenamelst:[
        "Smith-Waterman score",
        "SW score normalized by length"
    ],
    scorescalelst:[
        {type:0},
        {type:0}
    ],
}

Via uploading text file

This method allows uploading a hammock track in the form of a text file. The file does not need to be hosted on a web server.

  • Go to "Apps" > "File upload".
  • Select file, click the "Setup" button to show the setup menu.
  • Choose "Hammock" format. Optionally enter JSON description for this file.
  • Click "Add as track" button. The file processing will start.

Barplot

"Barplot" is a special display mode for hammock tracks whose items carry numerical values defined with the "scorelst" attribute.

Applications

Genes

Gene expression

LD (linkage disequilibrium track)

narrowPeak

narrowPeak format specification: http://genome.ucsc.edu/FAQ/FAQformat.html#format12

To display a narrowPeak file in the browser, convert it into hammock format with this script from the wubrowse source code archive: /script/hammock/narrowpeak.py

$ python /script/hammock/narrowpeak.py input.narrowPeak output

The above command will generate "output.gz" and "output.gz.tbi".

The JSON string from a converted narrowPeak track may look like:

scorelst:[11.520,5.476800e+01,1.599931e-53],id:1113,sbstroke:[327]
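To illustrate how the narrowPeak columns might map onto these attributes, here is a rough sketch. It is not the narrowpeak.py script, whose behavior may differ. It assumes the standard 10-column narrowPeak layout, in which columns 7 to 9 hold the signal value, P value, and Q value, and column 10 holds the summit offset relative to the start (-1 when no summit is called). The function name narrowpeak_to_hammock is hypothetical.

```python
def narrowpeak_to_hammock(line, item_id):
    """Convert one narrowPeak line to a hammock line (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected 10 narrowPeak columns")
    chrom, start, stop = fields[0], int(fields[1]), int(fields[2])
    scores = fields[6:9]
    for score in scores:
        float(score)  # raises ValueError if a score is not numeric
    peak = int(fields[9])
    attrs = ["scorelst:[" + ",".join(scores) + "]", f"id:{item_id}"]
    if peak >= 0:
        # The summit is already relative to the start, like sbstroke.
        attrs.append(f"sbstroke:[{peak}]")
    return "\t".join([chrom, str(start), str(stop), ",".join(attrs)])


if __name__ == "__main__":
    example = "chr1\t100\t600\tpeak1\t0\t.\t11.52\t54.768\t3.2\t327"
    print(narrowpeak_to_hammock(example, 1))
```

The resulting lines would still need to be sorted, compressed, and indexed before the browser can display them.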

To define this track in a datahub, use the following:

{type:'hammock',
url:'http://vizhub.wustl.edu/hubSample/hg19/wgEncodeSydhTfbsHelas3Stat1Ifng30StdPk.gz',
mode:'barplot',
name:'narrowPeak example',
showscoreidx:0,
scorenamelst:["signal value", "P value (-log10)","Q value (-log10)"],
boxcolor:'#210085',
strokecolor:'#ff6600',
},

broadPeak

broadPeak format: http://genome.ucsc.edu/FAQ/FAQformat.html#format13

This format is simpler than narrowPeak: it lacks the "summit" but is otherwise the same. Convert broadPeak format to hammock format using this script: /script/hammock/broadpeak.py.

The JSON string from a broadPeak file may look like:

scorelst:[10.108185,6.0],id:27,

To define this track in a datahub, the content may be:

{type:'hammock',
url:'http://vizhub.wustl.edu/hubSample/hg19/wgEncodeBroadHistoneA549CtcfEtoh02Pk.gz',
mode:'barplot',
name:'broadPeak example',
showscoreidx:0,
scorenamelst:["signal value", "P value (-log10)"],
boxcolor:'#850063',
},

gappedPeak

gappedPeak format: http://genome.ucsc.edu/FAQ/FAQformat.html#format14

To convert gappedPeak format to hammock format, use: /script/hammock/gappedpeak.py

The JSON string from a gappedPeak file may look like:

scorelst:[2.18799,2.34156,1.04649],id:25084,struct:{thin:[[960692,966631]],thick:[[961483,961652],[962485,963271],[966386,966625],]},name:"Rank_25084",

Note that the value of struct.thin has only one segment, which spans the entire peak region.
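One plausible way to derive such a "struct" value from gappedPeak columns is sketched below. This is an assumption inferred from the example, not a description of gappedpeak.py: it takes the BED12-style blockSizes and blockStarts columns (block starts are relative to the peak start) as the thick boxes and uses the whole peak as the single thin box. The function name gappedpeak_struct is hypothetical.

```python
def gappedpeak_struct(chrom_start, chrom_end, block_sizes, block_starts):
    """Build a struct value from gappedPeak block columns (illustrative only)."""
    # Both columns are comma-separated lists that may end with a comma.
    sizes = [int(s) for s in block_sizes.rstrip(",").split(",")]
    starts = [int(s) for s in block_starts.rstrip(",").split(",")]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and blockStarts differ in length")
    thick = [
        [chrom_start + offset, chrom_start + offset + size]
        for offset, size in zip(starts, sizes)
    ]
    return {"thin": [[chrom_start, chrom_end]], "thick": thick}


if __name__ == "__main__":
    print(gappedpeak_struct(960692, 966631, "169,786,239,", "791,1793,5694,"))
```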


To define this track in a datahub, the following may be used:

{type:'hammock',
url:'http://vizhub.wustl.edu/hubSample/hg19/E015-H3K36me3.gz',
mode:'barplot',
name:'gappedPeak example',
showscoreidx:0,
scorenamelst:["signal value", "P value (-log10)","Q value (-log10)"],
boxcolor:'#cc6600',
},