Add new genome

  • If the genome you are interested in is not listed on our browser, you can add it to your own local mirror using the following instructions. We use mouse mm10 and soybean Gmax_189 as examples; their annotation files are in BED and GFF format, respectively.
    • If you want to parse annotation files from UCSC, which we assume are in BED format, see the example 'Add mouse mm10' below.
    • If your annotation files are in GFF format, see the example 'Add soybean Gmax_189'.
  • Some commands below require the Kent source tree from UCSC to be installed; consult the UCSC documentation for help.
  • You should know how to run commands under Linux and how to operate a MySQL database.
  • These instructions apply to WashU EpiGenome Browser v39.

Scripts

The Python scripts come with the browser source code archive: https://github.com/epgg/script

Alternatively you can download them from http://egg.wustl.edu/script

Add mouse mm10

Prepare genome sequence file

  • create the folders that will hold the files for mm10
mkdir /srv/epgg/data/data/subtleKnife/mm10
mkdir /srv/epgg/data/data/subtleKnife/mm10/session
  • download sequence from UCSC
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.2bit
  • generate GC percent track (optional); mm10.size must be created first because wigToBigWig reads it
twoBitInfo mm10.2bit mm10.size
hgGcPercent -win=5 -file=gc5Base.wig -wigOut mm10 mm10.2bit -noDots
wigToBigWig gc5Base.wig mm10.size gc5Base.bigWig
  • generate sequence tracks
twoBitToFa mm10.2bit mm10.fa
python /srv/epgg/data/subtleKnife/script/fa2tabix.py mm10.fa mm10

Prepare annotation tracks

  • download the following annotation files from UCSC and decompress them
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/knownGene.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/ensGene.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refLink.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/ensemblToGeneName.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/kgXref.txt.gz
gunzip ensGene.txt.gz refGene.txt.gz knownGene.txt.gz refLink.txt.gz ensemblToGeneName.txt.gz kgXref.txt.gz
  • generate files for database import
python mm10_genescript/ensGene.py ensGene.txt ensemblToGeneName.txt kgXref.txt
python mm10_genescript/refGene.py  refGene.txt kgXref.txt
python mm10_genescript/knownGene.py knownGene.txt kgXref.txt
# collect the output to a file for later use
  • prepare repeat tracks
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz
gunzip rmsk.txt.gz
python /srv/epgg/data/subtleKnife/script/rmsk.py rmsk.txt
# rmsk.txt needs to be loaded into the database to show details of each repeat element
  • prepare chromosome bands
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/cytoBandIdeo.txt.gz
gunzip cytoBandIdeo.txt.gz
python /srv/epgg/data/subtleKnife/script/cytoband.py cytoBandIdeo.txt > cytoband
  • generate CpG island track
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/cpgIslandExt.txt.gz
python ~/pyScript/cpgIsland.py
  • prepare conservation tracks
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons60wayEuarchontoGlire.bw
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons60wayGlire.bw
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons60wayPlacental.bw
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60way.bw
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60wayEuarchontoglire.bw
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60wayGlire.bw
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60wayPlacental.bw
  • rename the files downloaded above by removing the prefix 'mm10.60way.' from each file name
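
The rename step can be scripted rather than done by hand. The following is a minimal sketch, not part of the browser's script collection; the function name `strip_prefix` and its arguments are hypothetical, and it assumes all downloaded .bw files sit in one directory.

```python
import os


def strip_prefix(directory, prefix="mm10.60way."):
    """Rename every file in `directory` whose name starts with `prefix`.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    Hypothetical helper for illustration; not shipped with the browser.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith(prefix):
            continue  # leave unrelated files untouched
        new_name = name[len(prefix):]
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new_name))
        renamed.append((name, new_name))
    return renamed


# Example call (run in the directory holding the downloaded files):
#   strip_prefix(".")
# would turn mm10.60way.phastCons.bw into phastCons.bw
```

After renaming, the file stems (for example phastCons60wayGlire) line up with the track names used in the decorInfo example further down.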
  • get scaffold information
python ~/pyScript/scaffoldInfo.py mm10.size scaffoldInfo
#total base:  2730871774
#chromosome:,  22
#contig:,  44
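
To illustrate what this step produces, here is a rough sketch of how rows in the scaffoldInfo layout shown later on this page could be derived from a chromosome size file. It is not the actual scaffoldInfo.py; the function name is hypothetical, and the rule "names containing an underscore go under 'other'" is an assumption based on the mm10 naming scheme.

```python
def build_scaffold_rows(size_lines):
    """Turn 'name<TAB>length' lines into (parent, child, length) rows.

    Assumption: assembled chromosomes (chr1, chrX, chrM) have no
    underscore in their names, while random/unplaced contigs do.
    """
    rows = [("ROOT", "chromosome", 0), ("ROOT", "other", 0)]
    chromosomes, others = [], []
    for line in size_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, length = line.split()[:2]
        target = others if "_" in name else chromosomes
        target.append((name, int(length)))
    # Chromosomes are sorted by name; contigs keep their input order.
    rows += [("chromosome", n, l) for n, l in sorted(chromosomes)]
    rows += [("other", n, l) for n, l in others]
    return rows


if __name__ == "__main__":
    sample = ["chr1\t195471971", "chrM\t16299",
              "chr1_GL456210_random\t169725"]
    for row in build_scaffold_rows(sample):
        print("\t".join(str(field) for field in row))
```

In practice the input would be the lines of mm10.size; the three-line sample only keeps the sketch self-contained.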
  • annotation files like *.gz* and *.bigWig should be located in /srv/epgg/data/data/subtleKnife/mm10
  • sequence files mm10.gz* should be located in /srv/epgg/data/data/subtleKnife/seq
  • you should also have the following files after going through the steps above
ensGene.struct.txt  ensGenesymbol  knownGene.struct.txt  knownGenesymbol refGene.struct.txt  refGenesymbol
  • create a file named loadGene.sql with the following contents
drop table if exists ensGenestruct;
create table ensGenestruct (
id int unsigned not null primary key,
chrom varchar(255) not null,
strand char(1) not null,
txStart int unsigned not null,
txEnd int unsigned not null,
cdsStart int unsigned not null,
cdsEnd int unsigned not null,
exonCount int unsigned not null,
exonStarts text not null,
exonEnds text not null,
name varchar(255) not null
);
load data local infile 'ensGene.struct.txt' into table ensGenestruct;


drop table if exists ensGenesymbol;
create table ensGenesymbol (
name varchar(255) not null,
symbol varchar(255) null,
description text null,
id int unsigned not null primary key,
index(name)
);
load data local infile 'ensGenesymbol' into table ensGenesymbol;

drop table if exists refGenestruct;
create table refGenestruct (
id int unsigned not null primary key,
chrom varchar(255) not null,
strand char(1) not null,
txStart int unsigned not null,
txEnd int unsigned not null,
cdsStart int unsigned not null,
cdsEnd int unsigned not null,
exonCount int unsigned not null,
exonStarts text not null,
exonEnds text not null,
name varchar(255) not null
);
load data local infile 'refGene.struct.txt' into table refGenestruct;

drop table if exists refGenesymbol;
create table refGenesymbol (
name varchar(255) not null,
symbol varchar(255) null,
description text null,
id int unsigned not null primary key,
index(name)
);
load data local infile 'refGenesymbol' into table refGenesymbol;


drop table if exists knownGenestruct;
create table knownGenestruct (
id int unsigned not null primary key,
chrom varchar(255) not null,
strand char(1) not null,
txStart int unsigned not null,
txEnd int unsigned not null,
cdsStart int unsigned not null,
cdsEnd int unsigned not null,
exonCount int unsigned not null,
exonStarts text not null,
exonEnds text not null,
name varchar(255) not null
);
load data local infile 'knownGene.struct.txt' into table knownGenestruct;


drop table if exists knownGenesymbol;
create table knownGenesymbol (
name varchar(255) not null,
symbol varchar(255) null,
description text null,
id int unsigned not null primary key,
index(name)
);
load data local infile 'knownGenesymbol' into table knownGenesymbol;
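
The three gene sets above use identical table definitions, differing only in the name prefix (ensGene, refGene, knownGene). As a convenience, the same statements could be generated from a template. This is only a sketch under that observation; `build_load_sql` is a hypothetical helper, not one of the browser's scripts.

```python
STRUCT_TEMPLATE = """drop table if exists {prefix}struct;
create table {prefix}struct (
id int unsigned not null primary key,
chrom varchar(255) not null,
strand char(1) not null,
txStart int unsigned not null,
txEnd int unsigned not null,
cdsStart int unsigned not null,
cdsEnd int unsigned not null,
exonCount int unsigned not null,
exonStarts text not null,
exonEnds text not null,
name varchar(255) not null
);
load data local infile '{prefix}.struct.txt' into table {prefix}struct;
"""

SYMBOL_TEMPLATE = """drop table if exists {prefix}symbol;
create table {prefix}symbol (
name varchar(255) not null,
symbol varchar(255) null,
description text null,
id int unsigned not null primary key,
index(name)
);
load data local infile '{prefix}symbol' into table {prefix}symbol;
"""


def build_load_sql(prefixes=("ensGene", "refGene", "knownGene")):
    """Return the loadGene.sql text for the given gene-set prefixes."""
    parts = []
    for prefix in prefixes:
        parts.append(STRUCT_TEMPLATE.format(prefix=prefix))
        parts.append(SYMBOL_TEMPLATE.format(prefix=prefix))
    return "\n".join(parts)


if __name__ == "__main__":
    # Redirect the output to loadGene.sql when run as a script.
    print(build_load_sql())
```

Adding another gene set then only requires passing one more prefix, provided its parsed files follow the same naming pattern.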

example track annotation file for v40

example track annotation file for v39 and before

  • decorInfo
ensGene Ensembl genes   \N      2       24      0       http://www.ensembl.org/Mus_musculus/geneview?gene=
xenoRefGene     non-mouse RefSeq genes  \N      2       24      0       http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&doptcmdl
refGene RefSeq genes    \N      2       24      0       http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&doptcmdl=GenBank&term=
rmsk_ensemble   RepeatMasker Ensemble   \N      4       12      0       \N
phastCons       PhastCons       \N      6       14      0       \N
phastCons60wayEuarchontoGlire   euarchontoglires PhastCons      \N      6       14      0       \N
phastCons60wayGlire     glires PhastCons        \N      6       14      0       \N
phastCons60wayPlacental placental PhastCons     \N      6       14      0       \N
phyloP60wayEuarchontoglire      euarchontoglires PhyloP \N      6       14      0       \N
phyloP60wayGlire        glires PhyloP   \N      6       14      0       \N
phyloP60wayPlacental    placental PhyloP        \N      6       14      0       \N
phyloP60way     PhyloP  \N      6       14      0       \N
cpgIsland       CpG Island      \N      5       0       0       \N
gc5Base GC percent      \N      5       14      0       \N
  • decorInfo_rmsk
srpRNA  srpRNA  \N      4       24      0       \N
LTR     LTR     \N      4       24      0       \N
Satellite       Satellite       \N      4       24      0       \N
scRNA   scRNA   \N      4       24      0       \N
DNA     DNA     \N      4       24      0       \N
Simple_repeat   Simple_repeat   \N      4       24      0       \N
SINE?   SINE?   \N      4       24      0       \N
Unknown Unknown \N      4       24      0       \N
RNA     RNA     \N      4       24      0       \N
RC      RC      \N      4       24      0       \N
DNA?    DNA?    \N      4       24      0       \N
RC?     RC?     \N      4       24      0       \N
snRNA   snRNA   \N      4       24      0       \N
Other   Other   \N      4       24      0       \N
rRNA    rRNA    \N      4       24      0       \N
LINE?   LINE?   \N      4       24      0       \N
tRNA    tRNA    \N      4       24      0       \N
LINE    LINE    \N      4       24      0       \N
SINE    SINE    \N      4       24      0       \N
LTR?    LTR?    \N      4       24      0       \N
Low_complexity  Low_complexity  \N      4       24      0       \N
rRNArRNA        rRNA (rRNA)     rRNA    4       24      0       \N
LINEL1? L1? (LINE)      LINE    4       24      0       \N
DNAhAT-Tip100   hAT-Tip100 (DNA)        DNA     4       24      0       \N
LTRLTR  LTR (LTR)       LTR     4       24      0       \N
UnknownUnknown  Unknown (Unknown)       Unknown 4       24      0       \N
LTRERVL ERVL (LTR)      LTR     4       24      0       \N
LTRERVK ERVK (LTR)      LTR     4       24      0       \N
DNAhAT-Tip100?  hAT-Tip100? (DNA)       DNA     4       24      0       \N
OtherOther      Other (Other)   Other   4       24      0       \N
DNAhAT-Charlie  hAT-Charlie (DNA)       DNA     4       24      0       \N
DNATcMar        TcMar (DNA)     DNA     4       24      0       \N
SINEMIR MIR (SINE)      SINE    4       24      0       \N
RC?Helitron?    Helitron? (RC?) RC?     4       24      0       \N
LINERTE-BovB    RTE-BovB (LINE) LINE    4       24      0       \N
SatelliteSatellite      Satellite (Satellite)   Satellite       4       24      0       \N
DNAhAT? hAT? (DNA)      DNA     4       24      0       \N
SINE?SINE?      SINE? (SINE?)   SINE?   4       24      0       \N
DNATcMar-Pogo   TcMar-Pogo (DNA)        DNA     4       24      0       \N
scRNAscRNA      scRNA (scRNA)   scRNA   4       24      0       \N
DNA?DNA?        DNA? (DNA?)     DNA?    4       24      0       \N
SINEID  ID (SINE)       SINE    4       24      0       \N
Satellitecentr  centr (Satellite)       Satellite       4       24      0       \N
DNAMuDR MuDR (DNA)      DNA     4       24      0       \N
LTRERVL-MaLR    ERVL-MaLR (LTR) LTR     4       24      0       \N
DNADNA  DNA (DNA)       DNA     4       24      0       \N
snRNAsnRNA      snRNA (snRNA)   snRNA   4       24      0       \N
LTRERVK?        ERVK? (LTR)     LTR     4       24      0       \N
UnknownY-chromosome     Y-chromosome (Unknown)  Unknown 4       24      0       \N
SINEDeu Deu (SINE)      SINE    4       24      0       \N
DNATcMar-Tc2    TcMar-Tc2 (DNA) DNA     4       24      0       \N
LINERTE-X       RTE-X (LINE)    LINE    4       24      0       \N
LINEL2  L2 (LINE)       LINE    4       24      0       \N
DNAPiggyBac     PiggyBac (DNA)  DNA     4       24      0       \N
LINEL1  L1 (LINE)       LINE    4       24      0       \N
DNAPiggyBac?    PiggyBac? (DNA) DNA     4       24      0       \N
LINEDong-R4     Dong-R4 (LINE)  LINE    4       24      0       \N
LTRERV1 ERV1 (LTR)      LTR     4       24      0       \N
LINE?Penelope?  Penelope? (LINE?)       LINE?   4       24      0       \N
LTRGypsy?       Gypsy? (LTR)    LTR     4       24      0       \N
srpRNAsrpRNA    srpRNA (srpRNA) srpRNA  4       24      0       \N
DNAhAT-Blackjack        hAT-Blackjack (DNA)     DNA     4       24      0       \N
LTRERVL?        ERVL? (LTR)     LTR     4       24      0       \N
LINECR1 CR1 (LINE)      LINE    4       24      0       \N
RCHelitron      Helitron (RC)   RC      4       24      0       \N
Simple_repeatSimple_repeat      Simple_repeat (Simple_repeat)   Simple_repeat   4       24      0       \N
DNAMULE-MuDR    MULE-MuDR (DNA) DNA     4       24      0       \N
DNATcMar-Tigger TcMar-Tigger (DNA)      DNA     4       24      0       \N
tRNAtRNA        tRNA (tRNA)     tRNA    4       24      0       \N
RNARNA  RNA (RNA)       RNA     4       24      0       \N
LTRERV1?        ERV1? (LTR)     LTR     4       24      0       \N
SINEB4  B4 (SINE)       SINE    4       24      0       \N
LTRGypsy        Gypsy (LTR)     LTR     4       24      0       \N
DNATcMar-Mariner        TcMar-Mariner (DNA)     DNA     4       24      0       \N
SINEB2  B2 (SINE)       SINE    4       24      0       \N
DNATcMar?       TcMar? (DNA)    DNA     4       24      0       \N
SINEAlu Alu (SINE)      SINE    4       24      0       \N
DNAhAT  hAT (DNA)       DNA     4       24      0       \N
LTR?LTR?        LTR? (LTR?)     LTR?    4       24      0       \N
Low_complexityLow_complexity    Low_complexity (Low_complexity) Low_complexity  4       24      0       \N
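
The decorInfo_rmsk rows above follow a regular pattern: one row per repeat class, then one row per class/family pair whose key is the class name concatenated with the family name. The sketch below reproduces that pattern for a list of (class, family) pairs. It is an illustration only: the function name is hypothetical, the meaning of the numeric columns is not documented here and the values are copied as-is, and tab-separated fields are assumed.

```python
NULL = r"\N"  # MySQL's text-file marker for NULL


def rmsk_decor_rows(class_family_pairs):
    """Build decorInfo_rmsk-style rows from (class, family) pairs."""
    classes = []
    for repeat_class, _family in class_family_pairs:
        if repeat_class not in classes:
            classes.append(repeat_class)  # keep first-seen order

    rows = []
    # Class-level rows: key and label are the class name, no parent.
    for repeat_class in classes:
        rows.append("\t".join(
            [repeat_class, repeat_class, NULL, "4", "24", "0", NULL]))
    # Family-level rows: key is class + family, parent is the class.
    for repeat_class, family in class_family_pairs:
        label = "{} ({})".format(family, repeat_class)
        rows.append("\t".join(
            [repeat_class + family, label, repeat_class,
             "4", "24", "0", NULL]))
    return rows


if __name__ == "__main__":
    for row in rmsk_decor_rows([("LINE", "L1"), ("SINE", "Alu")]):
        print(row)
```

The (class, family) pairs would come from the corresponding columns of rmsk.txt; how the browser's own rmsk.py derives them is not shown on this page.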
  • track2Detail_decor
refGene download date=Nov. 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz; note=this is a copy of RefSeq Genes track of UCSC Genome Browser
xenoRefGene     download date=Nov. 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz; note=this is a copy of non-mouse RefSeq Genes track of UCSC Genome Browser
rmsk_ensemble   Download date=Jul-31-2013; Parsed by=in house script; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz
phastCons       Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw
phastCons60wayEuarchontoGlire   Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons60wayEuarchontoGlire.bw
phastCons60wayGlire     Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons60wayGlire.bw
phastCons60wayPlacental Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons60wayPlacental.bw
phyloP60wayEuarchontoglire      Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60wayEuarchontoglire.bw
phyloP60wayGlire        Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60wayGlire.bw
phyloP60wayPlacental    Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60wayPlacental.bw
phyloP60way     Download date=Jul-31-2013; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phyloP60way/mm10.60way.phyloP60way.bw
cpgIsland       Download date=Jul-31-2013; Parsed by=in house script; Download from=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/cpgIslandExt.txt.gz
gc5Base Download date=Jul-31-2013; Parsed by=hgGcPercent in Kent utils
ensGene download_date=Feb. 19, 2014; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/; note=this is a copy of Ensembl gene track of UCSC Genome Browser
  • track2Detail_rmsk
srpRNA  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTR     download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Satellite       download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
scRNA   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNA     download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Simple_repeat   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINE?   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Unknown download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
RNA     download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
RC      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNA?    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
RC?     download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
snRNA   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Other   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
rRNA    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINE?   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
tRNA    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINE    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINE    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTR?    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Low_complexity  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
rRNArRNA        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINEL1? download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAhAT-Tip100   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRLTR  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
UnknownUnknown  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRERVL download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRERVK download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAhAT-Tip100?  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
OtherOther      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAhAT-Charlie  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNATcMar        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINEMIR download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
RC?Helitron?    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINERTE-BovB    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SatelliteSatellite      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAhAT? download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINE?SINE?      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNATcMar-Pogo   download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
scRNAscRNA      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNA?DNA?        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINEID  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Satellitecentr  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAMuDR download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRERVL-MaLR    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNADNA  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
snRNAsnRNA      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRERVK?        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
UnknownY-chromosome     download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINEDeu download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNATcMar-Tc2    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINERTE-X       download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINEL2  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAPiggyBac     download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINEL1  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAPiggyBac?    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINEDong-R4     download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRERV1 download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINE?Penelope?  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRGypsy?       download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
srpRNAsrpRNA    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAhAT-Blackjack        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRERVL?        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LINECR1 download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
RCHelitron      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Simple_repeatSimple_repeat      download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAMULE-MuDR    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNATcMar-Tigger download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
tRNAtRNA        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
RNARNA  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRERV1?        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINEB4  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTRGypsy        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNATcMar-Mariner        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINEB2  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNATcMar?       download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
SINEAlu download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
DNAhAT  download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
LTR?LTR?        download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
Low_complexityLow_complexity    download_date=Oct 28, 2013; source=http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/
  • track2Style
cpgIsland       'textcolor':'rgb(0,0,0)','fontsize':'8pt','fontfamily':'sans-serif','fontbold':false,'bedcolor':'rgb(0,77,153)'
DNA     'textcolor':'rgb(0,0,0)','fontsize':'8pt','fontfamily':'sans-serif','fontbold':false,'bedcolor':'rgb(0,77,153)','isrmsk':true
refGene 'textcolor':'rgb(0,0,0)','fontsize':'8pt','fontfamily':'sans-serif','fontbold':false,'bedcolor':'rgb(0,77,153)',dbsearch:true,isgene:true
xenoRefGene     'textcolor':'rgb(0,0,0)','fontsize':'8pt','fontfamily':'sans-serif','fontbold':false,'bedcolor':'rgb(0,77,77)',dbsearch:true,isgene:true
ensGene 'textcolor':'rgb(0,0,0)','fontsize':'8pt','fontfamily':'sans-serif','fontbold':false,'bedcolor':'rgb(143,71,36)',dbsearch:true,isgene:true
  • track2Style_rmsk
srpRNA  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTR     showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Satellite       showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
scRNA   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNA     showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Simple_repeat   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINE?   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Unknown showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
RNA     showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
RC      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNA?    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
RC?     showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
snRNA   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Other   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
rRNA    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINE?   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
tRNA    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINE    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINE    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTR?    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Low_complexity  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
rRNArRNA        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINEL1? showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAhAT-Tip100   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRLTR  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
UnknownUnknown  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRERVL showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRERVK showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAhAT-Tip100?  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
OtherOther      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAhAT-Charlie  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNATcMar        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINEMIR showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
RC?Helitron?    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINERTE-BovB    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SatelliteSatellite      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAhAT? showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINE?SINE?      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNATcMar-Pogo   showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
scRNAscRNA      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNA?DNA?        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINEID  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Satellitecentr  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAMuDR showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRERVL-MaLR    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNADNA  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
snRNAsnRNA      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRERVK?        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
UnknownY-chromosome     showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINEDeu showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNATcMar-Tc2    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINERTE-X       showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINEL2  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAPiggyBac     showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINEL1  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAPiggyBac?    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINEDong-R4     showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRERV1 showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINE?Penelope?  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRGypsy?       showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
srpRNAsrpRNA    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAhAT-Blackjack        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRERVL?        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LINECR1 showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
RCHelitron      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Simple_repeatSimple_repeat      showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAMULE-MuDR    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNATcMar-Tigger showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
tRNAtRNA        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
RNARNA  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRERV1?        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINEB4  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTRGypsy        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNATcMar-Mariner        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINEB2  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNATcMar?       showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
SINEAlu showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
DNAhAT  showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
LTR?LTR?        showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
Low_complexityLow_complexity    showscoreidx:1,scorenamelst:["Smith-Waterman score","SW score normalized by length"]
  • scaffoldInfo
ROOT    chromosome      0
ROOT    other   0
chromosome      chr1    195471971
chromosome      chr10   130694993
chromosome      chr11   122082543
chromosome      chr12   120129022
chromosome      chr13   120421639
chromosome      chr14   124902244
chromosome      chr15   104043685
chromosome      chr16   98207768
chromosome      chr17   94987271
chromosome      chr18   90702639
chromosome      chr19   61431566
chromosome      chr2    182113224
chromosome      chr3    160039680
chromosome      chr4    156508116
chromosome      chr5    151834684
chromosome      chr6    149736546
chromosome      chr7    145441459
chromosome      chr8    129401213
chromosome      chr9    124595110
chromosome      chrM    16299
chromosome      chrX    171031299
chromosome      chrY    91744698
other   chr1_GL456210_random    169725
other   chr1_GL456211_random    241735
other   chr1_GL456212_random    153618
other   chr1_GL456213_random    39340
other   chr1_GL456221_random    206961
other   chr4_GL456216_random    66673
other   chr4_JH584292_random    14945
other   chr4_GL456350_random    227966
other   chr4_JH584293_random    207968
other   chr4_JH584294_random    191905
other   chr4_JH584295_random    1976
other   chr5_JH584296_random    199368
other   chr5_JH584297_random    205776
other   chr5_JH584298_random    184189
other   chr5_GL456354_random    195993
other   chr5_JH584299_random    953012
other   chr7_GL456219_random    175968
other   chrX_GL456233_random    336933
other   chrY_JH584300_random    182347
other   chrY_JH584301_random    259875
other   chrY_JH584302_random    155838
other   chrY_JH584303_random    158099
other   chrUn_GL456239  40056
other   chrUn_GL456367  42057
other   chrUn_GL456378  31602
other   chrUn_GL456381  25871
other   chrUn_GL456382  23158
other   chrUn_GL456383  38659
other   chrUn_GL456385  35240
other   chrUn_GL456390  24668
other   chrUn_GL456392  23629
other   chrUn_GL456393  55711
other   chrUn_GL456394  24323
other   chrUn_GL456359  22974
other   chrUn_GL456360  31704
other   chrUn_GL456396  21240
other   chrUn_GL456372  28664
other   chrUn_GL456387  24685
other   chrUn_GL456389  28772
other   chrUn_GL456370  26764
other   chrUn_GL456379  72385
other   chrUn_GL456366  47073
other   chrUn_GL456368  20208
other   chrUn_JH584304  114452
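
The scaffoldInfo listing above can be produced from the mm10.size file written by twoBitInfo. The following is only a minimal sketch, not the scaffoldInfo.py script shipped with the browser: the function names and the regular expression that decides which sequences count as chromosomes are assumptions made for illustration.

```python
import re


def build_scaffold_info(size_lines, chrom_pattern=r"^chr([0-9]+|[XYM])$"):
    """Return scaffoldInfo rows (parent, child, length) from 'name<TAB>length' lines.

    chrom_pattern is an assumption for mm10-style names; adjust it per genome.
    """
    is_chrom = re.compile(chrom_pattern)
    chroms, others = [], []
    for line in size_lines:
        line = line.strip()
        if not line:
            continue
        name, length = line.split("\t")[:2]
        target = chroms if is_chrom.match(name) else others
        target.append((name, int(length)))
    # The two ROOT rows come first, as in the listing above.
    rows = [("ROOT", "chromosome", 0), ("ROOT", "other", 0)]
    rows += [("chromosome", name, length) for name, length in chroms]
    rows += [("other", name, length) for name, length in others]
    return rows


def format_rows(rows):
    """Join rows with tabs, the default field separator of 'load data local infile'."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


example = [
    "chr1\t195471971",
    "chrM\t16299",
    "chr1_GL456210_random\t169725",
    "chrUn_GL456239\t40056",
]
print(format_rows(build_scaffold_info(example)))
```

For a real run, pass the lines of mm10.size instead of the four example lines and write the result to a file named scaffoldInfo.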

Create database

  • create the mm10 database and load the annotation files
mysql> create database mm10;
cat loadGene.sql | mysql -u hguser -p mm10 --local-infile=1
  • Note: use your own database username and password.
  • create a file named makeDb.sql with the following content, then load it with the cat ... | mysql command that follows the listing
-- -------------------
--                  --
--  mm10
--                  --
-- -------------------
drop table if exists config;
create table config (
  bbiPath text not null,
  seqPath text null,
  defaultGenelist text not null,
  defaultCustomtracks text not null,
  defaultPosition varchar(255) not null,
  defaultDecor text null,
  defaultScaffold text not null,
  hasGene boolean not null,
  allowJuxtaposition boolean not null,
  keggSpeciesCode varchar(255) null,
  information text not null,
  runmode tinyint not null,
  initmatplot boolean not null
);
insert into config values(
"/srv/epgg/data/data/subtleKnife/mm10/",
"/srv/epgg/data/data/subtleKnife/seq/mm10.gz",
"CYP4Z1\\nCYP2A7\\nCYP2A6\\nCYP3A4\\nCYP1A1\\nCYP4V2\\nCYP51A1\\nCYP2C19\\nCYP26B1\\nCYP11B2\\nCYP24A1\\nCYP4B1\\nCYP2C8",
"{}",
"chr6,52003572,chr6,52426257",
"refGene,rmsk_all",
"chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chrX,chrY,chrM",
true,
true,
\N,
"Assembly version|mm10|Sequence source|<a href=http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/ target=_blank>UCSC browser</a>|Date parsed|July 31, 2013|Chromosomes|22|Misc|44|Total bases|2,730,871,774|Lo
0,
false
);


drop table if exists gfGrouping;
create table gfGrouping (
  id TINYINT not null primary key,
  name char(50) not null
);
insert into gfGrouping values (2, "Genes");
insert into gfGrouping values (4, "RepeatMasker");
insert into gfGrouping values (6, "Sequence conservation");
insert into gfGrouping values (5, "Others");
drop table if exists decorInfo;
create table decorInfo (
  name char(50) not null,
  printname char(100) not null,
  parent char(50) null,
  grp tinyint not null,
  fileType tinyint not null,
  hasStruct tinyint null,
  queryUrl varchar(255) null
);
load data local infile 'decorInfo' into table decorInfo;
load data local infile 'decorInfo_rmsk' into table decorInfo;

drop table if exists track2Detail;
create table track2Detail (
  name varchar(255) not null primary key,
  detail text null
);
load data local infile 'track2Detail_rmsk' into table track2Detail;
load data local infile 'track2Detail_decor' into table track2Detail;

drop table if exists track2Categorical;
create table track2Categorical (
  name varchar(255) not null primary key,
  info text not null
);
load data local infile 'track2Categorical' into table track2Categorical;


drop table if exists track2Style;
create table track2Style (
  name varchar(255) not null primary key,
  style text not null
);
load data local infile "track2Style" into table track2Style;
load data local infile "track2Style_rmsk" into table track2Style;

drop table if exists scaffoldInfo;
create table scaffoldInfo (
  parent varchar(255) not null,
  child varchar(255) not null,
  childLength int unsigned not null
);
load data local infile "scaffoldInfo" into table scaffoldInfo;

drop table if exists cytoband;
create table cytoband (
  id int null auto_increment primary key,
  chrom char(20) not null,
  start int not null,
  stop int not null,
  name char(20) not null,
  colorIdx int not null
);
load data local infile "cytoband" into table cytoband;
cat makeDb.sql | mysql -u hguser -p mm10 --local-infile=1
  • add mm10 to the genome list by editing the file
/srv/epgg/data/data/subtleKnife/treeoflife
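
The load data local infile statements in makeDb.sql split fields on tabs by default, so a file whose columns are separated by spaces will not load as intended. The sketch below checks field counts before loading. The expected counts are taken from the create table statements above; the function name and the dictionary are hypothetical helpers, not part of the browser scripts.

```python
# Number of columns each table expects, read off the create table statements.
EXPECTED_COLUMNS = {
    "decorInfo": 7,
    "track2Detail": 2,
    "track2Categorical": 2,
    "track2Style": 2,
    "scaffoldInfo": 3,
}


def check_columns(table, lines):
    """Return (line_number, fields_found) for every row with the wrong field count."""
    expected = EXPECTED_COLUMNS[table]
    problems = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        if not row:
            continue  # ignore blank lines
        found = len(row.split("\t"))
        if found != expected:
            problems.append((number, found))
    return problems


good = ["ROOT\tchromosome\t0", "chromosome\tchr1\t195471971"]
bad = ["chromosome chr1 195471971"]  # spaces instead of tabs
print(check_columns("scaffoldInfo", good))  # []
print(check_columns("scaffoldInfo", bad))   # [(1, 1)]
```

To check a file on disk, pass an open file handle as the second argument, for example check_columns("decorInfo", open("decorInfo_rmsk")).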

Add soybean Gmax_189

  • create folders to store the files for Gmax_189
mkdir /srv/epgg/data/data/subtleKnife/Gmax_189
mkdir /srv/epgg/data/data/subtleKnife/Gmax_189/session

Prepare genome sequence and annotation files

  • download the following files and decompress them
wget ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/assembly/Gmax_189.fa.gz
wget ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
wget ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_annotation_info.txt.gz
gunzip Gmax_189.fa.gz
gunzip Gmax_189_gene_exons.gff3.gz
gunzip Gmax_189_annotation_info.txt.gz
  • prepare GC percent track
faToTwoBit Gmax_189.fa Gmax_189.2bit
twoBitInfo Gmax_189.2bit Gmax_189.size
hgGcPercent -win=5 -file=gc5Base.wig -wigOut Gmax_189 Gmax_189.2bit -noDots
wigToBigWig gc5Base.wig Gmax_189.size gc5Base.bigWig
  • generate sequence tracks
python ~/fa2tabix.py Gmax_189.fa Gmax_189
  • generate the scaffoldInfo file
python ~/pyScript/scaffoldInfo.py Gmax_189.size scaffoldInfo
#total base:  973344380
#chromosome:,  20
#contig:,  1148
  • generate gene annotation files
python ~/pyScript/gff3_to_feature_epgg.py Gmax_189_gene_exons.gff3 
python ~/pyScript/gene_ann_info_to_genesymbol_epgg.py Gmax_189_annotation_info.txt
python ~/pyScript/gff3_to_gene_struct_epgg.py Gmax_189_gene_exons.gff3
  • annotation files such as *.gz* and *.bigWig should be placed in /srv/epgg/data/data/subtleKnife/Gmax_189
  • sequence files Gmax_189.gz* should be placed in /srv/epgg/data/data/subtleKnife/seq
  • after going through the steps above you should also have the following files
geneSymbol  geneStruct
  • create a file named loadGene.sql with the following contents
drop table if exists geneSymbol;
create table  geneSymbol (
name varchar(255) not null,
symbol varchar(255) null,
description text null,
id int unsigned not null primary key,
index(name)
);
load data local infile 'geneSymbol' into table geneSymbol;

drop table if exists geneStruct;
create table geneStruct (
id int unsigned not null primary key,
chrom varchar(255) not null,
strand char(1) not null,
txStart int unsigned not null,
txEnd int unsigned not null,
cdsStart int unsigned not null,
cdsEnd int unsigned not null,
exonCount int unsigned not null,
exonStarts text not null,
exonEnds text not null,
name varchar(255) not null
);
load data local infile 'geneStruct' into table geneStruct;
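
To illustrate how GFF3 records map onto the geneStruct columns above, here is a rough sketch that groups exon and CDS records under their mRNA parent. It is not the gff3_to_gene_struct_epgg.py script, whose output format is not documented here. Several points are assumptions: coordinates are copied from the GFF3 unchanged, exon starts and ends are joined with commas, ids are assigned in file order, a transcript without CDS gets cdsStart = cdsEnd = txEnd, and each child record has a single Parent that appears earlier in the file.

```python
def gene_struct_rows(gff3_lines):
    """Group exon/CDS records under their mRNA parent; return geneStruct-shaped tuples."""
    transcripts = {}  # mRNA ID -> collected fields, kept in file order
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = fields
        attr = dict(item.split("=", 1) for item in attrs.split(";") if "=" in item)
        start, end = int(start), int(end)
        if ftype == "mRNA":
            transcripts[attr["ID"]] = {
                "chrom": seqid, "strand": strand, "txStart": start, "txEnd": end,
                "name": attr.get("Name", attr["ID"]), "exon": [], "CDS": [],
            }
        elif ftype in ("exon", "CDS"):
            parent = transcripts.get(attr.get("Parent"))
            if parent is not None:
                parent[ftype].append((start, end))

    rows = []
    for row_id, t in enumerate(transcripts.values(), start=1):
        exons = sorted(t["exon"])
        if t["CDS"]:
            cds_start = min(s for s, _ in t["CDS"])
            cds_end = max(e for _, e in t["CDS"])
        else:
            cds_start = cds_end = t["txEnd"]  # arbitrary choice for this sketch
        rows.append((
            row_id, t["chrom"], t["strand"], t["txStart"], t["txEnd"],
            cds_start, cds_end, len(exons),
            ",".join(str(s) for s, _ in exons),
            ",".join(str(e) for _, e in exons),
            t["name"],
        ))
    return rows


# Made-up records for illustration only (tx1 is a hypothetical transcript).
example = [
    "Gm01\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=tx1;Name=tx1;Parent=g1",
    "Gm01\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1",
    "Gm01\tsrc\texon\t300\t500\t.\t+\t.\tID=e2;Parent=tx1",
    "Gm01\tsrc\tCDS\t150\t200\t.\t+\t0\tID=c1;Parent=tx1",
    "Gm01\tsrc\tCDS\t300\t400\t.\t+\t0\tID=c2;Parent=tx1",
]
for row in gene_struct_rows(example):
    print("\t".join(str(field) for field in row))
```

The tuple order follows the create table statement: id, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, name.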

Example track annotation files for v39 and earlier

  • track2Detail_decor
gene    Download date=Aug-21-2013; Parsed by=in house script; Download from=ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
mRNA    Download date=Aug-21-2013; Parsed by=in house script; Download from=ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
three_prime_UTR Download date=Aug-21-2013; Parsed by=in house script; Download from=ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
five_prime_UTR  Download date=Aug-21-2013; Parsed by=in house script; Download from=ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
exon    Download date=Aug-21-2013; Parsed by=in house script; Download from=ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
CDS     Download date=Aug-21-2013; Parsed by=in house script; Download from=ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
gc5Base Download date=Aug-21-2013; Parsed by=hgGcPercent in Kent utils
  • track2Style
gene    'textcolor':'rgb(0,0,0)','fontsize':'8pt','fontfamily':'sans-serif','fontbold':false,'bedcolor':'rgb(0,77,153)',dbsearch:true,isgene:true
  • decorInfo
gene    phytozome genes \N      2       24      0       http://www.phytozome.net/cgi-bin/gbrowse/soybean/?q=
gc5Base GC percent      \N      5       14      1       \N
  • scaffoldInfo
ROOT    chromosome      0
ROOT    other   0
chromosome      Gm01    55915595
chromosome      Gm02    51656713
chromosome      Gm03    47781076
chromosome      Gm04    49243852
chromosome      Gm05    41936504
chromosome      Gm06    50722821
chromosome      Gm07    44683157
chromosome      Gm08    46995532
chromosome      Gm09    46843750
chromosome      Gm10    50969635
chromosome      Gm11    39172790
chromosome      Gm12    40113140
chromosome      Gm13    44408971
chromosome      Gm14    49711204
chromosome      Gm15    50939160
chromosome      Gm16    37397385
chromosome      Gm17    41906774
chromosome      Gm18    62308140
chromosome      Gm19    50589441
chromosome      Gm20    46773167
other   scaffold_21     1127293
other   scaffold_22     1088050
other   scaffold_23     939397
other   scaffold_24     634454
other   scaffold_25     776127
other   scaffold_27     394689
other   scaffold_28     378823
other   scaffold_30     457611
other   scaffold_31     270213
other   scaffold_32     418352
other   scaffold_33     373987
other   scaffold_35     267900
other   scaffold_36     280716
other   scaffold_37     287793
other   scaffold_38     284391
other   scaffold_39     258219
other   scaffold_40     256783
other   scaffold_41     177711
other   scaffold_42     306045
other   scaffold_43     210748
other   scaffold_44     263683
other   scaffold_45     284739
other   scaffold_46     267773
other   scaffold_47     232203
other   scaffold_48     139886
other   scaffold_50     160106
other   scaffold_52     188993
other   scaffold_53     124315
other   scaffold_54     143480
other   scaffold_56     122046
other   scaffold_57     117597
other   scaffold_58     117646
other   scaffold_60     109023
other   scaffold_62     165295
other   scaffold_63     175160
other   scaffold_64     182431
other   scaffold_65     132497
other   scaffold_66     170827
other   scaffold_67     95072
other   scaffold_68     96141
other   scaffold_69     114017
other   scaffold_70     96538
other   scaffold_71     164169
other   scaffold_72     152036
other   scaffold_73     149568
other   scaffold_74     142966
other   scaffold_76     149147
other   scaffold_77     168119
other   scaffold_79     168015
other   scaffold_80     105630
other   scaffold_81     76792
other   scaffold_84     69299
other   scaffold_86     104763
other   scaffold_87     94555
....other lines...
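
The counts reported when scaffoldInfo was generated (chromosomes, contigs, total bases) are the same numbers that go into the Chromosomes, Misc and Total bases entries of the information field in the config table. A small sketch for recomputing them from a scaffoldInfo file follows; the function names are illustrative and not part of the browser scripts.

```python
def summarize_scaffold_info(lines):
    """Return (chromosome_count, other_count, total_bases) from scaffoldInfo lines."""
    counts = {"chromosome": 0, "other": 0}
    total = 0
    for line in lines:
        fields = line.split()
        # Skip the ROOT rows and anything that is not a 3-column data row.
        if len(fields) != 3 or fields[0] == "ROOT":
            continue
        parent, _child, length = fields
        counts[parent] = counts.get(parent, 0) + 1
        total += int(length)
    return counts["chromosome"], counts["other"], total


def information_fragment(chromosomes, misc, total):
    """Format the counts the way they appear in the config 'information' string."""
    return "Chromosomes|{}|Misc|{}|Total bases|{:,}".format(chromosomes, misc, total)


example = [
    "ROOT\tchromosome\t0",
    "ROOT\tother\t0",
    "chromosome\tGm01\t55915595",
    "chromosome\tGm02\t51656713",
    "other\tscaffold_21\t1127293",
]
print(information_fragment(*summarize_scaffold_info(example)))
# Chromosomes|2|Misc|1|Total bases|108,699,601
```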

Create database

  • create the Gmax_189 database and load the annotation files
mysql> create database Gmax_189;
cat loadGene.sql | mysql -u hguser -p Gmax_189 --local-infile=1
  • Note: use your own database username and password.
  • create a file named decorInfo with the following contents
gene    phytozome genes \N      2       24      0       http://www.phytozome.net/cgi-bin/gbrowse/soybean/?q=
gc5Base GC percent      \N      5       14      1       \N
  • create a file named makeDb.sql with the following content, then load it with the cat ... | mysql command that follows the listing
-- -------------------
--                  --
--  Gmax_189
--                  --
-- -------------------
drop table if exists config;
create table config (
  bbiPath text not null,
  seqPath text null,
  defaultGenelist text null,
  defaultCustomtracks text null,
  defaultPosition varchar(255) not null,
  defaultDataset varchar(255) not null,
  defaultDecor text null,
  defaultScaffold text not null,
  hasGene boolean not null,
  allowJuxtaposition boolean not null,
  keggSpeciesCode varchar(255) null,
  information text not null,
  runmode tinyint not null,
  initmatplot boolean not null
);
insert into config values(
"/srv/epgg/data/data/subtleKnife/Gmax_189/",
"/srv/epgg/data/data/subtleKnife/seq/Gmax_189.gz",
\N,
"{}",
"Gm01,14872,Gm01,374047",
"mock",
"gene,gc5Base",
"Gm01,Gm02,Gm03,Gm04,Gm05,Gm06,Gm07,Gm08,Gm09,Gm10,Gm11,Gm12,Gm13,Gm14,Gm15,Gm16,Gm17,Gm18,Gm19,Gm20",
true,
true,
"Gmax",
"Assembly version|Gmax_189|Sequence source|<a href=ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/ target=_blank>phytozome</a>|Date parsed|August 21, 2013|Chromosomes|20|Misc|1148|Total bases|973,344,380|
0,
false
);


drop table if exists gfGrouping;
create table gfGrouping (
  id TINYINT not null primary key,
  name char(50) not null
);
insert into gfGrouping values (2, "Genes");
insert into gfGrouping values (5, "Others");



drop table if exists decorInfo;
create table decorInfo (
  name char(50) not null primary key,
  printname char(100) not null,
  parent char(50) null,
  grp tinyint not null,
  fileType tinyint not null,
  hasStruct tinyint null,
  queryUrl varchar(255) null
);
load data local infile 'decorInfo' into table decorInfo;

drop table if exists track2Label;
drop table if exists track2ProcessInfo;

drop table if exists track2BamInfo;

drop table if exists track2Detail;
create table track2Detail (
  name varchar(255) not null primary key,
  detail text null
);
load data local infile 'track2Detail_decor' into table track2Detail;

drop table if exists track2Style;
create table track2Style (
  name varchar(255) not null primary key,
  style text not null
);
load data local infile "track2Style" into table track2Style;


drop table if exists tempURL;
create table tempURL (
  session varchar(100) not null,
  offset INT unsigned not null,
  urlpiece text not null
);

drop table if exists scaffoldInfo;
create table scaffoldInfo (
  parent varchar(255) not null,
  child varchar(255) not null,
  childLength int unsigned not null
);
load data local infile "scaffoldInfo" into table scaffoldInfo;


drop table if exists cytoband;
create table cytoband (
  id int null auto_increment primary key,
  chrom char(20) not null,
  start int not null,
  stop int not null,
  name char(20) not null,
  colorIdx int not null
);

cat makeDb.sql | mysql -u hguser -p Gmax_189 --local-infile=1
  • add Gmax_189 to the genome list by editing the file
/srv/epgg/data/data/subtleKnife/treeoflife
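
Before loading makeDb.sql it can help to confirm that the defaultScaffold and defaultPosition values in the config insert only name sequences present in scaffoldInfo. The sketch below assumes that defaultPosition has the form "chrom,start,chrom,stop", which is inferred from the two examples on this page rather than from browser documentation; check_defaults is a hypothetical helper.

```python
def check_defaults(scaffold_rows, default_scaffold, default_position):
    """Return a list of problems found in the config defaults (empty if none)."""
    lengths = {
        child: length
        for parent, child, length in scaffold_rows
        if parent != "ROOT"
    }
    problems = []
    for name in default_scaffold.split(","):
        if name not in lengths:
            problems.append("defaultScaffold: unknown sequence " + name)
    # Assumed layout: left chromosome, left coordinate, right chromosome, right coordinate.
    chrom_a, start, chrom_b, stop = default_position.split(",")
    for chrom, coord in ((chrom_a, int(start)), (chrom_b, int(stop))):
        if chrom not in lengths:
            problems.append("defaultPosition: unknown sequence " + chrom)
        elif not 0 <= coord <= lengths[chrom]:
            problems.append("defaultPosition: {} is outside {}".format(coord, chrom))
    return problems


rows = [
    ("ROOT", "chromosome", 0),
    ("chromosome", "Gm01", 55915595),
    ("chromosome", "Gm02", 51656713),
]
print(check_defaults(rows, "Gm01,Gm02", "Gm01,14872,Gm01,374047"))  # []
print(check_defaults(rows, "Gm01,Gm21", "Gm01,14872,Gm01,374047"))
```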

Get help

  • Adding a genome is somewhat complex. If you get stuck at any step, don't hesitate to contact us through the Users community on Google+ or Facebook.