This page contains the information you need to get SGD Lite running on your own machine. In future versions of SGD Lite, we hope to improve upon the installation process, so please send us any suggestions about this or any other aspect of SGD Lite.
There are two options for installation. The first option (Simple installation of a yeast genome database) is the simplest; use it if you simply want command-line access to yeast data in a PostgreSQL database. The second option (Complete installation of the gmod package) is more involved and should be used if you want a complete installation of the gmod package, including loading scripts and the Gbrowse genome map viewer.
The following describes how to install a local copy of the SGD Lite database itself. In the examples, the user name is postgres, the data file is sgdlite.sql.gz, and the database name is sgdlite. Note that the data file was generated from the SGD Lite database that we are running.
Create an OS-level user named postgres to own the databases and initiate the database server following the PostgreSQL installation procedures.
Download and install the PostgreSQL software. Extensive installation instructions can be found within the PostgreSQL docs pages. One example of how to do this on the Mac OS X platform can be found here (kindly provided by Gail Binkley at SGD) or from the Apple developer's page.
Make a directory for your database and cd to it:
postgres@server:~/$ mkdir sgdlite
postgres@server:~/$ cd sgdlite
Add the language plpgsql to the template1 database. For example:
postgres@server:~/sgdlite$ createlang plpgsql template1
Create a new database. For example:
postgres@server:~/sgdlite$ createdb sgdlite
Download the sgdlite data file (click here to download it, or use curl on the command line as shown below) and load it into the new database. After downloading it, uncompress it:
postgres@server:~/sgdlite$ curl -O http://sgdlite.princeton.edu/download/sgdlite/sgdlite.sql.gz
postgres@server:~/sgdlite$ gunzip sgdlite.sql.gz
then use it to load the database:
postgres@server:~/sgdlite$ psql -a -d sgdlite -f sgdlite.sql -o sgdlite_load.log
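The options used above are as follows:
-a echoes each input line from the file as it is read
-d names the database to connect to (sgdlite)
-f names the file of SQL commands to run (sgdlite.sql)
-o writes query output to the named file (sgdlite_load.log)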
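The load steps in this section can also be collected into a small script. The following is a minimal Python sketch, not part of the SGD Lite distribution: build_load_commands and run_all are hypothetical helper names, and it assumes sgdlite.sql.gz has already been downloaded into the current directory. It only prints the commands; run_all is defined but not called.

```python
import subprocess


def build_load_commands(dbname="sgdlite", dump="sgdlite.sql.gz"):
    """Return the commands from this section as argument lists (hypothetical helper)."""
    sql_file = dump[:-3] if dump.endswith(".gz") else dump
    return [
        # As in the text. On recent PostgreSQL releases PL/pgSQL is installed
        # by default and createlang is no longer shipped, so this step may
        # need to be skipped there.
        ["createlang", "plpgsql", "template1"],
        ["createdb", dbname],
        ["gunzip", dump],
        [
            "psql",
            "-a",                        # echo each input line as it is read
            "-d", dbname,                # database to connect to
            "-f", sql_file,              # file of SQL commands to run
            "-o", dbname + "_load.log",  # file that receives query output
        ],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure (hypothetical helper)."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Print the commands for review instead of executing them.
for argv in build_load_commands():
    print(" ".join(argv))
```

Calling run_all(build_load_commands()) as the postgres user would execute the steps in order; printing them first makes it easy to compare against the transcript above.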
Once the data are loaded, you can access and query the database via command line psql. The following example illustrates how to start a psql session, perform a simple query, then quit psql:
postgres@server:~/sgdlite$ psql -U postgres sgdlite
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
sgdlite=# SELECT f.name,f.uniquename,t.name
FROM feature f, cvterm t WHERE f.type_id=t.cvterm_id
AND f.name = 'YFL039C';
name | uniquename | name
---------+------------------------------------------+------
YFL039C | YFL039C_gene_chrVI:53259..54695 | gene
YFL039C | YFL039C_intron_intron_chrVI:54377..54685 | intron
YFL039C | YFL039C_CDS_CDS_chrVI:54685..54695 | CDS
YFL039C | YFL039C_CDS_CDS_chrVI:53259..54377 | CDS
(4 rows)
sgdlite=# \q
postgres@server:~/sgdlite$
Download the gmod_alpha_0.003 GMOD package from the GMOD web site or from here.
Follow the excellent detailed instructions for installation (the INSTALL file) included within the gmod_alpha_0.003/ package provided by Scott Cain. The general procedure is:
Download and install the required Perl modules as described in the INSTALL file (depending on what you currently have installed, this will likely be the most time consuming step).
Download and install the PostgreSQL software. Extensive installation instructions can be found within the PostgreSQL docs pages. One example of how to do this on the Mac OS X platform can be found here (kindly provided by Gail Binkley at SGD).
Create the chado database schema by running the Perl scripts included with the GMOD package as described in the INSTALL file.
Download and apply the patch for this GMOD release.
Load the relationship, sequence (SO and SOFA), and gene ontology (GO) files as described in the INSTALL file.
If loading a GFF file from SGD or SGD Lite or any other file that contains features of type 'transposable_element_gene', you will need to manually insert a new feature type, transposable_element_gene, in the cvterm table. To do this:
Insert the required dbxref for the new cvterm:
insert into dbxref (db_id,accession) values
( (select db_id from db where name='Sequence Ontology Feature Annotation'),
(select accession from dbxref where db_id in
(select db_id from db where name='Sequence Ontology') and dbxref_id in
(select dbxref_id from cvterm where name='transposable_element_gene')));
Insert the row into the cvterm table:
insert into cvterm (cv_id, name, definition, dbxref_id) values
( (select cv_id from cv where name= 'Sequence Ontology Feature Annotation'),
'transposable_element_gene',
(select definition from cvterm where name = 'transposable_element_gene'),
(select dbxref_id from dbxref where db_id in
(select db_id from db where name='Sequence Ontology Feature Annotation') and
accession in
(select accession from dbxref where dbxref_id in
(select dbxref_id from cvterm where name='transposable_element_gene'))));
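To see what the two statements do, the sketch below runs them against toy tables in an in-memory SQLite database. This is an illustration only: the tables carry just the columns the statements touch, the accession and definition values are placeholders, and the real chado schema in PostgreSQL has further columns and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db     (db_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER, accession TEXT);
CREATE TABLE cv     (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT,
                     definition TEXT, dbxref_id INTEGER);
INSERT INTO db VALUES (1, 'Sequence Ontology'),
                      (2, 'Sequence Ontology Feature Annotation');
INSERT INTO cv VALUES (1, 'Sequence Ontology'),
                      (2, 'Sequence Ontology Feature Annotation');
-- The term starts out in SO only; accession and definition are placeholders.
INSERT INTO dbxref VALUES (10, 1, 'PLACEHOLDER:1');
INSERT INTO cvterm VALUES (100, 1, 'transposable_element_gene',
                           'placeholder definition', 10);
""")

# Step 1: copy the SO accession into a new dbxref row owned by the SOFA db.
conn.execute("""
INSERT INTO dbxref (db_id, accession) VALUES (
  (SELECT db_id FROM db WHERE name = 'Sequence Ontology Feature Annotation'),
  (SELECT accession FROM dbxref
    WHERE db_id IN (SELECT db_id FROM db WHERE name = 'Sequence Ontology')
      AND dbxref_id IN (SELECT dbxref_id FROM cvterm
                         WHERE name = 'transposable_element_gene')))
""")

# Step 2: add the cvterm to the SOFA cv, pointing at the dbxref from step 1.
conn.execute("""
INSERT INTO cvterm (cv_id, name, definition, dbxref_id) VALUES (
  (SELECT cv_id FROM cv WHERE name = 'Sequence Ontology Feature Annotation'),
  'transposable_element_gene',
  (SELECT definition FROM cvterm WHERE name = 'transposable_element_gene'),
  (SELECT dbxref_id FROM dbxref
    WHERE db_id IN (SELECT db_id FROM db
                     WHERE name = 'Sequence Ontology Feature Annotation')
      AND accession IN (SELECT accession FROM dbxref
                         WHERE dbxref_id IN (SELECT dbxref_id FROM cvterm
                                              WHERE name = 'transposable_element_gene'))))
""")

# The term should now be listed once per ontology, sharing one accession.
rows = conn.execute("""
SELECT cv.name, t.name, x.accession
  FROM cvterm t
  JOIN cv       ON cv.cv_id = t.cv_id
  JOIN dbxref x ON x.dbxref_id = t.dbxref_id
 WHERE t.name = 'transposable_element_gene'
 ORDER BY cv.cv_id
""").fetchall()
for row in rows:
    print(row)
```

The order matters: the second statement looks up the dbxref row that the first one creates, so running them in reverse would insert a cvterm with no dbxref_id.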
Load yeast GFF file(s) using the gmod_load_gff3.pl script. A bulk loading script is also provided, which loads the data much faster, but we had trouble getting that script to load the yeast GFF file that we used. To load the same data that are contained in SGD Lite, use the yeast.gff file. For the most recent yeast data available, go to the SGD ftp site and use the saccharomyces_cerevisiae.gff file, which contains the main chromosomal features, GO annotations, etc.
If you would like to also install the Gbrowse map viewer, follow the instructions provided with the Gbrowse tar.
See the documentation for the chado modular schema on the gmod web site for more detailed information about the database schema. Some basic information is provided below to help get you started.
Chromosomal features are found in the feature table. The type_id indicates the type of feature; join the type_id with the cvterm_id in the cvterm table to get the text of the types. For example, to retrieve all the tRNAs in yeast, run the following query:
SELECT f.name, t.name from feature f, cvterm t WHERE t.cvterm_id = f.type_id AND t.name = 'tRNA';
The chromosomal coordinates for the chromosomal features are stored in the featureloc table.
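As an illustration of a coordinate lookup, the sketch below joins feature to featureloc on toy tables in an in-memory SQLite database. The column names (srcfeature_id, fmin, fmax, strand) follow the chado featureloc table as we understand it; the row values are assumptions derived from the uniquename shown in the sample session, including the strand and the interbase convention in which fmin is the 1-based start minus one. Check both against your own database before relying on them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT,
                      uniquename TEXT, type_id INTEGER);
CREATE TABLE featureloc (featureloc_id INTEGER PRIMARY KEY, feature_id INTEGER,
                         srcfeature_id INTEGER, fmin INTEGER, fmax INTEGER,
                         strand INTEGER);
INSERT INTO cvterm VALUES (1, 'chromosome'), (2, 'gene');
INSERT INTO feature VALUES
  (1, 'chrVI', 'chrVI', 1),
  (2, 'YFL039C', 'YFL039C_gene_chrVI:53259..54695', 2);
-- Toy row: fmin is stored as the 1-based start minus one (assumed interbase).
INSERT INTO featureloc VALUES (1, 2, 1, 53258, 54695, -1);
""")

# Join each feature to its location and to the feature it is located on.
query = """
SELECT f.name, src.name, fl.fmin + 1 AS start, fl.fmax AS stop, fl.strand
  FROM feature f
  JOIN featureloc fl ON fl.feature_id = f.feature_id
  JOIN feature src   ON src.feature_id = fl.srcfeature_id
  JOIN cvterm t      ON t.cvterm_id = f.type_id
 WHERE f.name = ? AND t.name = ?
"""
rows = conn.execute(query, ("YFL039C", "gene")).fetchall()
print(rows)
```

The same SELECT should work in psql against the sgdlite database if the literal values replace the ? placeholders, which are specific to the Python driver used here.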
GO annotations are stored in the feature_cvterm table. For example, to retrieve all the GO biological process annotations for the yeast ORF YFL039C, you can use the following SQL statement:
SELECT DISTINCT f.name, t.name FROM feature f, feature_cvterm fc, cvterm t, cv c WHERE f.feature_id = fc.feature_id AND f.name = 'YFL039C' AND fc.cvterm_id = t.cvterm_id AND c.cv_id = t.cv_id AND c.name = 'Biological Process (Gene Ontology)';
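The same lookup can be wrapped in a function that takes the ORF name and the ontology name as parameters. The sketch below is a hypothetical helper (go_terms) run against toy tables in an in-memory SQLite database; the term names are placeholders rather than real annotations, and the second cv name is assumed by analogy with the one used above.

```python
import sqlite3


def go_terms(conn, feature_name, cv_name):
    """Return distinct (feature, term) pairs for one feature in one ontology."""
    sql = """
    SELECT DISTINCT f.name, t.name
      FROM feature f
      JOIN feature_cvterm fc ON fc.feature_id = f.feature_id
      JOIN cvterm t          ON t.cvterm_id = fc.cvterm_id
      JOIN cv c              ON c.cv_id = t.cv_id
     WHERE f.name = ? AND c.name = ?
     ORDER BY t.name
    """
    return conn.execute(sql, (feature_name, cv_name)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_cvterm (feature_id INTEGER, cvterm_id INTEGER);
INSERT INTO cv VALUES (1, 'Biological Process (Gene Ontology)'),
                      (2, 'Molecular Function (Gene Ontology)');
-- Placeholder term names, not real GO annotations.
INSERT INTO cvterm VALUES (1, 1, 'placeholder process A'),
                          (2, 1, 'placeholder process B'),
                          (3, 2, 'placeholder function');
INSERT INTO feature VALUES (1, 'YFL039C');
-- The repeated (1, 1) row shows why the query uses DISTINCT.
INSERT INTO feature_cvterm VALUES (1, 1), (1, 1), (1, 2), (1, 3);
""")

terms = go_terms(conn, "YFL039C", "Biological Process (Gene Ontology)")
print(terms)
```

Passing the names as bound parameters rather than pasting them into the SQL string avoids quoting problems when the function is reused for other ORFs.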