SGD Lite: download and installation information

This SGD Lite download page has been retired. SGD Lite has been subsumed into YFGdb. To download data or the SQL file, go to the new YFGdb download page.

This page contains the information you need to get SGD Lite running on your own machine. In future versions of SGD Lite, we hope to improve upon the installation process, so please send us any suggestions about this or any other aspect of SGD Lite.

GMOD package version: gmod_alpha_0.003
Data downloaded from SGD on: May 29, 2007
Files used for this version:
- 2007_05_29_sgdlite.sql.gz (7.57 MB)
- 2007_05_29_yeast.gff.gz (4.59 MB)
- gmod-0.003.tar.gz (2.8 MB)
- 2007_05_29_TF_binding_site.gff (567 KB)
- 2007_05_29_snps.gff.gz (2.78 MB)
All instructions and links below point to the current file versions. Previous versions are archived.

Installation instructions

There are two installation options. The first (Simple installation of a yeast genome database) is the quicker one; use it if you only want command-line access to yeast data in a PostgreSQL database. The second (Complete installation of the gmod package) is more involved; use it if you want the full gmod package, including the loading scripts and the Gbrowse genome map viewer.

Simple installation of a yeast genome database

The following describes how to install a local copy of the SGD Lite database itself. When examples are given, the user name is postgres, the data file is sgdlite.sql.gz, and the database name is sgdlite. Note that the data file was generated from the SGD Lite database that we are running.

Create an OS-level user named postgres to own the databases, and initialize and start the database server as described in the PostgreSQL installation procedures.
Download and install the PostgreSQL software. Extensive installation instructions can be found within the PostgreSQL docs pages. One example of how to do this on the Mac OS X platform can be found here (kindly provided by Gail Binkley at SGD) or on the Apple developer's page.

Make a directory for your database and cd to it:

```
postgres@server:~/$ mkdir sgdlite
postgres@server:~/$ cd sgdlite
```

Add the procedural language PL/pgSQL (plpgsql) to the template1 database, so that databases created afterwards inherit it. For example:
```
postgres@server:~/sgdlite$ createlang plpgsql template1
```

Create a new database. For example:

```
postgres@server:~/sgdlite$ createdb sgdlite
```

Download the sgdlite data file (click here to download it, or use curl on the command line as shown below) and uncompress it:

```
postgres@server:~/sgdlite$ curl -O http://sgdlite.princeton.edu/download/sgdlite/sgdlite.sql.gz
postgres@server:~/sgdlite$ gunzip sgdlite.sql.gz
```

then use it to load the database:

```
postgres@server:~/sgdlite$ psql -a -d sgdlite -f sgdlite.sql -o sgdlite_load.log
```

The options used above are as follows:

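- -a: echo all input lines as they are read
- -d: name of the database to connect to
- -f: read and execute the SQL commands in the file
- -o: write query output to the named file (here, sgdlite_load.log)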
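If you install more than once, the steps above can be scripted. The following is a minimal Python sketch, not a tool distributed with SGD Lite: it assumes the PostgreSQL client programs (createlang, createdb, psql) are on your PATH, that it runs as the postgres user, and that sgdlite.sql.gz has already been downloaded into the current directory. The function names are hypothetical. By default it only returns the commands it would run. Note that createlang was removed in PostgreSQL 10, and PL/pgSQL is installed by default since PostgreSQL 9.0, so drop that step on newer servers.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def install_commands(dbname="sgdlite", sql_file="sgdlite.sql", log_file="sgdlite_load.log"):
    """Return the simple-installation commands as argument lists, in order."""
    return [
        ["createlang", "plpgsql", "template1"],  # add PL/pgSQL to template1
        ["createdb", dbname],                    # create the empty database
        # -a echo input, -d database, -f SQL file to run, -o file for query output
        ["psql", "-a", "-d", dbname, "-f", sql_file, "-o", log_file],
    ]


def gunzip(src, dest):
    """Decompress src to dest, like gunzip, but keep the .gz file."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def install(gz_file="sgdlite.sql.gz", dbname="sgdlite", dry_run=True):
    """Run (or, by default, only list) the simple-installation steps."""
    sql_file = str(Path(gz_file).with_suffix(""))  # "sgdlite.sql.gz" -> "sgdlite.sql"
    commands = install_commands(dbname, sql_file)
    if not dry_run:
        gunzip(gz_file, sql_file)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raise CalledProcessError on the first failure
    return commands
```

Calling install() prints nothing and changes nothing; inspect the returned list first, then call install(dry_run=False) to decompress the file and run the three commands.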

Once the data are loaded, you can access and query the database via command line psql. The following example illustrates how to start a psql session, perform a simple query, then quit psql:

```
postgres@server:~/sgdlite$ psql -U postgres sgdlite
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

sgdlite=# SELECT f.name,f.uniquename,t.name
          FROM feature f, cvterm t WHERE f.type_id=t.cvterm_id
          AND f.name = 'YFL039C';
  name   |           uniquename                     | name
---------+------------------------------------------+------
 YFL039C | YFL039C_gene_chrVI:53259..54695          | gene
 YFL039C | YFL039C_intron_intron_chrVI:54377..54685 | intron
 YFL039C | YFL039C_CDS_CDS_chrVI:54685..54695       | CDS
 YFL039C | YFL039C_CDS_CDS_chrVI:53259..54377       | CDS
(4 rows)

sgdlite=# \q
postgres@server:~/sgdlite$
```
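The same query can also be run non-interactively, which is handy in scripts. This is a minimal Python sketch, assuming psql is on your PATH and the database is named sgdlite; the helper names are hypothetical. It uses standard psql flags: -A (unaligned output), -t (rows only, no header or row count), -F (field separator, here a tab), and -c (run one command and exit).

```python
import subprocess

# The query from the psql session above; it is a fixed string, so no user input is interpolated.
QUERY = (
    "SELECT f.name, f.uniquename, t.name "
    "FROM feature f, cvterm t WHERE f.type_id = t.cvterm_id "
    "AND f.name = 'YFL039C';"
)


def psql_command(query, dbname="sgdlite", user="postgres"):
    """Build a psql call that prints unaligned, tab-separated rows without headers."""
    return ["psql", "-U", user, "-d", dbname, "-A", "-t", "-F", "\t", "-c", query]


def parse_rows(output):
    """Split psql's unaligned, tuples-only output into lists of column values."""
    return [line.split("\t") for line in output.splitlines() if line]


def run_query(query, dbname="sgdlite", user="postgres"):
    """Run query through psql and return the parsed rows."""
    result = subprocess.run(
        psql_command(query, dbname, user),
        capture_output=True, text=True, check=True,
    )
    return parse_rows(result.stdout)
```

Against a loaded database, run_query(QUERY) should return one list per row, such as ["YFL039C", "YFL039C_gene_chrVI:53259..54695", "gene"].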

Complete installation of the gmod package

Download the gmod_alpha_0.003 GMOD package from the GMOD web site or from here.
Follow the excellent detailed instructions for installation (the INSTALL file) included within the gmod_alpha_0.003/ package provided by Scott Cain. The general procedure is:
1. Download and install the required Perl modules as described in the INSTALL file (depending on what you currently have installed, this will likely be the most time-consuming step).
2. Download and install the PostgreSQL software. Extensive installation instructions can be found within the PostgreSQL docs pages. One example of how to do this on the Mac OS X platform can be found here (kindly provided by Gail Binkley at SGD).
3. Create the chado database schema by running the Perl scripts included with the GMOD package, as described in the INSTALL file.
4. Download and apply the patch for this GMOD release.
5. Load the relationship, sequence (SO and SOFA), and Gene Ontology (GO) files as described in the INSTALL file.
6. If you are loading a GFF file from SGD, SGD Lite, or any other source that contains features of type 'transposable_element_gene', you will need to manually insert transposable_element_gene as a new feature type in the cvterm table, under the 'Sequence Ontology Feature Annotation' vocabulary. To do this:
  
  First, insert the required DBXREF for the new cvterm, reusing the accession of the existing Sequence Ontology term:
```
insert into dbxref (db_id,accession) values
 ( (select db_id from db where name='Sequence Ontology Feature Annotation'),
   (select accession from dbxref where db_id in
       (select db_id from db where name='Sequence Ontology') and dbxref_id in
       (select dbxref_id from cvterm where name='transposable_element_gene')));
```
  Then insert the row into the cvterm table:
```
insert into cvterm (cv_id, name, definition, dbxref_id) values
 ( (select cv_id from cv where name= 'Sequence Ontology Feature Annotation'),
   'transposable_element_gene',
   (select definition from cvterm where name = 'transposable_element_gene'),
   (select dbxref_id from dbxref where db_id in
       (select db_id from db where name='Sequence Ontology Feature Annotation') and
       accession in
       (select accession from dbxref where dbxref_id in
             (select dbxref_id from cvterm where name='transposable_element_gene'))));
```
7. Load the yeast GFF file(s) using the gmod_load_gff3.pl script. A bulk loading script is also provided and loads the data much faster, but we had trouble getting it to load the yeast GFF file that we used. To load the same data that are contained in SGD Lite, use yeast.gff. For the most recent yeast data, go to the SGD FTP site and use the saccharomyces_cerevisiae.gff file, which contains the main chromosomal features, GO annotations, etc.
8. If you would also like to install the Gbrowse map viewer, follow the instructions provided with the Gbrowse distribution.
  
  Gbrowse troubleshooting and tips:
  
  Gbrowse config file
  
  If you are having trouble getting certain features displayed, check your Gbrowse config file; often only a small change to it is needed (see the config file used by SGD Lite).
  
  Chromosomal coordinates
  
  Note that the gmod_load_gff3.pl script decrements the start coordinate from the GFF file when it loads feature locations into the featureloc table; that is, the value in the fmin column is one less than the start coordinate in the GFF file. This is because the database schema uses interbase coordinates (which count the positions between bases, starting from 0), while GFF, BioPerl, and similar formats and tools number the bases themselves, starting from 1. Gbrowse compensates for this automatically, but be aware that the fmin values in the database are one less than the start values in your GFF file.
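Before loading, it can help to check which feature types a GFF file contains (for example, whether step 6 applies) and which fmin/fmax values to expect in featureloc. This is a minimal Python sketch, not part of the GMOD package; it assumes a tab-delimited GFF3 file with nine columns per feature line, and the function names are hypothetical. It mirrors the behavior described above: fmin is the GFF start minus one, and fmax is the GFF end unchanged.

```python
from collections import Counter


def gff_to_interbase(start, end):
    """Convert a 1-based, inclusive GFF range to interbase (fmin, fmax)."""
    return start - 1, end


def interbase_to_gff(fmin, fmax):
    """Inverse of gff_to_interbase: recover the GFF start and end."""
    return fmin + 1, fmax


def scan_gff(lines):
    """Count feature types (column 3) and compute interbase locations for GFF3 lines."""
    types = Counter()
    locations = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines rather than guess
        seqid, ftype, start, end = cols[0], cols[2], int(cols[3]), int(cols[4])
        types[ftype] += 1
        locations.append((seqid, ftype) + gff_to_interbase(start, end))
    return types, locations
```

For example, passing an open yeast.gff file handle to scan_gff and checking whether types["transposable_element_gene"] is nonzero tells you whether the extra cvterm from step 6 is needed before you run gmod_load_gff3.pl.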

Notes on how the data are stored in the database schema

See the documentation for the Chado modular schema on the GMOD web site for more detailed information about the database schema. Some basic information is provided below to help you get started.

chromosomal features

Chromosomal features are found in the feature table. The type_id column indicates the type of feature; join type_id with cvterm_id in the cvterm table to get the name of each type. For example, to retrieve all the tRNAs in yeast, run the following query:

```
SELECT f.name, t.name FROM feature f, cvterm t
WHERE t.cvterm_id = f.type_id
AND t.name = 'tRNA';
```
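To see how the type_id join works without a full database, here is a minimal Python sketch that runs the same query against an in-memory SQLite database. The two tables are simplified stand-ins for chado's feature and cvterm (only the columns the query uses), and the tRNA feature names are hypothetical placeholders, not SGD data.

```python
import sqlite3

# Simplified stand-ins for chado's cvterm and feature tables (only the columns used here).
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT,
                      type_id INTEGER REFERENCES cvterm);
"""


def features_of_type(conn, type_name):
    """Return (feature name, type name) pairs for one feature type, as in the tRNA query."""
    return conn.execute(
        "SELECT f.name, t.name FROM feature f, cvterm t "
        "WHERE t.cvterm_id = f.type_id AND t.name = ? ORDER BY f.name",
        (type_name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "gene"), (2, "tRNA")])
# Hypothetical placeholder features; a real database holds the loaded yeast features.
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?)",
    [(10, "YFL039C", 1), (11, "tRNA_example_1", 2), (12, "tRNA_example_2", 2)],
)
```

Here features_of_type(conn, "tRNA") returns only the two placeholder tRNAs. Against the real PostgreSQL database, the same SQL works through psycopg2 if the sqlite3 placeholder ? is replaced with %s.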

The chromosomal coordinates for the chromosomal features are stored in the featureloc table.
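As a sketch of how to read those coordinates back in GFF terms, the following Python example joins feature to featureloc in an in-memory SQLite database and adds 1 to fmin, per the interbase note in the installation section. The tables are simplified stand-ins for chado's (only feature_id, srcfeature_id, fmin, fmax and strand are kept), and the chromosome feature name "chrVI" is an assumption. The gene location is the one from the psql example (chrVI:53259..54695); strand is -1 because the C suffix of YFL039C marks a Crick-strand ORF.

```python
import sqlite3

# Simplified stand-ins for chado's feature and featureloc tables (only the columns used here).
SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE featureloc (
    feature_id INTEGER REFERENCES feature,
    srcfeature_id INTEGER REFERENCES feature,  -- the chromosome the feature lies on
    fmin INTEGER,   -- interbase start (GFF start - 1)
    fmax INTEGER,   -- interbase end (same as GFF end)
    strand INTEGER  -- 1 or -1
);
"""

LOCATION_SQL = """
SELECT f.name, src.name, l.fmin + 1 AS gff_start, l.fmax AS gff_end, l.strand
FROM feature f
JOIN featureloc l ON l.feature_id = f.feature_id
JOIN feature src ON src.feature_id = l.srcfeature_id
WHERE f.name = ?
"""


def locate(conn, name):
    """Return (name, chromosome, start, end, strand) rows with 1-based GFF-style starts."""
    return conn.execute(LOCATION_SQL, (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO feature VALUES (?, ?)", [(1, "chrVI"), (2, "YFL039C")])
# Gene YFL039C at chrVI:53259..54695 is stored with fmin = 53258.
conn.execute("INSERT INTO featureloc VALUES (2, 1, 53258, 54695, -1)")
```

In the real database several features (gene, intron, CDS) share the name YFL039C, so expect one row per feature; add a join to cvterm, as in the tRNA query, to restrict the type.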

GO annotations

GO annotations are stored in the feature_cvterm table. For example, to retrieve all the GO biological process annotations for the yeast ORF YFL039C, you can use the following SQL statement:

```
SELECT DISTINCT f.name, t.name
FROM feature f, feature_cvterm fc, cvterm t, cv c
WHERE f.feature_id = fc.feature_id
AND f.name = 'YFL039C'
AND fc.cvterm_id = t.cvterm_id
AND c.cv_id = t.cv_id
AND c.name = 'Biological Process (Gene Ontology)' ;
```
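The GO query joins four tables: feature to feature_cvterm, feature_cvterm to cvterm, and cvterm to cv, whose name selects the ontology branch. This minimal Python sketch runs the same query, parameterized by ORF name and cv name, against simplified in-memory SQLite stand-ins for those tables. The term names and the second cv ("other_cv") are hypothetical placeholders, not real annotations; only the cv name 'Biological Process (Gene Ontology)' comes from the query above.

```python
import sqlite3

# Simplified stand-ins for chado's cv, cvterm, feature and feature_cvterm tables.
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_cvterm (feature_id INTEGER, cvterm_id INTEGER);
"""

GO_SQL = """
SELECT DISTINCT f.name, t.name
FROM feature f, feature_cvterm fc, cvterm t, cv c
WHERE f.feature_id = fc.feature_id
AND f.name = ?
AND fc.cvterm_id = t.cvterm_id
AND c.cv_id = t.cv_id
AND c.name = ?
ORDER BY t.name
"""

BP = "Biological Process (Gene Ontology)"


def go_terms(conn, feature_name, cv_name=BP):
    """Return distinct (feature, term) pairs annotated to feature_name within cv_name."""
    return conn.execute(GO_SQL, (feature_name, cv_name)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO cv VALUES (?, ?)", [(1, BP), (2, "other_cv")])
# Placeholder term names; real annotations come from the loaded GO data.
conn.executemany(
    "INSERT INTO cvterm VALUES (?, ?, ?)",
    [(10, 1, "process_term_a"), (11, 1, "process_term_b"), (12, 2, "other_term")],
)
conn.execute("INSERT INTO feature VALUES (1, 'YFL039C')")
# process_term_a is attached twice; DISTINCT returns it only once.
conn.executemany(
    "INSERT INTO feature_cvterm VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (1, 12)],
)
```

go_terms(conn, "YFL039C") returns each biological process term once and ignores the term in the other vocabulary. As with the earlier sketches, psycopg2 uses %s rather than ? as its placeholder.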

