Notes on installing gmod 0.002 alpha on Mac OS X 10.3

Contents

Installation

Data loading: the gene and sequence ontologies and a gff3 file

Installation

Fan Kang, our database administrator at the SGD Princeton colony, took these excellent notes on installing gmod 0.002 alpha (and the required Perl modules) on her G4 desktop running Mac OS X 10.3:

Perl module installation
chado installation
GBrowse installation: note that Fan successfully installed GBrowse but did not go on to configure it to read from the chado schema, because we have since moved the installation to a server.

Suggestions for installation improvements:

If "perl Makefile.PL" fails, the second attempt fails with the following error:

```
...
Creating new 'Build' script for 'Chado' version '0.01'
lib/Chado/LoadDBI.pm -> blib/lib/Chado/LoadDBI.pm
lib/Chado/AutoDBI.pm -> blib/lib/Chado/AutoDBI.pm
Can't copy('lib/Chado/AutoDBI.pm', 'blib/lib/Chado/AutoDBI.pm'): Permission denied at /usr/local/perl/5.80/Library/Module/Build/Base.pm line 2296.
...
```

Solution: remove blib/lib/Chado/AutoDBI.pm manually before trying again. I think this file should be replaced automatically on the second attempt, so this may be a bug.
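The "Permission denied" error suggests that the copy of AutoDBI.pm left in blib/ by the failed first run cannot be overwritten in place, for example because it is read-only. The Python sketch below is not Module::Build's code; it only illustrates the behaviour suggested above for the read-only case. The function name is hypothetical. It deletes an existing target before copying, which on POSIX systems works even when the old copy is read-only, as long as its directory is writable.

```python
import os
import shutil
import stat


def copy_replacing_readonly(src: str, dst: str) -> None:
    """Copy src to dst, deleting any existing dst first.

    On POSIX systems, unlinking needs write permission on the directory,
    not on the file, so this succeeds even if a previous run left dst
    read-only.
    """
    if os.path.lexists(dst):
        os.remove(dst)
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copyfile(src, dst)
    # Hypothetical: leave the copy read-only, like the leftover blib file,
    # so that a rerun faces the same situation that broke the 2nd attempt.
    os.chmod(dst, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
```

Calling copy_replacing_readonly('lib/Chado/AutoDBI.pm', 'blib/lib/Chado/AutoDBI.pm') twice in a row succeeds both times. A plain copy fails on the second call in the same way the build did.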

The database connection parameters dbname, dbuser and dbpass are stored in AutoDBI.pm and LoadDBI.pm.

Because these modules are installed as root, a user needs root privileges to edit them in order to load data into a second database on the same server, or after the database password has changed. It would be good if this could be made more flexible.

The installed Chado modules (note the root ownership and read-only permissions):

```
fkang@bouzouki:/usr/local/perl/5.80/Library/Chado$ ls -l
total 376
-r--r--r--  1 root  wheel  77919 19 Apr 17:24 AutoDBI.pm
-r-xr-xr-x  1 root  wheel  77919 15 Apr 13:39 AutoDBI.pm.old*
-r--r--r--  1 root  wheel  12753 29 Mar 12:47 Builder.pm
-r--r--r--  1 root  wheel   2554 29 Mar 12:47 Config.pm
-r--r--r--  1 root  wheel    245 19 Apr 17:24 LoadDBI.pm
-r--r--r--  1 root  wheel    261 29 Mar 13:58 LoadDBI.pm~
```
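One way to get that flexibility is to resolve dbname, dbuser and dbpass at run time rather than writing them into root-owned modules. The Python sketch below assumes this approach. It reads the standard libpq environment variables (PGDATABASE, PGUSER, PGPASSWORD, PGHOST). The function names and default values are hypothetical and are not part of gmod. The DSN helper shows the equivalent Perl DBI data source string for DBD::Pg.

```python
import os
from typing import Dict, Mapping


def connection_params(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve connection settings from the environment at run time.

    PGDATABASE, PGUSER, PGPASSWORD and PGHOST are the standard libpq
    variables; the fallback values here are placeholders.
    """
    return {
        "dbname": env.get("PGDATABASE", "chado"),
        "dbuser": env.get("PGUSER", env.get("USER", "")),
        "dbpass": env.get("PGPASSWORD", ""),
        "host": env.get("PGHOST", "localhost"),
    }


def dbi_dsn(params: Mapping[str, str]) -> str:
    """Build a Perl DBI data source string for DBD::Pg."""
    return f"dbi:Pg:dbname={params['dbname']};host={params['host']}"
```

With this approach, loading into a second database or picking up a changed password only requires exporting different variables before running the loader. No module needs editing.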

A potentially more flexible way to do the installation:
1. Provide a set of SQL scripts for the basic database setup and for data loading.
2. Use Perl script(s) to generate an SQL script (INSERT statements, etc.) for loading each ontology or GFF3 file.
3. Load the data directly into the PostgreSQL database with the "psql" utility, using the file generated in step 2.
I think this would give the DBA and users more control over the database and let them see what happens at each step. It would also make data loading more transparent and therefore easier to troubleshoot.
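Step 2 of this proposal could look roughly like the sketch below. It is written in Python rather than Perl, and it is not how the gmod loaders work. It writes only chado feature rows (organism_id, name, uniquename, type_id) and leaves out locations (featureloc) and every attribute except ID. ORGANISM_ID = 1 is a hypothetical placeholder. Each type_id is looked up by name within the "Sequence Ontology" cv. The resulting file could then be loaded in step 3 with psql -f.

```python
from typing import Iterable, Iterator

# Hypothetical placeholder; a real script would look this up in `organism`.
ORGANISM_ID = 1


def sql_literal(value: str) -> str:
    """Quote a string for SQL by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def gff3_to_inserts(lines: Iterable[str]) -> Iterator[str]:
    """Yield one INSERT INTO feature statement per GFF3 feature with an ID."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue  # this sketch only loads features that carry an ID
        # Resolve the type by name, restricted to the Sequence Ontology cv.
        type_subselect = (
            "(SELECT t.cvterm_id FROM cvterm t JOIN cv c ON c.cv_id = t.cv_id "
            f"WHERE t.name = {sql_literal(cols[2])} "
            "AND c.name = 'Sequence Ontology')"
        )
        yield (
            "INSERT INTO feature (organism_id, name, uniquename, type_id) "
            f"VALUES ({ORGANISM_ID}, {sql_literal(feature_id)}, "
            f"{sql_literal(feature_id)}, {type_subselect});"
        )
```

Because the output is plain SQL, a DBA can inspect or edit it before anything touches the database. That is the transparency this proposal is after.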

Data loading

Used the loading scripts that came with gmod 0.002 alpha:

gmod_load_ontology.pl
gmod_load_gff3.pl

Gene ontology

Initial attempt when running 'make ontologies': DBI connect error (see install notes for details).
Problem: several parameters were changed in $PGDATA/pg_hba.conf and $PGDATA/postgresql.conf. It seems the TCP/IP setting did not take effect after 'pg_ctl reload'.

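Solution: shut down and restart the server (for example with 'pg_ctl restart'). The TCP/IP setting in postgresql.conf is read only at server start, so a reload is not enough. Changes to pg_hba.conf, by contrast, are picked up on reload.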
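After a configuration change, it helps to check whether the server is actually accepting TCP/IP connections before rerunning the loaders. The Python sketch below assumes the loader connects over TCP/IP rather than the local Unix-domain socket, and that the server uses PostgreSQL's default port 5432. The function name is hypothetical.

```python
import socket


def accepts_tcp(host: str = "localhost", port: int = 5432,
                timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on host:port.

    False after editing postgresql.conf suggests the server has not picked
    up its TCP/IP setting yet, i.e. it still needs a full restart.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False
```

A True result only shows that something is listening on that port. Authentication rules in pg_hba.conf can still reject the DBI connection afterwards.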

Sequence ontology

Problem: the so.ontology file was out of sync with the relationship types (predicates) hard-coded in the global variables section of the loading script, specifically the %oborelmap hash. A new type, derived_from, appeared in so.ontology, but gmod_load_ontology.pl was not expecting it because it was not an entry in %oborelmap. The resulting error was something like "cannot insert null into accession".

Solution: added 'derived_from' to the hash. NOTE: this problem has since been fixed in CVS on the gmod site!
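A pre-flight check could turn the opaque "cannot insert null" error into a message that names the missing type. The Python sketch below is a rough illustration under two loud assumptions. First, it assumes the ontology file uses OBO-style "relationship: <type> <term id>" lines; so.ontology may use a different syntax. Second, KNOWN_RELATIONSHIPS is a hypothetical stand-in for %oborelmap, not its real contents.

```python
import re
from typing import AbstractSet, Iterable, Set

# Hypothetical stand-in for %oborelmap: relationship types the loader can map.
# The real list lives in gmod_load_ontology.pl.
KNOWN_RELATIONSHIPS = frozenset({"is_a", "part_of"})

# Assumes OBO-style lines such as "relationship: part_of SO:0000001".
RELATIONSHIP_LINE = re.compile(r"^relationship:\s*(\S+)\s+\S+")


def unknown_relationship_types(
    lines: Iterable[str], known: AbstractSet[str] = KNOWN_RELATIONSHIPS
) -> Set[str]:
    """Return relationship types used in the file but absent from `known`."""
    found = set()
    for line in lines:
        match = RELATIONSHIP_LINE.match(line.strip())
        if match:
            found.add(match.group(1))
    return found - set(known)
```

If the result is non-empty, the loader could stop before inserting anything and list the types that need to be added to the map. In the case described above, the result would presumably have been {'derived_from'}.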

GFF3 file

Used the GFF3 file in development by Stan for SGD, based on the most recent GFF3 documentation. Changes I made to the GFF3 file to eliminate errors:

1. Replaced the "LTR" type (3rd column) with long_terminal_repeat to match the SO term. This is an SGD data issue that we will fix soon.
2. Removed the lines with the "landmark" type in the 3rd column. GBrowse uses this type, so for use with GBrowse I will load it manually next time around (though perhaps the load_gff3.pl script could be modified to handle it?).
3. Changed the ##sequence-region lines into "regular" GFF lines describing the chromosomes. For example, changed:
```
##sequence-region       chrI    1       230210
```
to:
```
chrI	SGD	chromosome	1	230210	.	.	.	ID=chrI
```
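The three edits can be scripted instead of made by hand. The Python sketch below makes three assumptions. Feature lines are tab-separated with nine columns. The generated chromosome lines should use "SGD" as the source column, as in the example above. The chromosome ID is the seqid itself. The function name is hypothetical.

```python
import re
from typing import Iterable, Iterator

SEQUENCE_REGION = re.compile(r"^##sequence-region\s+(\S+)\s+(\d+)\s+(\d+)")


def preprocess_gff3(lines: Iterable[str], source: str = "SGD") -> Iterator[str]:
    """Apply the three manual edits listed above to GFF3 lines."""
    for line in lines:
        line = line.rstrip("\n")
        region = SEQUENCE_REGION.match(line)
        if region:
            # Edit 3: turn the directive into a chromosome feature line.
            seqid, start, end = region.groups()
            yield "\t".join([seqid, source, "chromosome", start, end,
                             ".", ".", ".", f"ID={seqid}"])
            continue
        cols = line.split("\t")
        if len(cols) == 9 and not line.startswith("#"):
            if cols[2] == "landmark":
                continue  # Edit 2: drop landmark lines
            if cols[2] == "LTR":
                cols[2] = "long_terminal_repeat"  # Edit 1: SO term name
            line = "\t".join(cols)
        yield line
```

Because ##sequence-region directives normally come before the features that use them, converting them in place keeps each chromosome line ahead of the features located on it.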

Before changes 1 and 2, the error was:

```
Either the Sequence Ontology was incorrectly loaded,
or this file doesn't contain GFF3 at gmod_load_gff3.pl line 252,  line 34.
```

Before change 3, the error was:

```
Unable to find a source feature id for the reference sequence in this line:
chrII   SGD     gene    646     1128    .       +       0       ID "YBL113W-A"  ; Note "Identified by gene-trapping, microarray-based expression analysis, and genome-wide homology searching"  ; dbxref "SGD:S0028599"  ; orf_classification Dubious

That is, chrII should either have a entry in the 
feature table or earlier in this GFF file and it doesn't.
```
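The loader requires every reference sequence (column 1) to exist already, either in the feature table or as a feature earlier in the file. A pre-load check for that condition is sketched below in Python. The parameter known_in_db is a hypothetical stand-in for IDs already present in the feature table, which a real check would fetch from the database. The function name is also hypothetical.

```python
from typing import AbstractSet, Iterable, List


def undefined_seqids(lines: Iterable[str],
                     known_in_db: AbstractSet[str] = frozenset()) -> List[str]:
    """List seqids used in column 1 before any feature with that ID appears."""
    defined = set(known_in_db)
    missing: List[str] = []
    for line in lines:
        if line.startswith("#"):
            continue  # comments and directives define nothing here
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "ID" in attrs:
            # Record the ID first, so a chromosome line may refer to itself.
            defined.add(attrs["ID"])
        seqid = cols[0]
        if seqid not in defined and seqid not in missing:
            missing.append(seqid)
    return missing
```

Running this over the unmodified file would have listed chrII (and the other chromosomes) up front, instead of the loader stopping at the first gene on chrII.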

The modified gff3 file can be found here.

Possible bug?

Wrong ontology used?
After loading the GFF3 file, some feature types (centromere, chromosome and telomere below) were assigned Gene Ontology terms instead of the Sequence Ontology terms of the same name:

```
sgdlitedev=# select distinct t.name, f.type_id, t.cv_id, c.name from feature f, cvterm t, cv c where c.cv_id = t.cv_id and f.type_id = t.cvterm_id;
           name            | type_id | cv_id |       name        
---------------------------+---------+-------+-------------------
 ARS                       |   17756 |     4 | Sequence Ontology
 CDS                       |   17947 |     4 | Sequence Ontology
 centromere                |    1330 |     3 | Gene Ontology
 chromosome                |     341 |     3 | Gene Ontology
 gene                      |   17827 |     4 | Sequence Ontology
 long_terminal_repeat      |   17640 |     4 | Sequence Ontology
 ncRNA                     |   17977 |     4 | Sequence Ontology
 non_transcribed_region    |   17830 |     4 | Sequence Ontology
 noncoding_exon            |   17837 |     4 | Sequence Ontology
 rRNA                      |   17979 |     4 | Sequence Ontology
 region                    |   17622 |     4 | Sequence Ontology
 repeat_family             |   17762 |     4 | Sequence Ontology
 retrotransposon           |   17638 |     4 | Sequence Ontology
 snRNA                     |   18007 |     4 | Sequence Ontology
 snoRNA                    |   18019 |     4 | Sequence Ontology
 tRNA                      |   17986 |     4 | Sequence Ontology
 telomere                  |    1329 |     3 | Gene Ontology
 transcribed_spacer_region |   17891 |     4 | Sequence Ontology
 transposable_element_gene |   17828 |     4 | Sequence Ontology
```

I updated these manually. NOTE: this problem has since been fixed in CVS on the gmod site.
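The manual fix can be written as a single UPDATE. It repoints each feature typed with a non-SO term at the Sequence Ontology term of the same name. The sketch below runs against an in-memory SQLite database standing in for chado, with only the columns the fix touches. The cv names and the ids 1329 and 17827 come from the query output above. The SO cvterm id 9001 is made up. The UPDATE is plain SQL and should carry over to PostgreSQL, but it assumes SO term names are unique, so try it on a copy first.

```python
import sqlite3

# Minimal stand-in for the chado cv, cvterm and feature tables.
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, type_id INTEGER);
"""

# Repoint features whose type is a non-SO term that has a same-named SO term.
REMAP_TO_SO = """
UPDATE feature
SET type_id = (
    SELECT so.cvterm_id
    FROM cvterm cur
    JOIN cvterm so ON so.name = cur.name
    JOIN cv ON cv.cv_id = so.cv_id
    WHERE cur.cvterm_id = feature.type_id
      AND cv.name = 'Sequence Ontology'
)
WHERE type_id IN (
    SELECT cur.cvterm_id
    FROM cvterm cur JOIN cv c ON c.cv_id = cur.cv_id
    WHERE c.name <> 'Sequence Ontology'
      AND cur.name IN (
          SELECT so.name FROM cvterm so JOIN cv c2 ON c2.cv_id = so.cv_id
          WHERE c2.name = 'Sequence Ontology')
);
"""


def remap_demo() -> list:
    """Build a tiny database, apply the fix, and return (feature_id, type_id)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cv VALUES (?, ?)",
                     [(3, "Gene Ontology"), (4, "Sequence Ontology")])
    conn.executemany("INSERT INTO cvterm VALUES (?, ?, ?)",
                     [(1329, 3, "telomere"), (9001, 4, "telomere"),
                      (17827, 4, "gene")])
    conn.executemany("INSERT INTO feature VALUES (?, ?)", [(1, 1329), (2, 17827)])
    conn.execute(REMAP_TO_SO)
    return conn.execute(
        "SELECT feature_id, type_id FROM feature ORDER BY feature_id").fetchall()
```

The outer WHERE clause only touches rows that have a same-named SO term. Features without an SO counterpart keep their type instead of ending up with a NULL type_id.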

Suggestions for 0.003 data loading:

If possible, it would be great to include a script that loads GO annotations from gene association files.
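As a rough idea of what such a script would need to read, the Python sketch below parses the tab-separated gene association (GAF) format. Comment lines start with "!". The first nine columns are DB, DB_Object_ID, DB_Object_Symbol, Qualifier, GO ID, DB:Reference, Evidence code, With/From and Aspect. The sketch covers parsing only. Loading into chado, for example into feature_cvterm, is left out. The record field names are illustrative, and skipping NOT-qualified annotations is a design choice of this sketch.

```python
from typing import Iterable, Iterator, NamedTuple


class GoAnnotation(NamedTuple):
    db: str
    object_id: str
    symbol: str
    go_id: str
    reference: str
    evidence: str
    aspect: str  # P (process), F (function) or C (component)


def parse_gene_association(lines: Iterable[str]) -> Iterator[GoAnnotation]:
    """Yield annotations from a tab-separated gene association file."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header comments start with '!'
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed line
        if "NOT" in cols[3].split("|"):
            continue  # skip negative annotations in this sketch
        yield GoAnnotation(cols[0], cols[1], cols[2], cols[4],
                           cols[5], cols[6], cols[8])
```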
