Overview of Galaxy on Bio-Linux 8
Note – there is a working version of Galaxy on Bio‑Linux 7, but to get all the features mentioned here and to get Galaxy updates you must be using Bio‑Linux 8.
Galaxy is the web-based analysis and workflow environment originally created by BX at Penn State, and the Biology and Mathematics and Computer Science departments at Emory University. It is now extended and maintained by an international collaboration of developers. The home page for Galaxy is http://galaxyproject.org/.
Quick Start
Bio-Linux 8 comes with Galaxy installed and running. Use the Galaxy application icon in the Dash to open a session in your default web browser. By default, the Galaxy server is available at http://localhost:8080. Once you have set up your own account within Galaxy, open a terminal and run:
sudo galaxy-add-administrator you@foo.com
Galaxy uses e-mail addresses for account names, so substitute the same e-mail address you just set up in Galaxy into the command above. After setting the admin account, Galaxy takes a few seconds to restart and you should then be able to see the administration tab when you log in to the Galaxy web interface. Now see the standard Galaxy admin docs (https://wiki.galaxyproject.org/Admin) and tutorials (https://wiki.galaxyproject.org/Learn) for where to go next.
Administering Galaxy on Bio-Linux
Differences compared to a “standard” installation
Galaxy on Bio-Linux is installed by the native APT package manager (the same system the Software Center uses). To make this work, I’ve had to rearrange the Galaxy system slightly and move some files around, but everything is linked from the main installation directory /usr/lib/galaxy-server. Because APT manages all the files in this directory, you cannot upgrade Galaxy manually by following the standard Galaxy instructions. If for some reason you want to install your own Galaxy version, you can put it in a separate location. A manually installed server will still benefit from the tools packages provided on Bio-Linux.
universe_wsgi.ini configuration
Where the standard Galaxy documentation talks about universe_wsgi.ini, Galaxy on Bio-Linux uses a configuration directory rather than a single configuration file. All .ini fragments in /etc/galaxy-server/universe_wsgi.d are combined together to create the server configuration. You should not normally edit any existing files under /etc/galaxy-server/universe_wsgi.d because these are updated by the package manager, but rather make a new file and add your local options in there. Note that each configuration file fragment must contain appropriate section headers, which normally just means the first line of any file should be [app:main]. Peek at the existing files to see how this works.
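As a rough illustration of the fragment layout described above, the sketch below writes a new local file with the [app:main] header and a single option. The function name write_local_fragment, the file name 90_local.ini and the brand value are made up for the example; only the directory and the section header come from the text above.

```shell
# Hypothetical helper: write a local config fragment into the given directory.
# The file name 90_local.ini and the option value are illustrative assumptions.
write_local_fragment() {
    conf_dir="$1"
    cat > "$conf_dir/90_local.ini" <<'EOF'
[app:main]
# Local options go here, kept separate from the packaged fragments.
brand = My Lab
EOF
}
```

On a real system this would be called from a root shell as write_local_fragment /etc/galaxy-server/universe_wsgi.d, and Galaxy would presumably need a restart (sudo service galaxy restart) before the new option is picked up.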
Helper scripts specific to Bio-Linux
The Galaxy packages on Bio-Linux include some helper scripts:
- /usr/sbin/galaxy-add-administrator
Administrators on Galaxy are normally set by editing the universe_wsgi.ini file and restarting Galaxy. Due to the split configuration (see above), administrators are set in a configuration file called /etc/galaxy-server/universe_wsgi.d/11_admin_users.ini. You can edit this by hand or use the helper script by running sudo galaxy-add-administrator <account_name> as described above.
- /etc/init.d/galaxy -> /etc/init/galaxy.conf
This is the script that handles starting and stopping the Galaxy server. You can’t run it directly, but you can use it to stop and start the server like so:
sudo service galaxy stop
sudo service galaxy start
sudo service galaxy restart
- /usr/lib/galaxy-server/goto_galaxy.sh
This is what runs when you click the Galaxy icon. It brings up Galaxy in your default browser.
- /usr/lib/galaxy-server/scripts/cleanup_everything.sh
A script to help you clean up stale data on the server as described at https://wiki.galaxyproject.org/Admin/Config/Performance/Purge%20Histories%20and%20Datasets. At present you need to run this manually, or set it to run in the crontab, but in future it should run automatically.
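For the crontab option mentioned for cleanup_everything.sh, a minimal sketch is a helper that prints a standard five-field crontab entry. The function name cleanup_cron_line and the weekly schedule are assumptions chosen for illustration; only the script path comes from the list above.

```shell
# Hypothetical helper: print a crontab entry that runs the cleanup script.
# The schedule (03:00 every Sunday) is an arbitrary assumption; adjust as needed.
cleanup_cron_line() {
    script="${1:-/usr/lib/galaxy-server/scripts/cleanup_everything.sh}"
    # Fields: minute hour day-of-month month day-of-week command
    printf '0 3 * * 0 %s\n' "$script"
}
```

The printed line can be pasted into the editor opened by sudo crontab -e. Which account the script should run under is not stated here, so check that before installing the entry.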
Galaxy logs
All server activity gets logged in /var/log/galaxy.
Galaxy packages and tools available
Basic Packages
On a default Bio-Linux 8 installation, you will have the galaxy-server, galaxy-server-all and galaxy-tools-bl packages installed. These provide a basic Galaxy configuration along with a standard suite of bioinformatics tools. You can add more tools to Galaxy via the Tool Shed feature (see main Galaxy documentation).
If you don’t have the standard packages for some reason you can add them like so:
sudo apt-get install galaxy-server-all galaxy-tools-bl
tail -f /var/log/galaxy/galaxy-server.log
Wait until you see “serving on http://127.0.0.1:8080” then hit Ctrl-C to get back to the shell. This may take a while on the first start-up as Galaxy has to construct a new database.
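If you would rather not watch the log by hand, the same check can be scripted. The sketch below polls the log file for the “serving on” message and gives up after a timeout. The function name wait_for_galaxy and the 300-second default are assumptions for illustration, not part of the Bio-Linux packages.

```shell
# Hypothetical helper: poll a log file until Galaxy reports it is serving.
# $1 = log file (defaults to the path used above), $2 = timeout in seconds.
wait_for_galaxy() {
    log_file="${1:-/var/log/galaxy/galaxy-server.log}"
    timeout="${2:-300}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Succeed as soon as the start-up message appears anywhere in the log
        if grep -q "serving on http://" "$log_file" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1  # timed out without seeing the message
}
```

For example, wait_for_galaxy && echo "Galaxy is up". Note that this matches any earlier “serving on” line too, so on a log that already records a previous start it returns straight away.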
galaxy-server-pg-database
This add-on package tries to automatically set up a PostgreSQL database for Galaxy to use. If the package scripts are unable to set this up they will quit with an error, and the update manager will complain that the system is in an inconsistent state. If this happens, you should purge the package with “sudo dpkg -P galaxy-server-pg-database” and set things up manually. This is not very difficult and is explained fully in the file README.pg-database.
Using a PostgreSQL backend is recommended if you have multiple users or complex workflows, but be aware that if you do a major system upgrade you will have to ensure the PostgreSQL databases are migrated. See /usr/share/doc/postgresql-common/README.Debian.gz.
When the package is successfully installed it will switch over to PostgreSQL and when removed it will switch back to SQLite. Any data you already had in Galaxy will seem to vanish as Galaxy only sees files that are logged in the database, but in fact nothing is removed. There is no easy way at present to migrate databases from one system to the other, so you will need to re-import all your data to Galaxy after switching to PostgreSQL.
galaxy-server-apache-proxy
This package tries to automatically set up the Apache web server as a proxy for Galaxy. For a “proper” Galaxy server this is the recommended option along with use of the PostgreSQL database, but like the above package it makes the system a little more complex. The main practical effect of installing this package is that Apache is able to authenticate users based on system accounts, so these are then used for Galaxy, i.e. you’ll need to log in to Galaxy with your regular username and password. Galaxy will append @localhost to your user name to give the internal Galaxy account name.
To control access to Galaxy, accounts on the system need to be added to the galaxy group, so to add yourself:
sudo usermod -aG galaxy $USER
And don’t forget to make yourself an admin:
sudo galaxy-add-administrator $USER
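To confirm that the group change was recorded, you can look at the account's group list. The sketch below wraps the standard id command in a small helper. The function name in_group is made up for illustration and is not part of the Bio-Linux packages.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # split them onto separate lines and look for an exact match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}
```

For example, in_group "$USER" galaxy && echo "member of galaxy". A shell session that was already open before the usermod command will not pick up the new group until you log out and back in.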
Because Galaxy now has a direct mapping between system users and Galaxy users, I was able to implement the FTP upload feature securely (https://wiki.galaxyproject.org/Admin/Config/Upload%20via%20FTP). Rather than using a special FTP server, any user in the galaxy group will be provided with a transfer directory under /var/lib/galaxy-server/transfer, and anything dropped in there will show up as something that Galaxy can ingest via the Upload File option. Files can simply be copied here, or uploaded from a remote machine using any SFTP client.
To simplify remembering where to put the files, you may want to make a symlink in your home directory like so:
ln -s /var/lib/galaxy-server/transfer/$USER@localhost ~/galaxy-transfer
Last but not least, when this package is in use you’ll need to access Galaxy via the address http://localhost/galaxy/. If you try to talk directly to the server on port 8080 it will complain that no external user authentication info has been provided. On the local machine, the Galaxy icon will work out the correct page to bring up for you. For remote connections, you can now control external access to Galaxy via the firewall settings for port 80, but see security considerations below.
Security Considerations
Serving Galaxy to other machines
The default installation only lets local users on the Bio-Linux machine see Galaxy. If you want to serve Galaxy to other machines, the recommended approach is to install the galaxy-server-apache-proxy package and then allow only trusted machines to access port 80 in the firewall settings. Don’t open Galaxy to the wider internet unless you are really confident you know what you are doing security-wise.
A more selective solution than opening ports is to use a remote x2go desktop or ssh tunnelling. See the file /usr/share/doc/galaxy-server-apache-proxy/README.apache2-setup.gz for details and other options.
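As a rough sketch of the ssh tunnelling option, the helper below prints an ssh command that forwards a local port to port 80 on the Bio-Linux machine. It assumes the galaxy-server-apache-proxy package is installed there. The function name galaxy_tunnel_cmd, the default local port 8080 and the example host are all assumptions for illustration; the README named above remains the authoritative guide.

```shell
# Hypothetical helper: print an ssh port-forwarding command for Galaxy.
# It only prints the command so it can be reviewed; nothing is executed.
galaxy_tunnel_cmd() {
    remote="$1"              # user@host of the Bio-Linux machine (placeholder)
    local_port="${2:-8080}"  # assumed default; any free local port will do
    # -N: no remote command; -L: forward local_port to port 80 on the remote
    printf 'ssh -N -L %s:localhost:80 %s\n' "$local_port" "$remote"
}
```

For example, galaxy_tunnel_cmd alice@biolinux.example.org prints ssh -N -L 8080:localhost:80 alice@biolinux.example.org. With that tunnel running on the client machine, Galaxy should be reachable there at http://localhost:8080/galaxy/ without opening port 80 in the firewall.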