

Computational Genomics Platform

Overview

Questions
  • Please address your questions by e-mail to any of the contributors (e-mails at the bottom of this wiki).

Objectives
  • To design, organize and maintain the Computational Genomics Platform at IRCCS Azienda Ospedaliero-Universitaria di Bologna (AOUBO)

  • To develop, deploy and update a large range of bioinformatic tools for the analysis of genomic data, and to organize them in software environments and analysis workflows

  • To provide support for the development of bioinformatic software and for the design of genomic analyses

  • To curate a genomic variant database populated by the genetic variation identified by IRCCS AOUBO projects and/or collaborations

Last modification: Nov 22, 2022

The Computational Genomics Platform (https://www.aosp.bo.it/content/genomica-computazionale) is an integrated system of bioinformatic solutions designed and maintained by the Bologna Sant’Orsola Computational Genomics (BOSCO) team at IRCCS AOUBO. The platform offers diversified solutions for the analysis of genomic data:

  • Command Line Interface (CLI) to run bioinformatic tools. The systems through which the users interact with the CLI are mainly:
    • Slurm, the resource management and job scheduling system
    • Snakemake, the workflow management system
    • Conda, the package and environment management system
  • Galaxy, a user-friendly web portal that lets users with little or no programming experience carry out computational genomics projects
  • OpenCGA to organize genomic projects in a database for easily storing and querying variant datasets
  • GitLab to support bioinformatic software development.

Agenda

In this documentation, you can find:

  1. CLI
    1. Maintained CLI workflows
  2. Galaxy
    1. Maintained Galaxy workflows
  3. Bioinformatic tools
  4. OpenCGA
  5. GitLab

CLI

In the CLI, users can run any of the currently available bioinformatic tools on their own. The tools are listed in the Bioinformatic tools section below, which also indicates the Conda environment each tool belongs to.
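As an illustration of how Slurm and Conda can be combined in the CLI, the following minimal sketch generates the text of a Slurm batch script that runs a tool inside one of the Conda environments. It is not the BOSCO team's own procedure: the function name `build_sbatch_script`, the job name, the resource values and the input file name are all hypothetical, and the sketch assumes that `conda` is available on the compute nodes.

```python
import shlex


def build_sbatch_script(conda_env, command, job_name="bosco-job",
                        cpus=1, mem_gb=4, time_limit="01:00:00"):
    """Return the text of a Slurm batch script that runs `command` in `conda_env`.

    All default values are illustrative assumptions, not platform settings.
    """
    # Quote every argument so that file names with spaces survive the shell.
    cmd = " ".join(shlex.quote(part) for part in command)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        "set -euo pipefail",
        # `conda run -n ENV CMD` executes CMD inside the named environment.
        f"conda run -n {shlex.quote(conda_env)} {cmd}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: fastqc from the "quality" environment on a made-up file.
script = build_sbatch_script("quality", ["fastqc", "sample_R1.fastq.gz"],
                             job_name="fastqc-demo", cpus=2)
print(script)
```

The generated text could be saved to a file, for example `fastqc-demo.sh`, and submitted with `sbatch fastqc-demo.sh`.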

Maintained CLI workflows

The BOSCO team also builds and maintains some general-purpose analysis workflows which can be launched in the CLI:

  • Snakemake workflow for WES data pre-processing and germline short variant calling (SNVs and indels)
  • Snakemake workflow for WGS data pre-processing and germline short variant calling (SNVs and indels) UNDER CONSTRUCTION
  • Snakemake workflow for BAM conversion to CRAM UNDER CONSTRUCTION
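For orientation, a Snakemake workflow is typically launched with a command such as the one assembled below. This is a minimal sketch under stated assumptions, not the launch procedure of the workflows listed above: the paths `workflow/Snakefile` and `config.yaml` and the helper name `snakemake_command` are hypothetical, and it is assumed that the workflow declares its Conda environments so that `--use-conda` applies.

```python
import shlex


def snakemake_command(snakefile, cores, configfile=None, dry_run=False):
    """Assemble a snakemake invocation as an argument list."""
    cmd = ["snakemake", "--snakefile", snakefile,
           "--cores", str(cores), "--use-conda"]
    if configfile is not None:
        cmd += ["--configfile", configfile]
    if dry_run:
        # -n prints the planned jobs without executing them.
        cmd.append("-n")
    return cmd


# Hypothetical paths; replace them with the ones of the actual workflow.
cmd = snakemake_command("workflow/Snakefile", cores=8,
                        configfile="config.yaml", dry_run=True)
print(shlex.join(cmd))
```

Running with `dry_run=True` first is a cheap way to check which jobs would be executed before committing cluster resources.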

Galaxy

In Galaxy, users can run any of the currently available bioinformatic tools on their own. The tools are listed in the Bioinformatic tools section below, which also indicates whether each tool is accessible in Galaxy.

To open the Galaxy aosp instance, browse to galaxy.aosp.biodec.com. Click the Log in or register link (top panel) and enter your email and password.

Galaxy

Galaxy is an open-source, web-based portal for accessible, reproducible, and transparent computational research. As a first step with Galaxy, visit the page https://galaxyproject.org/get-started/. A collection of tutorials developed and maintained by the Galaxy community is available at https://training.galaxyproject.org/training-material. To view the list of tools that can be used within the Galaxy instance, visit https://toolshed.g2.bx.psu.edu/.

Maintained Galaxy workflows

The BOSCO team also builds and maintains some general-purpose analysis workflows which can be launched in Galaxy:

  • Galaxy workflow to run the Rabdomyzer tool
  • [Galaxy workflow for the analysis of amplicon-based gene panel](https://git.aosp.biodec.com/aosp/piattaforma-bioinformatica/-/wikis/Galaxy-workflow-for-gene-pane

Bioinformatic tools

| Tool | Version | Galaxy | Command-line Conda environment |
|---|---|---|---|
| ensembl-vep | 101.0 | | vep |
| bamkit | 16.07.26 | | svtools |
| bamsurgeon | 1.2 | | bamsurgeon1.2 |
| bcftools | 1.9 | | aligners, bamsurgeon1.2 |
| bedtools | 2.27.1, 2.30.0 | | aligners, sv2 |
| blast | 2.10.1 | available in Tool Shed | svtools |
| bwa | 0.7.17 | | aligners, bamsurgeon1.2, svtools |
| clump | 1.0.0 | | clump |
| cnvfilter | 1.6.0 | | cnvfilter |
| cnvkit | 0.9.7 | | cnvkit |
| cnvpytor | 1 | | cnvpytor |
| CoNIFER | 0.2.2 | | conifer |
| DECoN | 1.0.2 | | decon |
| delly | 0.8.5 | available in Tool Shed | svtools |
| ensembl-vep | 101 | available in Tool Shed | vep |
| erds | 1.1 | | erds |
| excavator2 | 2.0.0 | | singularity |
| exomiser | | | exomiser |
| exonerate | 2.4.0 | available in Tool Shed | bamsurgeon1.2 |
| fastp | 0.20.1 | | quality |
| fastqc | 0.11.8 | | aligners, quality |
| gatk | 4.1.2.0, 3.8 | | gatk4, gatk3 |
| gridss | 2.10.1 | | svtools |
| IntegrationSiteMapper | 1.3.8 | | discvrseq |
| kraken2 | 2.1.0 | available in Tool Shed | svtools |
| lumpy-sv | 0.3.1 | available in Tool Shed | svtools |
| manta | 1.6.0 | available in Tool Shed | svtools |
| mipgen | 4 | | mipgen |
| mosdepth | 0.3.1 | | quality |
| multiqc | 1.9 | | aligners, quality |
| pear | 0.9.6 | available in Tool Shed | pear |
| picard | 2.18.14 | | aligners |
| pybedtools | 0.8.1 | available in Tool Shed | sv2 |
| python | 2.7.15 | | bamsurgeon1.2 |
| r | 3.5.1 | available on galaxy.eu.org | rstudio |
| r-exomedepth | 1.1.15 | available in Tool Shed | decon |
| sambamba | 0.7.1 | available in Tool Shed | svtools |
| samblaster | 0.1.26 | available in Tool Shed | svtools |
| samtools | 0.1.19, 1.1, 1.9, 1.9 | | erds, svtools, bamsurgeon1.2, aligners |
| singularity | 3.6.3 | | singularity |
| snakemake | 7.3.8 | | snakemake |
| somatic-sniper | 1.0.5.0 | available in Tool Shed | bamsurgeon1.2 |
| survivor | 1.0.7 | | svtools |
| sv2 | 1.4.3.4 | | sv2 |
| svaba | 1.1.0 | | svaba |
| svtyper | 0.7.1 | available in Tool Shed | svtools |
| svviz2 | 2.0a3 | | svviz2 |
| t_coffee | 11.0.8 | available in Tool Shed | vep |
| uncoverapp | 1.6.0 | in progress | uncoverapp |
| VariantQC | 1.3.8 | | discvrseq |
| varscan | 2.4.3 | available in Tool Shed (iuc) | bamsurgeon1.2 |
| velvet | 1.2.10 | available in Tool Shed (devteams) | bamsurgeon1.2 |
| vt | 2015.11.10 | | vt |
| platypus-variant | 0.8.1.1 | | platypus-variant |
| h3m2 | | | custom |
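
Since several tools are installed in more than one Conda environment, it can help to look up which environment to activate for a given tool. The following minimal sketch encodes a small excerpt of the listing above as a dictionary. The names `TOOL_ENVIRONMENTS`, `environments_for` and `tools_in` are hypothetical helpers, and the excerpt is not a complete or authoritative copy of the listing.

```python
# Excerpt of the tool listing above: tool name -> Conda environments providing it.
TOOL_ENVIRONMENTS = {
    "bcftools": ["aligners", "bamsurgeon1.2"],
    "bwa": ["aligners", "bamsurgeon1.2", "svtools"],
    "fastqc": ["aligners", "quality"],
    "gatk": ["gatk4", "gatk3"],
    "mosdepth": ["quality"],
    "samtools": ["erds", "svtools", "bamsurgeon1.2", "aligners"],
}


def environments_for(tool):
    """Return the Conda environments that provide `tool` (empty list if unknown)."""
    return TOOL_ENVIRONMENTS.get(tool.lower(), [])


def tools_in(environment):
    """Return, in alphabetical order, the tools of the excerpt found in `environment`."""
    return sorted(tool for tool, envs in TOOL_ENVIRONMENTS.items()
                  if environment in envs)


print(environments_for("bwa"))
print(tools_in("quality"))
```

On the platform itself, `conda env list` shows the environments that are actually installed, and `conda list -n <environment>` shows the packages in one of them.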

OpenCGA

OpenCGA is the framework used to load genomic variants into a database and to retrieve them from it. It also provides a data visualization browser, as well as functional and clinical analysis modules.

OpenCGA

OpenCGA is an open-source platform for big data genomic analysis and visualisation, designed to be high-performance, scalable and secure. It covers the main aspects of genomic analysis: metadata database, authentication and security, variant normalisation and aggregation, variant storage and annotation, a highly scalable NoSQL variant storage engine, alignment and coverage, big data variant analysis, RESTful web services, and visualisation. OpenCGA is developed and maintained at the University of Cambridge and is currently used by several big data projects such as GEL (Genomics England).

GitLab

With GitLab, the BOSCO team builds its own bioinformatic software and supports anyone who wants to do the same. GitLab also provides the issue-tracking system used to handle the problems encountered by users of the Computational Genomics Platform.

GitLab

GitLab is a DevOps software package that combines the ability to develop, secure, and operate software collaboratively and in a single application.

Contributors

Bologna Sant’Orsola Computational Genomics (BOSCO) team
U.O.C. Genetica Medica, S.S. Genomica Computazionale, IRCCS Azienda Ospedaliero-Universitaria di Bologna (AOUBO)

Please address your questions to:

  • Tania Giangregorio - tania.giangregorio@aosp.bo.it
  • Federica Isidori - federica.isidori@aosp.bo.it
  • Tommaso Pippucci - tommaso.pippucci@aosp.bo.it