• Question: What are the most common bioinformatics programs?

    Asked to Claire, Ian, Sergey, Vicky, Zena on 13 Jun 2014.
    • Photo: Sergey Lamzin

      Sergey Lamzin answered on 13 Jun 2014:


      Sadly, this question is roughly equivalent to “What is the most common Internet browser?”.
      Let’s divide this into the browser camps:

      The Internet Explorer camp – this mostly consists of bench scientists. They will use Illumina BaseSpace, PacBio SMRT Analysis Suite and any other software recommended by their hardware vendor.

      The Google Chrome camp – These are biologists who have spent the past five years behind a computer and now call themselves Bioinformaticians. They will use any **free** tool that they read about in recent research papers.

      The Firefox camp – These are the OpenSource geeks who have either spent over 10 years of their past life behind a computer or come from a mathematics or computer science background. They will use the commonly established open source programs bwa, bowtie and velvet in the hope of one day being able to write a similar tool themselves.

      And last but not least
      The curl camp – these are the freaks who have advanced degrees in computer science and are stuck in the previous century. They refuse to use anything that has a graphical user interface and only stare at black screens filled with white characters. Their favourite bioinformatics tools are head, tail, cat, grep, vi and awk (the common UNIX text manipulation tools).
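
      To give a feel for what those text tools do, here is a rough Python imitation of head, tail and grep run on a tiny FASTA-style text. The sequences, names and helper functions are all made up for illustration; the real UNIX tools do far more than this.

```python
# Toy FASTA text; the sequence names and bases are invented for this sketch.
FASTA = """>seq1 example
ACGTACGT
>seq2 example
GGGCCC
>seq3 example
TTTT
"""


def head(text, n=2):
    """Return the first n lines, roughly like `head -n`."""
    return text.splitlines()[:n]


def tail(text, n=2):
    """Return the last n lines, roughly like `tail -n`."""
    return text.splitlines()[-n:]


def grep(text, pattern):
    """Return lines containing pattern as a plain substring (no regex)."""
    return [line for line in text.splitlines() if pattern in line]


first_two = head(FASTA, 2)
last_one = tail(FASTA, 1)
headers = grep(FASTA, ">")  # every FASTA record starts with a ">" line

print(first_two)
print(last_one)
print(len(headers), "sequences")
```

      Counting the ">" lines is the classic quick way to see how many sequences a FASTA file holds.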

      Bottom line – There is no tool that is consistently used widely. Everyone uses what he/she thinks is right or within their knowledge.

    • Photo: Zena Hira

      Zena Hira answered on 13 Jun 2014:


      There are a few that I have only come across briefly:
      DAVID is an online tool for learning more about the function of genes.
      EBI is another one that lets you search for genes or gene sequences, the same as GenBank.
      Tools that help you analyse the data are normally found in Bioconductor, which is used with a programming language called R that most bioinformaticians use.

      Generally, bioinformatics is a big research area, so it depends on what you do. I haven’t really used many of the online programs since they are mostly written for biologists and I am a computer scientist. However, I have used ArrayExpress, which is a place where you can find data for analysis.

    • Photo: Vicky Schneider

      Vicky Schneider answered on 16 Jun 2014:


      This is a question lots of people ask, and one that many postgraduates who come on Bioinformatics courses look to get an answer to. There are indeed many, but we could start by dividing them into “databases” and “tools”. Databases store information following a precise structure and order, in such a way that it is easy to retrieve, find and ideally compare with other information. Tools help you do lots of things with the information in databases as well as in other places.

      If you study the relationships between species, evolution, etc., you will most probably end up using some “phylogenetics” program, for example. In that case there is a lot of bioinformatics behind the tools used to build “phylogenetic trees” (a bit like your ancestry tree), as well as the actual software design, etc. There is one guy who writes and produces one of the most popular tools and who also created a page that tries to categorize the phylogenetics tools, so anyone can look them up according to a variety of criteria, e.g.:
      … by methods available
      … by computer systems on which they work
      … cross-referenced by method and by computer system
      … by ones which analyze particular kinds of data
      … to show the most recent listings
      … to show ones most recently changed
      Have a look here:
      http://evolution.genetics.washington.edu/phylip/software.html

      This is just for that type of tool; imagine that there are plenty of other fields, so it would be great if we had other efforts like this one. Good places to start looking for databases and tools are the bioinformatics service providers from organizations such as the European Bioinformatics Institute (EBI) (where I used to work :-), the Swiss Institute of Bioinformatics (SIB) and the NCBI, for example. I have added the links below:
      EBI
      http://www.ebi.ac.uk/services
      SIB
      databases
      http://expasy.org/resources/search/resource_type:DB
      Tools
      http://expasy.org/resources/search/resource_type:tool
      NCBI
      http://www.ncbi.nlm.nih.gov/
      lots to explore!

    • Photo: Ian Simpson

      Ian Simpson answered on 17 Jun 2014:


      Looks like there have been some good answers on here already. As in all science, the method you use depends on the question you are asking.

      There is a core set of skills that people who work in the field of Bioinformatics would normally have or look to acquire quite quickly. These are generally aligned with the common tasks. This is a generalisation, but broadly :-
      1.) acquiring data
      2.) processing/quality control
      3.) data storage
      4.) data analysis
      5.) presentation/visualisation

      People would normally be able to write scripts in one of several common languages that Bioinformaticians use. The two most common are Perl and Python. These can be used in all of the above steps thanks to large software libraries that have been developed by scientists around the world and are mostly open source, so free (BioPerl, CPAN and Biopython are examples). Increasingly, a lot of the quality control and analysis is being carried out in a statistical programming language called ‘R’, which is also open source and also has a lot of software libraries held in two repositories: CRAN and Bioconductor.
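
      As a rough illustration of how one small script can touch all five steps, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example: the gene names and sequences are invented, the "download" is just a string in memory, the storage is a plain dictionary, and the analysis (GC content) is only one of many things you might compute. Real projects would more likely lean on libraries such as Biopython.

```python
from collections import Counter

# 1) Acquire data: a made-up FASTA string stands in for a real download.
RAW = """>geneA
ATGCGC
>geneB
ATATNN
>geneC
GGGCCCAT
"""


def parse_fasta(text):
    """Turn FASTA text into a {name: sequence} dictionary."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def passes_qc(seq):
    """2) Quality control: keep sequences made purely of A, C, G and T."""
    return len(seq) > 0 and set(seq) <= set("ACGT")


def gc_content(seq):
    """4) Analysis: fraction of bases that are G or C."""
    counts = Counter(seq)
    return (counts["G"] + counts["C"]) / len(seq)


records = parse_fasta(RAW)

# 3) Storage: a plain dict here; a database would be used for real data.
clean = {name: seq for name, seq in records.items() if passes_qc(seq)}

results = {name: gc_content(seq) for name, seq in clean.items()}

# 5) Presentation: a crude text bar chart, one row per gene.
for name, gc in sorted(results.items()):
    print(f"{name} {'#' * round(gc * 10):<10} {gc:.2f}")
```

      Here geneB is dropped at the quality control step because it contains the ambiguous base N, which is the kind of decision you would tune for your own data.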

      As Sergey alluded to, there are many ‘command line’ programs. These are nearly always specialist tools that have been developed to carry out very specific types of data analysis and are often written in C, C++ or Java and are commonly faster and more powerful than graphical tools (though that need not always be true). I would say that the majority of Bioinformaticians will script and use command line tools and the reason for that is that most experiments are special or different in some way and require a bespoke solution. A lot of these tools can be executed in different ways and these change the nature of the results produced, so often a good degree of experimentation is required to get the best analysis from your data. It’s an important point to make that Bioinformatics is an experimental science ! I carry out lots of experiments every week without ever entering a wet lab 😯
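
      To see how the same tool can give different answers depending on how it is run, here is a toy Python sketch. The function find_hits and its max_mismatches setting are hypothetical, invented for this example, and are not taken from any real aligner; the reference and read are made up too.

```python
def find_hits(read, reference, max_mismatches=0):
    """Return start positions where read matches reference with at most
    max_mismatches substitutions (no gaps are considered)."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Invented example data.
REFERENCE = "ACGTACGAACGT"
READ = "ACGT"

strict = find_hits(READ, REFERENCE, max_mismatches=0)
relaxed = find_hits(READ, REFERENCE, max_mismatches=1)

print("exact matches only:", strict)
print("one mismatch allowed:", relaxed)
```

      Allowing one mismatch picks up an extra position that the strict run misses, so which setting is "right" depends on the experiment.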

      As Vicky very nicely described, we have to store a lot of information and would like to be able to query it and compare it with other similar data. A lot of data in Biology is stored in relational database systems such as MySQL and PostgreSQL. These are among the most common DB systems in the world and are used by a great many of the websites you’re ever likely to visit (including this one).
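
      For a taste of what storing and querying looks like, here is a small Python sketch. Note that it uses SQLite (through Python's built-in sqlite3 module) rather than MySQL or Postgres, simply because it needs no server; the SQL idea is the same. The table layout and the gene rows are invented for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a MySQL/Postgres server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genes (name TEXT PRIMARY KEY, species TEXT, length INTEGER)"
)

# Made-up rows, purely for illustration.
rows = [
    ("geneA", "human", 1200),
    ("geneB", "mouse", 950),
    ("geneC", "human", 3100),
]
conn.executemany("INSERT INTO genes VALUES (?, ?, ?)", rows)
conn.commit()

# Query: all human genes, longest first.
human_genes = conn.execute(
    "SELECT name, length FROM genes WHERE species = ? ORDER BY length DESC",
    ("human",),
).fetchall()

print(human_genes)
conn.close()
```

      The "?" placeholders let the database library fill in values safely instead of gluing them into the SQL text by hand.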

      If your question was motivated by what you might want to learn to get a feel for some of these software tools, I would throw you right in at the deep end and suggest you start playing with ‘python’ and ‘R’ and their associated software packages. In the last few years these two have strongly emerged as the workhorses of a lot of Bioinformatics projects and there are some great guides out there to get you started :-

      Quick R
      http://www.statmethods.net/
      http://www.bioconductor.org/

      Learn Python
      http://www.learnpython.org/
      http://biopython.org/wiki/Main_Page

      Finally, in visualisation there are lots of widely used software packages for making diagrams and visually interactive tools. Many of these are accessible from within Python, Perl and R, so you don’t have to leave, for example, the ‘R’ analysis environment to make nice visualisations. Some popular examples are igraph (for drawing networks), Circos (for genome-scale visualisation) and iTOL (for visualising evolutionary trees). Many are also scripted as web applications using things like D3-Ajax-JSON http://bl.ocks.org/mbostock/4062045 (please click the link, it’s very cool!).

      So in summary, you might have expected a list of programs that are “most used” (BLAST, MUSCLE, Clustal, PHYLIP etc.), but in reality you need to select the ones best suited to what you’re trying to do, and you do that by reading the literature, talking to other Bioinformaticians and above all experimenting. A great foundation is to start learning a scripting language (I’ve suggested Python) and an analytical language (I’ve suggested ‘R’).
