Sergey Feranchuk, Ulyana Potapova, Vladimir Potapov, Dmitry Mukha, Vladimir Nikolaev, Sergei Belikov

A newly developed web-site is presented. The site is designed to simplify several of the most typical bioinformatics routines. Services available from the site include (but are not limited to) homology modelling, multiple alignment and homology search. There are also several simple but useful utility services to simplify pipelines from one task to another. The architecture of the system allows the calculations themselves to be distributed to the servers that are connected to the system. Services are provided free of charge, without registration, and the user’s privacy is taken care of.

The volume of biological data is increasing drastically, so the question arises: how to unite all the complex and diverse sets of bioinformatics algorithms in a convenient manner. This question is not yet completely answered; there are several competing concepts. Web-portals that integrate several different services are probably the most traditional answer to the problem.
  The project has its roots in the early 1990s, when one of the authors participated in the development of the GeneBee web server in Moscow [1]. Since then, many new technologies have appeared in web development, in bioinformatics algorithms and in infrastructure. Nevertheless, the main problem remains: biologists who do not have a deep understanding of computer technologies still need a simple and convenient user interface to solve their tasks.
  The concepts of software-as-a-service and distributed computing are now even more popular than when the first bioinformatics web-portals were developed. We therefore believe that the idea is still alive and that the new technologies make its realization simpler. New frameworks for implementing web-based user interfaces are available, and there are libraries such as BioRuby [2] that simplify the handling of biological data. Such technologies follow an approach similar to the REST and SOAP concepts of bioinformatics servers (for example, [3]). It therefore seems worthwhile to develop a single convenient web-site which can integrate some of the available software and servers.

The starting point of the development was a typical bioinformatics pipeline: a homology search in PDB, an alignment, and homology modelling of the query protein. The homology search in PDB is implemented on the site as follows: one iteration of BLAST against the non-redundant protein bank, then screening of the PDB primary sequences by the original GeneBee algorithm using the profile from the first iteration. For multiple alignment there is a choice between the GeneBee alignment algorithm and Clustal (fig. 1). For homology modelling the Nest program from the Jackal package [4] was adopted. Utilities include conversion between alignment formats using the Readseq program and calculation of sequence identity using BioRuby. As utilities of this kind require little development effort, we hope to extend the list according to the requirements of end users. A set of additional hyperlinks is included in the user interface of the site to simplify the completion of popular pipelines.
  At the moment the additional services on the site include, among others, homology search in the non-redundant protein bank using GeneBee screening, and detection of similar motifs in a given set of sequences.
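
As an illustration of the sequence identity utility mentioned above, the following is a minimal sketch in Python. It is not the site's BioRuby code; the function name `percent_identity` and the convention of skipping columns that contain a gap are assumptions made for this example, and other definitions of identity (for instance, dividing by the full alignment length) are equally common.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped,
    so the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Two short, hypothetical aligned fragments.
    print(percent_identity("ACDE", "ACDF"))
    print(percent_identity("ACD-F", "ACDEF"))
```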

Medline Search
Another feature of the site is a set of advanced tools for searching the Medline database of biomedical abstracts. First, a basic search is implemented that returns abstracts in which all words of the query are present. Second, an additional linguistic analysis of the database is performed and phrases with high semantic weight (mems) are selected; mems are intended to simplify database lookup. Abstracts are sorted by statistical criteria derived from the frequency of word combinations. The texts in the database are also parsed, and abstracts in which the words of the query are syntactically linked are placed at the top of the result list.
  Some technical limitations were encountered in the development of this search tool. The computational power of the server is not high enough to process the whole database, so only the most recent abstracts (from the last several years) are processed and included in the search.
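
The basic search described above (all query words must be present in an abstract) can be sketched with an inverted index. This is an illustrative Python example only: the function names, the tokenization rule and the sample abstracts are assumptions, and the statistical sorting and syntactic analysis used by the site are not reproduced here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(abstracts):
    """Map each word to the set of abstract ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in abstracts.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_all_words(index, query):
    """Return ids of abstracts that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    # Hypothetical abstracts, for illustration only.
    abstracts = {
        1: "Protein structure prediction by homology modelling",
        2: "Homology search in protein databases",
        3: "Gene expression in yeast",
    }
    index = build_index(abstracts)
    print(search_all_words(index, "protein homology"))
```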

The web-interface of the site is implemented using the Ruby on Rails framework and the BioRuby library. The servers on which data processing is performed are physically distinct from the web-interface server, and communication between them follows REST principles and is carried out by PHP and shell scripts.
  The GeneBee multiple alignment and screening algorithms were described at the time of their development [5], [6]. The core of the algorithms is a search for local homologies (motifs) of optimal length. The motifs are then joined into clusters, which allows gaps in the alignment to be taken into account. The alignment can then be improved using dynamic programming techniques.
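
To make the idea of a motif search concrete, here is a deliberately simplified Python sketch. It reports maximal ungapped runs of identical residues along each diagonal of the comparison matrix of two sequences. This is an assumption-laden stand-in, not the GeneBee algorithm: the published method [5], [6] selects motifs of optimal length by its own scoring criteria, whereas this example uses exact identity and a hypothetical `min_len` threshold.

```python
def local_motifs(seq_a, seq_b, min_len=3):
    """Find maximal identical ungapped segments shared by two sequences.

    Returns a list of (start_a, start_b, length) tuples. Each diagonal
    (fixed offset = i - j) is scanned once.
    """
    motifs = []
    n, m = len(seq_a), len(seq_b)
    for offset in range(-(m - 1), n):
        i = max(offset, 0)   # position in seq_a
        j = i - offset       # position in seq_b
        run = 0              # length of the current identical run
        while i < n and j < m:
            if seq_a[i] == seq_b[j]:
                run += 1
            else:
                if run >= min_len:
                    motifs.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:   # run that reaches the end of the diagonal
            motifs.append((i - run, j - run, run))
    return motifs


if __name__ == "__main__":
    # Hypothetical sequences sharing two segments on different diagonals.
    for motif in sorted(local_motifs("MKTAYIAKQR", "GGTAYIAGGKQR")):
        print(motif)
```

Motifs found on different diagonals, as in the usage example, are the kind of segments that a clustering step would have to connect across a gap.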
  Finding the optimal relative positions of the query sequence and a sequence from the databank is the performance bottleneck of homology screening. To reduce execution time, we have been working on GPU acceleration of this part of the program.
  Another improvement is that profile search is integrated into the screening algorithm. The profile is constructed from a previous iteration of the screening or from the results of a PSI-BLAST search, using an approach similar to the reconstruction of amino acid substitution scores from an alignment.
  Detection of mems in the Medline database is similar to the detection of conserved motifs in biological sequences: pairs of abstract titles are compared and similar segments are marked as mems. The detected mems are then compared with all the abstracts to mark where a particular mem occurs in the database.
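
One common way to derive position-specific scores from an alignment is a log-odds profile, sketched below in Python. It is offered only as an illustration of the general technique; the uniform background frequency, the pseudocount value and the function names are assumptions, and the actual construction used in the screening algorithm may differ.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build a log-odds profile: one {residue: score} dict per column.

    Assumptions: uniform background frequencies and a flat pseudocount;
    gaps and non-standard symbols are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_segment(profile, segment):
    """Sum the profile scores of a segment placed at the profile start."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


if __name__ == "__main__":
    # Hypothetical three-sequence alignment.
    profile = build_profile(["ACDE", "ACDF", "ACNE"])
    print(score_segment(profile, "ACDE") > score_segment(profile, "WWWW"))
```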
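
The pairwise comparison of titles can be sketched as follows in Python. The example uses the standard-library `difflib.SequenceMatcher` on word lists as a stand-in for the motif-style comparison; the function names, the `min_words` threshold and the sample titles are assumptions for illustration, and the semantic weighting of mems is not modelled.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def words(text):
    """Split text into lower-case, whitespace-separated words."""
    return text.lower().split()


def shared_segments(title_a, title_b, min_words=2):
    """Return word segments that occur contiguously in both titles."""
    a, b = words(title_a), words(title_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    ]


def detect_mems(titles, min_words=2):
    """Compare all pairs of titles and count the shared segments."""
    mems = Counter()
    for title_a, title_b in combinations(titles, 2):
        mems.update(shared_segments(title_a, title_b, min_words))
    return mems


def locate_mem(mem, abstracts):
    """Return ids of abstracts that contain the mem as a word sequence."""
    mem_words = mem.split()
    hits = []
    for doc_id, text in abstracts.items():
        w = words(text)
        if any(w[i:i + len(mem_words)] == mem_words
               for i in range(len(w) - len(mem_words) + 1)):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    titles = [
        "Homology modelling of membrane proteins",
        "Fast homology modelling with templates",
        "Gene expression in yeast",
    ]
    print(detect_mems(titles))
```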

[1] L.I. Brodsky, V.V. Ivanov, Ya.L. Kalaidzidis, A.M. Leontovich, V.K. Nikolaev, S.I. Feranchuk and V.A. Drachev, “GeneBee-NET: Internet-based server for analyzing biopolymers structure”, Biochemistry, 60(8), 923-928, 1995.
[2] N. Goto, P. Prins, M. Nakao, R. Bonnal, J. Aerts and T. Katayama, “BioRuby: bioinformatics software for the Ruby programming language”, Bioinformatics, 26(20), 2617-2619, 2010.
[3] S. Pillai, V. Silventoinen, K. Kallio, M. Senger, S. Sobhany, J. Tate, S. Velankar, A. Golovin, K. Henrick, P. Rice, P. Stoehr and R. Lopez, “SOAP-based services provided by the European Bioinformatics Institute”, Nucleic Acids Research, 33, W25-W28, 2005.
[4] D. Petrey, Z. Xiang, C.L. Tang, L. Xie, M. Gimpelev, T. Mitros, C.S. Soto, S. Goldsmith-Fischman, A. Kernytsky, A. Schlessinger, I.Y.Y. Koh, E. Alexov and B. Honig, “Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling”, Proteins, 53 Suppl 6, 430-435, 2003.
[5] L.I. Brodsky, A.L. Drachev, A.E. Gorbalenya, A.M. Leontovich and S.I. Feranchuk, “A novel method of multiple alignment of biopolymers (MA-Tools module of GeneBee package)”, Biosystems, 30, 65-79, 1993.
[6] V.K. Nikolaev, A.M. Leontovich, V.A. Drachev and L.I. Brodsky, “Building multiple alignment using iterative dynamic improvement of the initial motif alignment”, Biochemistry, 62(6), 578-582, 1997.
