A comprehensive benchmark of RNA–RNA interaction prediction tools for all domains of life

Here comes the first review of a scientific paper. The article was written by Sinan Uğur Umu and Paul P. Gardner and published in Bioinformatics (Oxford Journals, https://doi.org/10.1093/bioinformatics/btw728).

The paper, like practically all bioinformatics papers, is highly interdisciplinary. To understand it fully, you need to be familiar with the basics of the biological problem, with mathematical and especially statistical modeling, and sometimes, but not always, with programming. Programming experience helps when running the software, which is usually not very user-friendly for the average biologist. But back to the article…

This paper provides a comprehensive comparison of multiple RNA–RNA interaction prediction tools. The authors prepared a script that automatically tests the selected tools and scores their ability to find validated interactions, in contrast to control sequence pairs. Interestingly, they used RNAs from various organisms, which strongly influenced the performance of many of the prediction tools. Similarly, different types of RNA (especially non-coding RNAs) gave different results depending on the tool, which was attributed mostly (but not only) to the length of the analyzed target sequence. Notably, some RNA–RNA interactions that are reliably verified experimentally to be present and biologically active were not predicted by any of the tools. Therefore, relying only on computational predictions (a tendency we see in current research) can lead to biologically functional interactions being missed.

The table below lists the tools that were analyzed in the paper (more were considered, but for various reasons they were not eligible).

Tool          Type
AccessFold    MFE
bifold        MFE
bistarna      MFE
DuplexFold    MFE
IntaRNA       MFE
Pairfold      MFE
RactIP        Integer
RIsearch      Alignment-like
RNAcofold     MFE
RNAduplex     MFE
RNAhybrid     MFE
RNAplex       MFE
RNAup         MFE
Ssearch       Alignment
NUPACK        MFE

Sensitivity and precision were calculated for each tool, along with the Matthews correlation coefficient (MCC), which combines the information from these two values, and a significance test, all using the bacterial data set of sequences. The overall analysis identified three algorithms (RNAup, IntaRNA and RNAplex) as the most sensitive and precise. As expected, sensitivity and precision did not correlate with time efficiency. For the eukaryotic sets, which interest me most, the MFE-based IntaRNA and RNAup had results comparable to those of the alignment-based algorithms RNAplex and RIsearch.
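For readers who have not met these metrics before, here is a minimal Python sketch of how sensitivity, precision and MCC can be computed from counts of true/false positives and negatives. The function names and the counts are my own illustrative assumptions and are not taken from the paper or its GitHub code. The `mcc_approx` function uses the geometric mean of sensitivity and precision, an approximation often used in RNA benchmarks when true negatives vastly outnumber everything else; whether the authors computed MCC exactly this way is also an assumption on my part.

```python
import math


def sensitivity(tp, fn):
    """Fraction of real interactions that the tool recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Fraction of the tool's predictions that are real (also called PPV)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp, fp, tn, fn):
    """Exact Matthews correlation coefficient from all four counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_approx(tp, fp, fn):
    """Geometric mean of sensitivity and precision.

    Approximates MCC when true negatives are very numerous
    (an assumption made for this sketch, not the paper's stated formula).
    """
    return math.sqrt(sensitivity(tp, fn) * precision(tp, fp))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    tp, fp, fn, tn = 8, 2, 4, 986
    print("sensitivity:", round(sensitivity(tp, fn), 3))
    print("precision:  ", round(precision(tp, fp), 3))
    print("MCC (exact):", round(mcc(tp, fp, tn, fn), 3))
    print("MCC (approx):", round(mcc_approx(tp, fp, fn), 3))
```

With many true negatives the exact and approximate values come out close to each other, which is why a single number derived from sensitivity and precision can stand in for MCC in this kind of benchmark.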

None of the supplementary tables were accessible from the Bioinformatics journal page, but you can find them on the project's GitHub page. The tables are clear and, what is more, easily reusable if anyone has an interesting idea for an RNA–RNA interaction analysis in mind. The code on GitHub was understandable, although traditional docstrings explaining each Python function would be nice. Altogether, the GitHub code and data are well organized and easy to go through. Supplementary figure 1, which was available on the Bioinformatics page, shows what the authors counted as true positives, false positives and false negatives.

Summing up, it was a real pleasure to read this paper: it was easy to understand, enough background was given, and the figures and tables were clear and informative. Moreover, the conclusions include some important thoughts that are well worth taking into consideration when choosing the right software for a scientific problem.

Enjoy the read!
