R is an increasingly powerful tool in the bioinformatics toolbox, but its command-based interface makes the learning curve rather steep, and as a result turns a lot of would-be users off. This post is about installing RStudio, a graphical user interface (or “IDE”, rather) for R, on the High-performance Computing (HPC) cluster we use for data analysis at the International Livestock Research Institute (ILRI) in Nairobi.
As ILRI’s research-computing environment is a cluster of sorts (network-attached storage with several compute nodes), I have to take extra care to install things a bit more “sustainably”; apps are installed in a non-standard prefix globally available to all nodes, apps and their dependencies don’t interfere with system packages, etc. This generally means I have to compile most packages manually rather than relying on pre-packaged versions. This turned out to be a non-trivial task, so for posterity’s sake I decided to write down my experiences.
In systems administration, “dependency hell” is a situation that arises when you attempt to install a piece of software and it depends on another piece of software, which depends on another piece of software… 🙂 Often these dependencies are hard to satisfy, cyclic etc.
In this case, our cluster is running the CentOS 6 GNU/Linux operating system, whose focus is more on long-term and enterprise stability than “oooh, shiny!”, which creates some unique problems when compiling packages with slightly less conservative dependencies. For example, as of this writing:
- RStudio requires boost >= 1.50.0, CentOS 6.4 only has 1.41.0
- RStudio requires Qt >= 4.8.0, CentOS 6.4 only has 4.6.x
… which would be simple enough if we were running a more modern operating system (and weren’t worried about long-term support). Some of the interesting twists encountered during the setup process:
- RStudio needs QtWebKit, which is only compiled as part of Qt if your GCC is new enough (CentOS 6.4 has GCC 4.4.7)
- Qt 4.8.0 refuses to compile with GCC 4.7.2 due to -Wdelete-non-virtual-dtor warnings being treated as fatal since GCC 4.7.0
- RStudio wants R to be compiled with --enable-R-shlib for libR support
- etc etc…
In the end the process was easy, but it took a lot of trial and error to get everything working properly.
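The version gaps listed above are easy to check mechanically. Here is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available; the helper names `version_ge` and `check_dep` are my own, and `4.6` is only a stand-in for the "4.6.x" that CentOS 6.4 ships:

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Relies on GNU sort -V; the smaller of the two versions sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_dep NAME HAVE NEED: report whether the version we have is new enough.
check_dep() {
    local name=$1 have=$2 need=$3
    if version_ge "$have" "$need"; then
        echo "$name $have: OK (needs >= $need)"
    else
        echo "$name $have: too old (needs >= $need)"
    fi
}

# Versions quoted in this post (4.6 is a placeholder for "4.6.x").
check_dep boost 1.41.0 1.50.0
check_dep Qt 4.6 4.8.0
```

Both lines report "too old", which is why boost and Qt have to be built from source below.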
Install dependencies: boost
In order for boost to compile properly, we need to install the following dependencies:
$ sudo yum install python-devel.x86_64 zlib-devel.x86_64 bzip2-devel.x86_64 libicu-devel.x86_64
Download and compile boost itself (RStudio recommends 1.50.0, so let’s do that):
$ wget http://tenet.dl.sourceforge.net/project/boost/boost/1.50.0/boost_1_50_0.tar.bz2
$ tar xf boost_1_50_0.tar.bz2 -C /tmp
$ cd /tmp/boost_1_50_0
$ sudo mkdir -p /export/apps/boost/1.50.0
$ sudo chown aorth /export/apps/boost/1.50.0
$ ./bootstrap.sh
$ ./bjam --prefix=/export/apps/boost/1.50.0/ variant=release install
$ sudo chown -R root:root /export/apps/boost/1.50.0
Note: I extract and compile in /tmp because my home directory is network mounted, and it’s slow as molasses to compile there 😉
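A quick sanity check after the install is to read the version back out of the installed headers. This is only a sketch: it assumes the headers ended up in `include/boost/` under the prefix (the default layout on Linux, as far as I know), and the function name `boost_lib_version` is made up for this example:

```shell
#!/usr/bin/env bash
# boost_lib_version PREFIX: print BOOST_LIB_VERSION (e.g. 1_50) from the
# version.hpp header installed under PREFIX. Prints nothing if not found.
boost_lib_version() {
    local header="$1/include/boost/version.hpp"
    [ -f "$header" ] || return 1
    sed -n 's/^#define BOOST_LIB_VERSION "\(.*\)"$/\1/p' "$header"
}

# Example usage with the prefix from this post (only runs if it exists):
if [ -d /export/apps/boost/1.50.0 ]; then
    boost_lib_version /export/apps/boost/1.50.0
fi
```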
Install dependencies: Qt
RStudio requires QtWebKit which, according to Qt’s configuration help, is only compiled alongside Qt if a new enough version of GCC is used. We can get GCC 4.7.x from Red Hat’s Developer Toolset 1.1, but that causes a problem because Qt 4.8.0 triggers some fatal warnings in GCC 4.7.x. Luckily, Qt 4.8.4 has fixed these issues, so we terminate the cycle of dependency hell there 😉
First, enable the devtoolset-1.1 repository and install a newer gcc-c++:
$ sudo wget http://people.centos.org/tru/devtools-1.1/devtools-1.1.repo -O /etc/yum.repos.d/devtools-1.1.repo
$ sudo yum install devtoolset-1.1-gcc devtoolset-1.1-gcc-c++ dbus-devel
Test that the new compiler works (should be 4.7.2):
$ scl enable devtoolset-1.1 bash
$ gcc -v
The scl command loads support for the devtoolset software collection and launches a new bash shell. If the compiler version is listed as 4.7.2 then the environment is set up properly to compile Qt.
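If you would rather script the check than eyeball the `gcc -v` output, something like the following could work. It is a sketch under assumptions: the helper names are mine, and it relies on `gcc -dumpversion` printing a dotted version such as 4.7.2, which is what GCC 4.x does:

```shell
#!/usr/bin/env bash
# gcc_series VERSION: print the major.minor part, e.g. 4.7.2 -> 4.7
gcc_series() {
    echo "$1" | cut -d. -f1,2
}

# expect_gcc_series WANT VERSION: succeeds when VERSION belongs to series WANT.
expect_gcc_series() {
    [ "$(gcc_series "$2")" = "$1" ]
}

# Show the series of whatever gcc is first in PATH, if any.
if command -v gcc >/dev/null 2>&1; then
    echo "current gcc series: $(gcc_series "$(gcc -dumpversion)")"
fi

# On the cluster, the devtoolset compiler could be checked without opening
# an interactive shell (not run here, since scl may not be installed):
#   expect_gcc_series 4.7 "$(scl enable devtoolset-1.1 'gcc -dumpversion')"
```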
Download and compile Qt 4.8.4:
$ wget http://download.qt-project.org/official_releases/qt/4.8/4.8.4/qt-everywhere-opensource-src-4.8.4.tar.gz
$ tar zxf qt-everywhere-opensource-src-4.8.4.tar.gz -C /tmp
$ cd /tmp/qt-everywhere-opensource-src-4.8.4
$ ./configure --prefix=/export/apps/qt/4.8.4 -prefix-install -openssl -confirm-license
$ gmake -j4
$ sudo gmake install
$ exit
Note: as with boost, I compile in /tmp because my home directory is network mounted and therefore much slower than the local disk! Also, make sure to exit the shell when you’re done; we only need GCC 4.7.2 for the Qt compilation!
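Since the whole point of the newer GCC was to get QtWebKit, it is worth confirming that the library was actually built before moving on. A minimal sketch, assuming Qt 4 names the library `libQtWebKit.so*` under `lib/` in the install prefix; `has_qtwebkit` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# has_qtwebkit PREFIX: succeeds when a libQtWebKit shared library exists
# under PREFIX/lib.
has_qtwebkit() {
    local f
    for f in "$1"/lib/libQtWebKit.so*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example usage with the prefix from this post:
if has_qtwebkit /export/apps/qt/4.8.4; then
    echo "QtWebKit found"
else
    echo "QtWebKit not found under /export/apps/qt/4.8.4"
fi
```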
Install dependencies: R + libR
RStudio needs R (obviously), and specifically libR.so. Basically, if you compiled R yourself, make sure it was compiled with shared library support, i.e.:
$ ./configure --enable-R-shlib
Note: Make sure you have the cairo-devel package installed before compiling R, or else your R will use X11 fonts instead of nice, modern TrueType fonts.
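To find out whether an existing R build already has shared library support, you can look for libR.so under R's home directory. This is a sketch that assumes the library lives at `lib/libR.so` below the directory printed by `R RHOME`; the helper name `has_libr` is mine:

```shell
#!/usr/bin/env bash
# has_libr RHOME: succeeds when RHOME/lib/libR.so exists.
has_libr() {
    [ -f "$1/lib/libR.so" ]
}

# Only run the check if R is on the PATH.
if command -v R >/dev/null 2>&1; then
    if has_libr "$(R RHOME)"; then
        echo "libR.so found"
    else
        echo "libR.so missing: rebuild R with --enable-R-shlib"
    fi
fi
```

Cairo support can be checked from within R itself with capabilities("cairo").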
Install dependencies for building RStudio:
$ sudo yum install cmake libuuid-devel.x86_64 ant pango-devel java-1.7.0-openjdk-devel.x86_64 java-1.6.0-openjdk-devel.x86_64
Clone the RStudio git repository and checkout the latest stable tag:
$ git clone https://github.com/rstudio/rstudio.git
$ cd rstudio
$ git checkout v0.97.551
Install some more dependencies using RStudio’s built-in scripts:
$ cd dependencies/common
$ ./install-dictionaries
$ ./install-mathjax
$ ./install-gwt
$ cd ../../
FYI, there are a few other scripts in there for installing dependencies, but these were the only ones that made sense for my environment. Also, RStudio ships its own pre-compiled Qt, for example, but it is compiled to live in /opt/, which won’t work for me.
Finally, build RStudio:
$ PATH=/export/apps/R/3.0.0/bin:$PATH cmake -DRSTUDIO_TARGET=Desktop \
    -DCMAKE_BUILD_TYPE=Release \
    -DBOOST_ROOT=/export/apps/boost/1.50.0/ \
    -DCMAKE_INSTALL_PREFIX=/export/apps/rstudio/v0.97.551 \
    -DQT_QMAKE_EXECUTABLE=/export/apps/qt/4.8.4/bin/qmake
$ make
$ sudo make install
Note: you must take care to fix the paths to the various dependencies so cmake can find them! The PATH assignment in front of the cmake command is there so that cmake picks up the right R.
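A typo in one of those paths tends to surface as a confusing cmake error much later, so it can help to verify them up front. A minimal sketch; `missing_paths` is a hypothetical helper, and the paths in the usage example are simply the ones used in this post:

```shell
#!/usr/bin/env bash
# missing_paths PATH...: print every path that does not exist and return
# non-zero if at least one is missing.
missing_paths() {
    local p rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            rc=1
        fi
    done
    return $rc
}

# Example usage before running cmake:
if missing_paths /export/apps/boost/1.50.0 \
                 /export/apps/qt/4.8.4/bin/qmake \
                 /export/apps/R/3.0.0/bin/R; then
    echo "all dependency paths exist"
fi
```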
Enjoy the hard work
In order to run rstudio you’ll need to make sure the boost and Qt library paths are present in LD_LIBRARY_PATH, and that R can be found in PATH. An example wrapper script, rstudio.sh:
#!/usr/bin/env bash
# PATH entries are directories, so point at R's bin directory, not the R binary
export PATH=/export/apps/R/3.0.0/bin:$PATH
export LD_LIBRARY_PATH=/export/apps/boost/1.50.0/lib:/export/apps/qt/4.8.4/lib:$LD_LIBRARY_PATH
/export/apps/rstudio/v0.97.551/bin/rstudio
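One subtlety with LD_LIBRARY_PATH: if the variable was empty to begin with, appending `:$LD_LIBRARY_PATH` leaves a trailing colon, and the dynamic linker treats an empty entry as the current directory. A small sketch of one way around that, using bash's `${var:+...}` expansion; the function name `build_ld_path` is made up for illustration:

```shell
#!/usr/bin/env bash
# build_ld_path EXISTING DIR...: join DIRs with ':' and append EXISTING only
# when it is non-empty, so no empty (current-directory) entry sneaks in.
build_ld_path() {
    local existing=$1
    shift
    local IFS=:
    local joined="$*"   # "$*" joins arguments with the first character of IFS
    echo "${joined}${existing:+:$existing}"
}

# Example usage with the library directories from this post:
LD_LIBRARY_PATH=$(build_ld_path "${LD_LIBRARY_PATH:-}" \
    /export/apps/boost/1.50.0/lib /export/apps/qt/4.8.4/lib)
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```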
Here’s RStudio in all its glory, showing a plot, modern typography, and the “Solarized Dark” theme that comes with RStudio:
In the future I hope we’ll have some interesting things to share (and code!) about how we’re using R to solve problems at ILRI.