Supercomputers are specialized computers that can perform calculations thousands of times faster than a normal computer like a laptop. An exascale machine can do 10^18 (a billion billion) calculations per second. The first exascale machines are expected to become available sometime around 2020. The ExCAPE project is part of the EC research on how to design and use these computers to do better science.

The ExCAPE project focusses on one example of how such machines can be used. We will design and write computer programs to help people who look for new medicines. The programs will be written in such a way that they can benefit from the very fast computers that will be available in roughly five years' time. Part of the job of finding new medicines is looking for chemicals that affect how a disease works, to kill it or to stop it getting worse. There are a huge number of chemicals, and it is impossible to test them all. This is why scientists use programs to predict which ones will have an effect on some part of the disease, so they can narrow down the list of chemicals that they have to test for real.

The prediction problem can be likened to film recommendation. Imagine a large list of people (e.g. the population of Europe) and a smaller, but still quite large, list of films (a few tens of thousands). It is not feasible to get everyone to watch every film. However, if we collect some real ratings (e.g. between 1 and 5 stars) given by people who have actually watched various films, we can try to predict the top 100 films for each person on the list. We can usually do a good job, even without that many ratings per person. This is a bit like finding the top 100 diseases that a given chemical might work on, or vice versa the top 100 chemicals for a given disease. Chemicals and diseases are more complicated than film ratings though, so the techniques need to be adapted and improved.
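For readers who like to see the idea in code, the film analogy can be sketched with matrix factorization, a common recommendation technique. This is only an illustration under stated assumptions, not the method used in ExCAPE: the function names (`factorize`, `predict`, `top_n`), the toy ratings and all parameter values are made up for the example. Each person and each film gets a short list of numbers, these numbers are adjusted until they reproduce the known ratings, and they are then used to guess the missing ones.

```python
import random


def factorize(ratings, n_rows, n_cols, rank=2, epochs=200, lr=0.02, reg=0.05, seed=0):
    """Fit low-rank factors to observed (row, col, rating) triples with SGD.

    Hypothetical helper for illustration only; all defaults are arbitrary.
    """
    rng = random.Random(seed)
    # One small random vector per person (P) and per film (Q).
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    mean = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(epochs):
        for i, j, r in ratings:
            pred = mean + sum(p * q for p, q in zip(P[i], Q[j]))
            err = r - pred
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on the squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return mean, P, Q


def predict(model, i, j):
    """Predicted rating of film j by person i."""
    mean, P, Q = model
    return mean + sum(p * q for p, q in zip(P[i], Q[j]))


def top_n(model, i, n_cols, seen, n=100):
    """Rank the films person i has not rated yet, best predicted first."""
    candidates = [j for j in range(n_cols) if j not in seen]
    return sorted(candidates, key=lambda j: predict(model, i, j), reverse=True)[:n]


# Toy data (invented): 4 people, 5 films, ratings from 1 to 5 stars.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 5), (1, 2, 1),
           (2, 1, 4), (2, 3, 2), (3, 2, 1), (3, 4, 5)]
model = factorize(ratings, n_rows=4, n_cols=5)
seen_by_person_0 = {j for i, j, _ in ratings if i == 0}
recommended = top_n(model, 0, n_cols=5, seen=seen_by_person_0, n=2)
print(recommended)
```

In the medicine setting, the people would be replaced by chemicals and the films by parts of a disease, with far larger and more complicated data, which is why much faster computers are needed.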

The ExCAPE project is roughly divided into three parts.

  1. The first part is about the design of algorithms (the recipes for making good predictions),
  2. the second is about writing the programs to make sure they can run well on a supercomputer, and
  3. the third is about making sure the data is usable, and running the programs on the data to see how well they do.

Together they will show how to use supercomputers to give better predictions for medicine research.