Scientific Publications

 Tom Vander Aa and Tom Ashby. SMURFF: a High-Performance Framework for Matrix Factorization. [ bib | http ]

 Xiangju Qin, Paul Blomstedt, and Samuel Kaski. Large-scale probabilistic non-linear matrix factorization for drug discovery. Technical report. [ bib | arXiv | .pdf ]

 V Vovk. Purely pathwise probability-free Ito integral. Matematychni Studii, 46(1), mar 2016. [ bib | DOI | http ]

 Vladimir Vovk, Jieli Shen, Valery Manokhin, and Min-ge Xie. Nonparametric predictive distributions based on conformal prediction. In Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, editors, Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, volume 60 of Proceedings of Machine Learning Research, pages 82--102, Stockholm, Sweden, nov 2017. PMLR. [ bib | .html ]

 Vladimir Vovk, Ilia Nouretdinov, Valentina Fedorova, Ivan Petej, and Alex Gammerman. Criteria of efficiency for set-valued classification. Annals of Mathematics and Artificial Intelligence, 81(1-2):21--46, mar 2017. [ bib | DOI | http ]

 Denis Volkhonskiy, Evgeny Burnaev, Ilia Nouretdinov, Alexander Gammerman, and Vladimir Vovk. Inductive Conformal Martingales for Change-Point Detection. In Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, editors, Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, volume 60 of Proceedings of Machine Learning Research, pages 132--153, Stockholm, Sweden, nov 2017. PMLR. [ bib | .html ]

 Tom Vander Aa, Imen Chakroun, and Tom Haber. Distributed Bayesian Probabilistic Matrix Factorization. Procedia Computer Science, 108:1030--1039, 2017. [ bib | DOI | http ]

 Tom Vander Aa, Tom Ashby, Yves Vandriessche, Vojtech Cima, Stanislav Böhm, and Jan Martinovic. Machine Learning for Chemogenomics on HPC in the ExCAPE Project. pages 72--74, jun 2017. [ bib | http ]

 Paolo Toccaceli, Ilia Nouretdinov, and Alexander Gammerman. Conformal prediction of biological activity of chemical compounds. Annals of Mathematics and Artificial Intelligence, 81(1-2):105--123, jun 2017. [ bib | DOI | http ]

 Paolo Toccaceli and Alexander Gammerman. Combination of Conformal Predictors for Classification. In Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, editors, Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, volume 60 of Proceedings of Machine Learning Research, pages 39--61, Stockholm, Sweden, nov 2017. PMLR. [ bib | .html ]

 Jiangming Sun, Nina Jeliazkova, Vladimir Chupakhin, Jose-Felipe Golib-Dzib, Ola Engkvist, Lars Carlsson, Jörg Wegner, Hugo Ceulemans, Ivan Georgiev, Vedrin Jeliazkov, et al. ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. Journal of Cheminformatics, 9(1), mar 2017. [ bib | DOI | http ]

 Xiangju Qin, Paul Blomstedt, Eemeli Leppäaho, Pekka Parviainen, and Samuel Kaski. Distributed Bayesian Matrix Factorization with Limited Communication. mar 2017. [ bib | arXiv | http ]

 Gundula Povysil, Antigoni Tzika, Julia Vogt, Verena Haunschmid, Ludwine Messiaen, Johannes Zschocke, Günter Klambauer, Sepp Hochreiter, and Katharina Wimmer. panelcn.MOPS: Copy-number detection in targeted NGS panel data for clinical diagnostics. Human Mutation, 38(7):889--897, may 2017. [ bib | DOI | http ]

 Ilia Nouretdinov. Validity and efficiency of conformal anomaly detection on big distributed data. Advances in Science, Technology and Engineering Systems Journal, 2(3):254--267, may 2017. [ bib | DOI | http ]

 Ilia Nouretdinov. Reverse Conformal Approach for On-line Experimental Design. In Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, editors, Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, volume 60 of Proceedings of Machine Learning Research, pages 185--192, Stockholm, Sweden, nov 2017. PMLR. [ bib | .html ]

 Ilia Nouretdinov. Improving Reliable Probabilistic Prediction by Using Additional Knowledge. In Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, editors, Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, volume 60 of Proceedings of Machine Learning Research, pages 193--200, Stockholm, Sweden, nov 2017. PMLR. [ bib | .html ]

 Balazs Nemeth, Tom Haber, Thomas J Ashby, and Wim Lamotte. Improving Operational Intensity in Data Bound Markov Chain Monte Carlo. Procedia Computer Science, 108:2348--2352, 2017. [ bib | DOI | http ]

 Eemeli Leppäaho, Muhammad Ammad-ud-din, and Samuel Kaski. GFA: Exploratory Analysis of Multiple Data Sources with Group Factor Analysis. Journal of Machine Learning Research, 18(39):1--5, 2017. [ bib | .html ]

 Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-Normalizing Neural Networks. jun 2017. [ bib | arXiv | http ]

 Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-Normalizing Neural Networks, 2017. [ bib | http ]

 Vojtech Cima, Stanislav Böhm, Jan Martinovič, Jiří Dvorský, Thomas J Ashby, and Vladimir Chupakhin. HyperLoom Possibilities for Executing Scientific Workflows on the Cloud. Complex, Intelligent, and Software Intensive Systems, pages 397--406, jul 2017. [ bib | DOI | http ]

 Imen Chakroun, Tom Haber, and Thomas J Ashby. SW-SGD: The Sliding Window Stochastic Gradient Descent Algorithm. Procedia Computer Science, 108:2318--2322, 2017. [ bib | DOI | http ]

 Vladimir Vovk, Jieli Shen, Valery Manokhin, and Min-ge Xie. Nonparametric predictive distributions based on conformal prediction. Machine Learning, aug 2018. [ bib | DOI | http ]

 Vladimir Vovk and Claus Bendtsen. Conformal predictive decision making, jun 2018. [ bib | .html ]

 Paolo Toccaceli and Alexander Gammerman. Combination of inductive mondrian conformal predictors. Machine Learning, aug 2018. [ bib | DOI | http ]

 Noé Sturm, Jiangming Sun, Yves Vandriessche, Andreas Mayr, Günter Klambauer, Lars-Anders Carlson, Ola Engkvist, and Hongming Chen. Application of Bioactivity Profile Based Fingerprints for Building Machine Learning Models. aug 2018. [ bib | DOI | http ]

 Alberto Scionti, Somnath Mazumdar, and Antoni Portero. Towards a Scalable Software Defined Network-on-Chip for Next Generation Cloud. Sensors, 18(7):2330, jul 2018. [ bib | DOI | http ]

 Ilia Nouretdinov, Denis Volkhonskiy, Pitt Lim, Paolo Toccaceli, and Alexander Gammerman. Inductive Venn-Abers predictive distribution, jun 2018. [ bib | .html ]

 Andreas Mayr, Günter Klambauer, Thomas Unterthiner, Marvin Steijaert, Jörg K Wegner, Hugo Ceulemans, Djork-Arné Clevert, and Sepp Hochreiter. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chemical Science, 9(24):5441--5451, 2018. [ bib | DOI | http ]

 Nikolay Kochev, Svetlana Avramova, and Nina Jeliazkova. Ambit-SMIRKS: a software module for reaction representation, reaction search and structure transformation. Journal of Cheminformatics, 10(1), aug 2018. [ bib | DOI | http ]

 Vojtěch Cima, Stanislav Böhm, Jan Martinovič, Jiří Dvorský, Kateřina Janurová, Tom Vander Aa, Thomas J Ashby, and Vladimir Chupakhin. HyperLoom. Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms - PARMA-DITAM '18, 2018. [ bib | DOI | http ]

IMEC: Imec is providing expertise in software development for extreme-scale systems, and in the relationship between new hardware models and user programming models.

Janssen Cilag SA: Janssen are providing expertise in the domain of chemogenomics, knowledge of how to combine data sets, and the ability to test techniques on industry-scale data.

IT4Innovations National Supercomputing Center: IT4I provide expertise on software development, HPC infrastructure and the implications for operational supercomputer workloads, as well as expertise in machine learning.

AstraZeneca AB: AZ provide domain expertise in chemogenomics, and will lead the effort to provide feedback on prediction accuracy on industrial data.

Johannes Kepler Universität Linz: U.Linz provide machine learning expertise, especially in the theory and practice of deep neural networks.

Aalto-yliopisto: U.Aalto provide machine learning expertise, concentrating on probabilistic matrix factorisation for dyadic data sets.

Intel Belgium: Intel will provide insight into the coming generations of HPC chips and how best to map software to them, by providing simulation services and mapping expertise.

IDEAconsult: Idea will provide broad domain expertise, and experience in the preparation of data sets for learning exercises.

Royal Holloway, University of London: RHUL are experts in providing confidence metrics to add to various machine learning techniques to make them more usable by industrial experts.

Supercomputers are specialized computers that can perform calculations thousands of times faster than a normal computer like a laptop. An exascale machine can do 10^18 (a billion billion) calculations per second. The first exascale machines are expected to become available sometime around 2020. The ExCAPE project is part of the EC research on how to design and use these computers to do better science.


The ExCAPE project focusses on an example of the uses of such machines. We will design and write computer programs to help people who look for new medicines. The programs will be written in such a way that they can benefit from the very fast computers that will be available in roughly five years’ time. Part of the job of finding new medicines is looking for chemicals that affect how a disease works, to kill it or stop the disease getting worse. There are a huge number of chemicals, and it is impossible to test them all. This is why scientists use programs to predict which ones will have an effect on some part of the disease, so they can narrow down the list of chemicals that they have to do real tests with.


The prediction problem can be likened to film recommendation. Imagine a large list of people (e.g. the population of Europe) and a smaller, but still quite large, list of films (a few tens of thousands). It is not feasible to get everyone to watch every film. However, if we collect some real ratings data (e.g. between 1 and 5 stars) given by people who have actually watched various films, we can try to predict what the top 100 films are for each person on the list. We can usually do a good job, even without that many ratings per person. This is a bit like finding the top 100 diseases that a given chemical might work on, or vice versa the top 100 chemicals for a given disease. Chemicals and diseases are more complicated than film ratings though, so the techniques need to be adapted and improved.
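The film analogy can be made concrete with a small sketch. The paragraph above does not name a specific algorithm; this example assumes low-rank matrix factorization trained with stochastic gradient descent, which is one common approach to recommendation problems. The function names, the toy ratings and the hyperparameters are all illustrative assumptions, not the project's own code.

```python
import numpy as np


def factorize(ratings, mask, rank=2, lr=0.02, reg=0.05, epochs=500, seed=0):
    """Fit ratings ~= U @ V.T using only the observed entries (plain SGD).

    Illustrative sketch: the hyperparameters are assumptions chosen for toy data.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = ratings.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))  # one latent vector per person
    V = 0.1 * rng.standard_normal((n_cols, rank))  # one latent vector per film
    observed = np.argwhere(mask)                   # (person, film) pairs with a rating
    for _ in range(epochs):
        rng.shuffle(observed)                      # visit ratings in random order
        for i, j in observed:
            err = ratings[i, j] - U[i] @ V[j]
            u_old = U[i].copy()
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
    return U, V


def top_k_unseen(U, V, mask, row, k):
    """Return the indices of the k highest-scoring films this person has not rated."""
    scores = U[row] @ V.T
    scores = np.where(mask[row], -np.inf, scores)  # exclude films already rated
    n_unseen = int((~mask[row]).sum())
    return np.argsort(-scores)[:min(k, n_unseen)].tolist()


# Toy data: 4 people x 5 films, ratings 1-5, with 0 meaning "not watched".
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 0, 5, 1, 1],
    [1, 1, 0, 5, 4],
    [0, 1, 2, 4, 5],
], dtype=float)
mask = ratings > 0

U, V = factorize(ratings, mask)
print("Suggested unseen films for person 0:", top_k_unseen(U, V, mask, row=0, k=2))
```

In the drug discovery setting the rows would stand for chemicals and the columns for disease-related targets, with measured activities in place of star ratings. Real data sets are far larger and sparser than this toy matrix, which is where the supercomputers come in.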


The ExCAPE project is roughly divided into three parts.

  1. The first part is about the design of algorithms (the recipes for making good predictions).
  2. The second is about writing the programs to make sure they can run well on a supercomputer.
  3. The third is about making sure the data is usable, and running the programs on the data to see how well they do.

Together they will show how to use supercomputers to give better predictions for medicine research.