Cloud, Python & Co.: next-level research at Quoniam
Setting up efficient research in the cloud, migrating tens of thousands of lines of code to Python – and doing it all largely from home? Our research team has made it possible. An interview with Dr Volker Flögel, Head of Research, and Andre Fröhlich, Head of Research Technology.
Why are you revamping Quoniam’s research?
Volker: To create even more added value for our clients. Fast, efficient research has a direct impact on the strength and diversity of our investment strategies. To process huge amounts of data quickly in the future, we need the right infrastructure.
And that was no longer a given?
Andre: We come from a world where research was carried out directly on the workstation computer, using the programming languages SAS, R and T-SQL. We have been working with this technology stack since our founding over 20 years ago. Over time, a mountain of scripts and small programs builds up, and you have to ask yourself whether to carry on as before or to tackle important technology shifts. We chose the latter.
What issues did you specifically address?
Volker: After an extensive analysis, we decided on Python – a future-proof programming language that we now use consistently. To be not only consistent but also fast, we are gradually moving our research to the cloud. The Python migration and the move to the cloud are therefore our two most important workstreams, and we are tackling them in parallel.
‘Fast, efficient research has a direct impact on the strength and breadth of our investment strategies.’
Dr Volker Flögel
Head of Research
How do you decide which programming language to work with for the next few years? Isn’t that very complicated?
Andre: Definitely. We evaluated which languages are used by leading companies in the data-science and machine-learning sector, which have grown the most and which are the most widely used. Python emerged as the leading programming language. It is state of the art worldwide, which is why we are switching to it at Quoniam as well.
Converting 20 years of coding work to Python sounds like a mammoth project.
Volker: Yes! 850 SAS scripts created since 1999 are being migrated. The ten longest scripts contain about 8,000 lines of code. A project of this scale is only possible with a great team.
The migration in numbers: 850 SAS scripts are being converted to Python; the ten longest scripts alone contain more than 8,000 lines of code; and database tables distributed over 38 databases are being migrated, together with their data, to the data lake.
Why is research going to the cloud now?
Andre: Modern machine-learning (ML) and artificial-intelligence (AI) methods are very computationally intensive. If I train an ML model on a single workstation PC, it can take a very long time and requires a lot of RAM. One example is the simultaneous analysis of reports from hundreds of listed companies, where natural language processing (NLP) is used to convert the unstructured information in the texts into measurable factors.
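To make the idea of turning report text into a measurable factor concrete, here is a minimal sketch. It is not Quoniam's method: the word lists, the `tone_score` function and the sample reports are all hypothetical, and a real NLP pipeline would be far more elaborate than counting words.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"growth", "profit", "strong", "improved"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}


def tone_score(text: str) -> float:
    """Map free text to a number in [-1, 1].

    The score is (positive - negative) / (positive + negative) word counts,
    or 0.0 if none of the listed words occurs.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    pos = sum(words[w] for w in POSITIVE)
    neg = sum(words[w] for w in NEGATIVE)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


# Made-up report snippets; each company ends up with one numeric factor.
reports = {
    "Company A": "Strong growth and improved profit in all segments.",
    "Company B": "Weak demand led to a loss and an impairment.",
}
scores = {name: tone_score(text) for name, text in reports.items()}
```

Each report is reduced to a single number that can be compared across companies, which is the step that makes unstructured text usable as a factor.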
Volker: For such large jobs, you can either buy large servers and put them in your own data centre or go to the cloud. Owning servers only makes sense if the load is continuously high and predictable, i.e. if we use the computing power permanently. Renting computing power in the cloud is more demand-driven. In research, the utilisation of computing power varies greatly and results should be available as quickly as possible, which leads to large load peaks. At the same time, there are many tasks that need only a little computing power. For this reason, the cloud makes more sense for us. In addition, the cloud gives us access to various helpful tools and services that are constantly updated and would not be available on site.
‘Since spring 2021, we have been conducting realistic, value-added research projects in the cloud.’
Andre Fröhlich
Head of Research Technology
How did the cloud conversion project go?
Andre: Before we started the actual conversion, we carried out a detailed preliminary study. As a financial services provider, we have to comply with many regulatory and legal requirements in connection with the cloud. We decided to go with Microsoft Azure Cloud. After the preliminary study, the transition started. Since the beginning of Q2 2021, we have been conducting realistic, value-added research projects in the cloud.
What does the infrastructure behind it look like?
Andre: Our new research platform is based on the open-source technology Kubernetes. This is the foundation on which you can deploy all kinds of applications, such as JupyterHub. With the Kubernetes cluster, you can spin up new virtual machines as needed, and the system shuts them down again when they are no longer needed. This saves resources and allows, among other things, distributed computing on hundreds of virtual machines, with the results aggregated again afterwards. We have also developed several new Python libraries that offer researchers significantly more flexibility than the old SAS scripts.
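The pattern of splitting a job across many machines and aggregating the results afterwards can be sketched on a single computer. The example below is an illustration only: it uses a local thread pool from Python's standard library as a stand-in for virtual machines in a cluster, and the function names `partial_sum` and `distributed_mean` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk: list[float]) -> tuple[float, int]:
    """Work done by one worker: return the sum and size of its chunk."""
    return sum(chunk), len(chunk)


def distributed_mean(values: list[float], n_chunks: int = 4) -> float:
    """Scatter the values over workers, then gather and combine the partial results."""
    if not values:
        raise ValueError("values must not be empty")
    # Scatter: split the data into up to n_chunks non-empty pieces.
    chunks = [values[i::n_chunks] for i in range(n_chunks)]
    chunks = [c for c in chunks if c]
    # Each chunk is processed independently; threads stand in for remote machines.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Gather: aggregate the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(distributed_mean([float(x) for x in range(1, 101)]))
```

The key design point is that each worker returns a small partial result (here a sum and a count) that can be combined cheaply, so the aggregation step stays fast no matter how many workers take part.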
‘Our new Python library for calculating heuristic back tests makes it much easier for me to test new investment strategies. One positive aspect is the high speed with which the back tests are calculated. But the main advantage is the flexibility of the library. We can also use it to analyse complex strategies efficiently. For example, strategies in which stocks are only bought on certain events, such as positive earnings announcements.’
Associate Director, Research Forecasts
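An event-driven strategy of the kind described in the quote can be illustrated with a toy back test. This sketch is not Quoniam's library: the function `event_backtest`, the holding rule (buy the day after an event, hold for a fixed number of days, weight all held stocks equally) and all numbers are assumptions made for illustration.

```python
import math


def event_backtest(
    returns: dict[str, list[float]],
    events: dict[str, set[int]],
    hold_days: int = 2,
) -> list[float]:
    """Return daily portfolio returns of a simple event-driven strategy.

    returns: ticker -> daily returns (all lists have the same length)
    events:  ticker -> day indices on which a positive event occurred
    A stock is held from the day after an event for hold_days days.
    Held stocks are equally weighted; with nothing held, the return is 0.0.
    """
    n_days = len(next(iter(returns.values())))
    portfolio = []
    for day in range(n_days):
        held = [
            series[day]
            for ticker, series in returns.items()
            if any(e < day <= e + hold_days for e in events.get(ticker, ()))
        ]
        portfolio.append(sum(held) / len(held) if held else 0.0)
    return portfolio


# Made-up data: two stocks, four trading days, one positive event each.
daily_returns = {
    "AAA": [0.00, 0.02, 0.01, -0.01],
    "BBB": [0.01, 0.00, 0.03, 0.02],
}
positive_events = {"AAA": {0}, "BBB": {1}}

portfolio_returns = event_backtest(daily_returns, positive_events)
cumulative_return = math.prod(1 + r for r in portfolio_returns) - 1
```

Because the position is opened only on the day after the event, the sketch avoids using information that would not have been available at the time of the trade.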
So where do we go from here?
Volker: Since the beginning of 2021, we have been implementing the first two forecast models in Python. The migration of further models will take place in the second half of the year. Subsequently migrating these models to the cloud is another big topic. As far as the cloud is concerned, we are initially focusing on the computationally intensive and storage-intensive research processes before gradually migrating the entire research platform to the cloud. In parallel, we are focusing on further developing our existing analytics capabilities in the cloud.
What was special about the project?
Andre: The great cross-team collaboration, even though we have largely been working from home since March 2020. Not only researchers, but also representatives from Governance, IT, Application Development, Portfolio Management and Implementation were involved in the project. In addition, various external service providers supported us. It is only through the commitment of each individual that we have made such good progress!