The Gateway to Algorithmic and Automated Trading

Python for analysing financial markets

Published in Automated Trader Magazine Issue 43 Q2 2017

Python is increasingly popular and is used in front-office systems at some of the largest investment banks and hedge funds. We provide an introduction to the language and its extensive range of libraries, and show how to speed up execution.

AUTHOR'S BIO

Saeed Amen

Saeed Amen is the founder of Cuemacro, which consults and publishes research for clients in systematic trading. Previously, Saeed developed systematic trading strategies at Lehman Brothers and Nomura. He is a co-founder of the think tank The Thalesians and the author of Trading Thalesians.

This article is split into three parts. Firstly, we discuss the relative merits of various programming languages for analysing financial markets. This part is especially relevant for readers less familiar with Python or coding in general. There are short explanations of how some of the more common languages operate and what is of particular importance when it comes to performance and usability.

Secondly, we go into detail about the libraries available in Python to analyse data. Our discussion covers some libraries which might be less well-known within the Python data community. We suggest that developers familiar with Python should jump to this part.

Finally, we introduce Cuemacro's open-source financial market libraries written in Python: Chartpy (visualisation), Findatapy (market data) and Finmarketpy (backtesting trading strategies). We conclude by presenting some examples of market analysis written in Python using these libraries.

Part I: Which programming language should you choose?

The most important aspects you need to consider when choosing a programming language are related to time.

One determining factor which you need to pay attention to is execution time, or the time it takes to run your analysis. Another equally important factor is development time, or the time it takes to write the actual code.

The relative importance of execution time versus development time is a key consideration when it comes to choosing an appropriate programming language. When running a high frequency trading (HFT) strategy in production, execution time is likely to be crucial. This contrasts with longer term trading strategies or prototyping, where execution time is less of a consideration.

We expand upon this idea of balancing execution and development time in the following sections, in which we discuss the relative merits of different types of programming languages for financial market applications.

Statically typed languages

In instances where a short execution time is paramount, such as in HFT, you are most likely to want to use a lower level language which compiles to machine code, such as C++. Lower level languages tend to use static typing. Static typing involves specifying the type of data we want to store in variables at compilation, before runtime, which reduces the amount of processing needed at execution.

Historically, C++ has been the language of choice in quantitative finance, in particular for option pricing. However, coding in C++ is time-consuming and requires programmers to have a clear understanding of lower level concepts such as memory allocation and pointers.

One alternative to C++ is Java. Like C++, Java is a statically typed language. Unlike C++, Java does not require users to manage lower-level memory allocation and offers features such as automatic garbage collection. This means programmers do not need to worry about freeing up memory once they have finished using a variable. (This, of course, does not totally eliminate the chance of a memory leak, which can crash the program.)

Unlike C++, Java does not compile directly to machine code and is instead compiled to Java bytecode, which is executed by the Java Virtual Machine (JVM). While the bytecode is more portable than machine code, it still needs to be translated to machine code at execution time by the virtual machine - a process known as just-in-time (JIT) compilation. This introduces a startup delay to your program.

Historically, JVMs have been slow at executing Java bytecode. In recent years, however, they have become faster. Indeed, NumFOCUS (2017) shows that for basic mathematical operations Java's execution time is now comparable with that of C++. Furthermore, because each platform's JVM handles the bytecode-to-machine-code compilation, you can execute the same Java bytecode on a number of different platforms without having to recompile the source code. This adds to the convenience of using Java: it is possible in principle to compile your code on a Mac and run it on Linux or Windows, reducing development time when using multiple operating systems.

Java is not unique for being compiled to bytecode. C#, which bears many similarities to Java in its syntax, and other languages from the .NET framework are also compiled into intermediate code (similar to Java's bytecode) which is subsequently JIT'ed into native instructions by the Common Language Runtime (CLR).

Interpreted languages

When the primary goal is to reduce development time, rather than execution time, we can turn to interpreted languages, which are very useful for scripting.

Common interpreted languages used in finance include Python, Matlab and R. They are chosen since they reduce development time when prototyping trading strategies. Execution happens through an interpreter, without the need for pre-compilation into a machine (or byte-) code executable - unlike compiled languages such as C++.

Interpreted languages are generally dynamically typed (as opposed to statically typed). This means that the types of variables are associated with their assigned values at runtime, and not specified by the programmer (or inferred by a compiler). This is one feature that makes scripting languages less verbose, making it quicker to write code. On the flip side, execution can take longer.

Whilst Matlab is primarily known for its matrix algebra capabilities, it also has many libraries known as toolboxes, which offer additional functionality ranging from signal processing to computational finance to image analysis. Matlab remains popular partially because so much legacy code in financial firms is written in it. It can also interface well with many other languages with minimal effort, including Python and Java.

In recent years Matlab has faced competition from R and Python. Both offer similar functionality to Matlab, but have the added benefit of being open-source languages. However, there is an implicit cost in transitioning from Matlab to either Python or R, notably the time spent learning a new language. It also takes time to rewrite legacy Matlab code in Python or R.

R is an open-source version of the statistical package S. Historically, cutting edge statistical techniques have tended to be implemented in R before other languages. This has attracted a large following among the data science community. However, if your application is not purely based around statistics, R might not be the best choice. It is relatively slow compared to most other languages (see Boraan Aruoba & Fernández-Villaverde, 2014, and NumFOCUS, 2017) and the syntax is more suited to those with a mathematical rather than a programming background.

Julia is a more recent scripting language, which has been designed to address many of the issues associated with R and Python. (For an introduction to Julia, see this issue's "Julia - A new language for technical computing", page 37.) In particular, when Julia code is first run, it generates native machine code for execution. This contrasts with R and Python code which is executed by an interpreter. Theoretically, native machine code should be quicker than interpreted code. NumFOCUS (2017) gives a set of benchmarks that indicate the language has comparable performance with C for a number of functions such as matrix multiplication and sorting lists.

Functional and query languages

So far we have focused on imperative languages. But what about using other types of languages?

Haskell is a functional language. For programmers used to imperative programming and the idea of mainly using loops, it can be challenging to adopt a functional approach to programming. However, certain mathematical problems can be more naturally expressed in a functional framework.

Lisp is another common functional language and is often used in natural language processing. Indeed, one of the biggest companies in this area, RavenPack, actively uses Lisp. F#, Microsoft's functional language, also has the benefits of being part of the .NET Framework, so it can be called easily by other .NET framework languages such as C#. The JVM also has functional languages, such as Clojure. Scala combines object-oriented development with functional elements and also compiles into Java bytecode.

Q is a query-based language. It is primarily designed to be used with kdb+, a high performance database which can deal with large amounts of columnar style data, such as time series data. kdb+ is often used to store tick data from financial markets.

It might seem odd to consider using a database language for financial analysis. The idea is that we can do a lot of analysis within the database and then output a summary (see Bilokon & Novotny, 2018). This avoids the overhead associated with retrieving the data from a database.

Whilst the 32-bit version of kdb+ is available for free, the 64-bit version is subject to a licence fee. Another downside of Q is that it tends to be relatively complicated to get to grips with (although there is the simpler q-SQL language which, as the name suggests, has a similar syntax to SQL).

Why Python for financial analysis?

So far we have discussed the relative merits of several languages when analysing financial data. As we have noted, the language chosen largely depends on the aims of your analysis. If you want to conduct real-time analysis of tick data, you likely need to choose a high performance language like C++. However, for most other purposes, where short execution time is not the primary consideration, such as when analysing lower frequency data, there are many other choices.

Python can be viewed as a compromise language for market analysis. It has a lot of libraries, just as R and Matlab do. It is easier to learn than lower level languages like C++. An important part of any larger programming project is the ability to reuse code. This is facilitated by object-oriented coding, which tends to be easier in Python than in either R or Matlab.

Whilst Python is certainly not the fastest language for execution, it is quicker than R and by most standards comparable with Matlab (see NumFOCUS, 2017). Parallelising code - splitting up a computation into chunks which can be solved at the same time - can cut execution time, since modern processors usually have multiple cores and can run several calculations simultaneously. Notably, one drawback of Python is its global interpreter lock (GIL), which allows only one native thread to execute Python bytecode at any one time. As a result, the GIL can make it more challenging to parallelise code. Later in the article we discuss techniques for reducing the execution time of Python code.

Python and the financial community

A number of large financial organisations use Python and have adopted it in their core processes. On the sell side, JPMorgan's Quartz system uses Python extensively. Quartz is used for pricing trades, managing exposure and computing risk metrics across all asset classes. Athena, a similar system at BAML, also uses Python extensively. Of course, this is not to say that the sell side has suddenly dumped technologies like the .NET Framework and Java, but it is a sign that Python has come of age. Many large quant hedge funds, such as AHL, have also adopted Python.

In recent years, financial firms have begun to open source some of their code, which is likely to help the adoption of Python within the financial community. AHL has even open sourced its Arctic project, a high-performance time series data storage wrapper for MongoDB. Pandas, the very popular data analysis library, also started life as a project at the investment management firm AQR.

How can we speed up Python?

Just as with R and Matlab, it is beneficial to vectorise Python code. For example, rather than using a for-loop code structure to multiply matrices, which can be slow, we can use highly optimised matrix multiplication functions instead. Admittedly, in more complicated cases it is not always trivial to vectorise code in this way.
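As a minimal sketch of the difference (exact timings will vary by machine, and the function here is our own toy example), the following compares a naive pure Python loop with the equivalent vectorised NumPy call, which delegates to optimised compiled routines:

import time
import numpy as np

n = 200
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def matmul_loop(x, y):
  # naive triple-loop matrix multiplication in pure Python
  c = np.zeros((n, n))
  for i in range(n):
    for j in range(n):
      for k in range(n):
        c[i, j] += x[i, k] * y[k, j]
  return c

start = time.time(); matmul_loop(a, b)
print("Loop:       %.4fs" % (time.time() - start))

start = time.time(); a.dot(b)
print("Vectorised: %.4fs" % (time.time() - start))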

As we discussed earlier, given that Python has the GIL, it can be more challenging to do truly parallel computation within a single process. You need a work-around, such as the multiprocessing library, which creates separate Python processes in memory. This approach allows you to do computation on multiple cores, although it also makes it more challenging to share memory between the processes. If the bottlenecks in your code are related to input/output (IO), like downloading web pages, then other approaches could potentially be better. These include the threading library or the asyncio library, which handle IO requests asynchronously without blocking.
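As a hedged illustration of the multiprocessing approach (the simulation function and worker counts are invented for the example), each worker below runs in its own process with its own GIL, so the CPU-bound work genuinely runs in parallel across cores:

from multiprocessing import Pool
import numpy as np

def simulate_path(seed):
  # CPU-bound work: simulate a long random walk and return its final value
  rand = np.random.RandomState(seed)
  return rand.randn(1000000).cumsum()[-1]

if __name__ == '__main__':
  # each worker is a separate Python process, so the GIL is not shared
  pool = Pool(processes=4)
  results = pool.map(simulate_path, range(8))
  pool.close(); pool.join()
  print(results)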

Cython presents another way to speed up Python code. Cython is a static compiler for Python, which also lets you call C functions and declare C types. Python code is dynamically typed, unlike, for example, Java, which is statically typed. If you declare C types in Cython, it can convert your slow Python for-loops into C.

You can use Cython to wrap C++/C libraries. Cython also makes it possible to release the GIL in order to use multithreading directly in functions without the need to create separate processes. Many libraries in Python extensively use Cython. Admittedly, Cython is not a magic bullet to reduce the execution time of Python code. In some cases, it can be time consuming to rewrite Python code for Cython's compiler. This can be the case when your Python code contains more complicated syntax which cannot be converted easily into low level C code by Cython.
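Below is a minimal sketch of what Cython code looks like (the function itself is invented for illustration). Declaring static C types on the accumulator and loop counter lets Cython translate the loop into plain C; the file would need to be compiled first, for example via a setup script or the %%cython cell magic in an IPython notebook:

# sum_squares.pyx - must be compiled by Cython before it can be imported
def sum_squares(double[:] x):
  # static C types allow Cython to turn this loop into C
  cdef double total = 0.0
  cdef int i
  for i in range(x.shape[0]):
    total += x[i] * x[i]
  return total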

An alternative to Cython is Numba. Numba uses the LLVM compiler infrastructure to generate machine code from Python at runtime (this can also be done on a static basis). The code generated by Numba can be compiled to run either on the CPU or on graphics processing units (GPUs). GPUs are typically useful for large scale computations with repeatable operations, like matrix multiplication.
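As a short sketch (the drawdown function is our own example, not part of Numba), decorating a function with @jit is usually all that is needed; Numba compiles the loop to machine code on the first call:

from numba import jit
import numpy as np

@jit(nopython=True)
def max_drawdown(prices):
  # this loop is compiled to machine code the first time it is called
  peak = prices[0]
  max_dd = 0.0
  for i in range(prices.shape[0]):
    if prices[i] > peak:
      peak = prices[i]
    dd = (peak - prices[i]) / peak
    if dd > max_dd:
      max_dd = dd
  return max_dd

prices = np.cumprod(1 + 0.01 * np.random.randn(1000000))
print(max_drawdown(prices))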

Part II: The Python data libraries

Python now boasts a lot more data libraries than it did several years ago. This has encouraged quants to use Python. In this section we discuss some of the most popular Python data libraries.

The SciPy stack

The SciPy stack comprises several popular libraries for scientific and technical computing. It includes NumPy, Pandas, IPython, Matplotlib and the SciPy library, which we discuss below in some detail.

The first step in learning Python is to develop a basic understanding of the syntax. For those wishing to analyse financial markets, it is also important to understand the SciPy stack. In particular, we would recommend focusing on NumPy and Pandas, given that financial market data often consists of time series.

NumPy is at the core of the stack and offers a large number of functions to deal with matrix manipulation of 'ndarray' objects, which are n-dimensional arrays. NumPy is written in a mix of Python and C and delegates much of its computation to the underlying BLAS and LAPACK libraries, which makes it fast. Functions of this kind are at the heart of much of the computation in financial analysis. NumPy can be viewed as the Python equivalent of Matlab's matrix functionality.

Pandas is a Python data analysis library which deals with time series. It offers functions to perform common manipulations of time series, such as aligning or sorting them. At its core are several data structures: the Series (single time series), the DataFrame (multi-column time series) and the Panel (three-dimensional time series). These data structures can be seen as Python's equivalent of R's data frames. The underlying dates and data within these data structures are stored as NumPy arrays.
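As a small sketch of these data structures (the tickers and dates are invented for the example), Pandas automatically aligns series on their date indices and makes resampling a one-liner:

import pandas
import numpy as np

# two daily price series over different (overlapping) date ranges
dates1 = pandas.date_range('01 Jan 2017', periods=100, freq='B')
dates2 = pandas.date_range('15 Jan 2017', periods=100, freq='B')

s1 = pandas.Series(np.random.randn(100).cumsum(), index=dates1, name='EURUSD')
s2 = pandas.Series(np.random.randn(100).cumsum(), index=dates2, name='GBPUSD')

# concat aligns the two series on their dates, filling gaps with NaN
df = pandas.concat([s1, s2], axis=1)

# resample the daily data into monthly averages
print(df.resample('M').mean())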

IPython is an interactive notebook-based environment for Python code (its notebook component has since grown into the Jupyter project). With IPython we can combine Python code, text and results in a single file, creating interactive research documents where the code and the output sit in one place. This contrasts with the typical alternative, such as a static PDF file.

One of the reasons for R's popularity is its ggplot2 library, which produces high quality visualisations. Matplotlib is the most popular visualisation library for Python; its plotting interface was originally modelled on Matlab's plotting functionality. Matplotlib can generate a multitude of plots, ranging from simple 2D plots to more complicated 3D plots and animations. However, some of its functionality can be challenging to use, which has led to the development of wrappers that simplify its interface. These include the libraries Seaborn and Chartpy.

The SciPy library - not to be confused with the SciPy stack - provides methods for a number of different computations used in financial analysis, including numerical integration, optimisation, interpolation, linear algebra, statistics and image processing.

Machine learning and statistics

As computing power has become cheaper and more datasets have become available, interest in machine learning has grown significantly. In a nutshell, the idea of machine learning is to make inferences about the relationships between variables in a dataset where we do not know the underlying function or process beforehand. Python has many libraries for machine learning; we describe a few popular ones.

Scikit-learn is perhaps the best known of the machine learning libraries for Python. It can be used for a number of tasks including classification, regression, clustering, dimensionality reduction, model selection and pre-processing. The algorithms range from linear regressions to techniques which can handle non-linear relationships, like support vector machines and k-nearest neighbours.
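As a toy sketch of the scikit-learn workflow (the features and labels here are random data, purely for illustration), every estimator follows the same fit/predict pattern:

from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# toy problem: predict next-day direction from two lagged returns
returns = np.random.randn(500)
x = np.column_stack([returns[1:-1], returns[:-2]])   # lagged features
y = (returns[2:] > 0).astype(int)                    # next-day up/down label

model = KNeighborsClassifier(n_neighbors=10)
model.fit(x[:400], y[:400])                          # train on the first 400 samples

# out-of-sample accuracy (random data, so this should be close to 50%)
print(model.score(x[400:], y[400:]))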

The deep learning library TensorFlow was released by Google in 2015. TFLearn provides a simplified interface for using TensorFlow, similar to scikit-learn. Other deep learning frameworks include Theano and PyTorch.

PyMC3 is a package for Bayesian statistical modelling and probabilistic machine learning for Python. The underlying matrix computation is done by Theano on either CPU or GPU whilst the higher level functions accessed by users are in pure Python.

QuantEcon is an econometrics library for Python and Julia, which is maintained and used in a number of academic institutions including New York University. Its functionality includes agent-based modelling.

Text and natural language processing

The ever-growing amount of content on the web has resulted in a huge amount of unstructured data. A lot of this data is text. In order to make text data usable for traders, it needs to be cleaned and structured. Furthermore, you might want to create metrics to describe text, such as sentiment scores. These can then be used to trigger trading signals. Python has many features to deal with text data and there are also a number of open-source libraries for natural language processing (NLP) and cleaning text.

The Natural Language Toolkit (NLTK) is the most well-known Python library for NLP and began life in 2001. It has many features for processing and understanding text. It allows, for example, tagging text, creating parse trees for sentences and identifying entities in text. It also comes with many existing word corpora.
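A minimal sketch of NLTK in action (the example sentence is our own; the tokeniser and tagger models need to be downloaded once before use):

import nltk

# download the tokeniser and part-of-speech tagger models (one-off step)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

sentence = "The Federal Reserve raised interest rates in March."

tokens = nltk.word_tokenize(sentence)   # split the text into word tokens
tagged = nltk.pos_tag(tokens)           # attach part-of-speech tags

print(tagged)   # e.g. [('The', 'DT'), ('Federal', 'NNP'), ...]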

spaCy is a much newer library for natural language processing. Benchmarks quoted by Explosion AI (2015) show that the library is much faster than other similar frameworks and offers very high precision. It is used by a number of large companies such as Quora.

BeautifulSoup can be used to extract usable text from webpages in Python for offline processing. Text extraction can be a lengthy process; this library can strip away parts of webpages which are not relevant to the meaning, such as HTML tags and menus.
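As a small sketch (the HTML snippet is invented), BeautifulSoup lets us drop boilerplate elements such as menus and then pull out the visible text:

from bs4 import BeautifulSoup

html = """<html><body>
<div class="menu">Home | About | Contact</div>
<p>EUR/USD rallied after the <b>ECB</b> press conference.</p>
</body></html>"""

soup = BeautifulSoup(html, 'html.parser')

# strip away navigation elements which carry no meaning
for menu in soup.find_all('div', class_='menu'):
  menu.decompose()

# extract the remaining visible text without HTML tags
print(soup.get_text(strip=True))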

Market data and databases

At the heart of any financial markets analysis is market data. There are many Python libraries that help access and store market data. There are also a number of libraries which simplify the process of storing this data in databases.

MongoDB is one of the most well-known NoSQL databases. While SQL databases store data in tables in a relational way, NoSQL databases use different data structures for storage. For example, they might store data as documents or in key-value stores. Arctic is AHL's open-sourced Python library which acts as a wrapper for MongoDB when storing time series data. It transparently compresses and decompresses Pandas' data structures locally, reducing the impact on the network.
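A hedged sketch of the Arctic workflow (assuming a MongoDB instance is running locally; the library and symbol names here are our own):

import pandas
import numpy as np
from arctic import Arctic

# toy DataFrame of daily prices to store
df = pandas.DataFrame({'close': np.random.randn(100).cumsum() + 100},
                      index=pandas.date_range('01 Jan 2017', periods=100))

store = Arctic('localhost')            # connect to MongoDB
store.initialize_library('fx.daily')   # one-off library creation
library = store['fx.daily']

library.write('EURUSD', df)            # compressed and written to MongoDB
item = library.read('EURUSD')          # read back and decompressed transparently
print(item.data.tail())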

Python has many other wrappers for accessing external databases: PyMongo for MongoDB, qPython for kdb+ and SQLAlchemy for SQL databases. Redis is an in-memory database, which uses a key-value data store, similar to a dictionary-style object in Python. An in-memory data store can be accessed much faster than a disk-based one; the obvious limitation is the available RAM.
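A minimal sketch using the redis-py wrapper (assuming a Redis server is running locally on the default port; the key name is invented):

import redis

# connect to a local Redis server
r = redis.Redis(host='localhost', port=6379)

# store and retrieve a value, much like a Python dictionary
r.set('last_price:EURUSD', 1.1250)
print(float(r.get('last_price:EURUSD')))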

Many vendors also offer Python APIs for accessing market data. Bloomberg has an open-source Python API called Blpapi, which can be used both with the desktop and server Bloomberg products. Quandl, a popular online data provider, also offers its own Python API.

Visualisation

Once you have completed your market analysis, you probably need to present your results. Typically, this involves creating charts. Aside from Matplotlib, part of the SciPy stack as described above, there are numerous other libraries for generating charts.

Bokeh is a Python-based library which can be used to generate Javascript-based interactive plots. Plotly also creates Javascript-based interactive plots, similar to Bokeh. Additionally, it allows users to share plots with data through web browsers. It has ports for many other languages, including R and Matlab.

VisPy is a more specialised GPU-accelerated library for visualisation in Python. Whilst it is less mature than the other visualisation libraries we have discussed, its big advantage is its ability to plot complicated charts very quickly (for example those with millions of points).

Part III: Cuemacro's open-source financial libraries

Building upon a large number of open-source libraries, over the past few years at Cuemacro we have developed our own Python framework for analysing market data. We originally developed a library called PyThalesians, which we later rewrote and split into several smaller, more specialised libraries. They are designed to provide a relatively easy-to-use, high-level interface for analysing financial markets, allowing users to focus on developing trading strategies.

For example, Chartpy is a visualisation library. It does not render charts directly, but instead allows users to render charts with a number of Python chart libraries like Matplotlib, Bokeh, Plotly and VisPy, using a consistent and simple interface. This means that users do not have to worry about the low level details of Matplotlib, Plotly and others, which are all very different. To switch between the various plotting libraries, only a single word needs to be changed in the source.

Having given an overview of Python and its data libraries, we now move to some practical code examples.

Loading FX tick data from a retail broker

In Listing 01 we show how we can load market data using the library Findatapy. Our first step is to import the various dependencies. We instantiate a Market object, which can be used to fetch market data according to the parameters set in a MarketDataRequest; here we request tick data for EUR/USD from Dukascopy, a retail broker. The principle for downloading data from other data providers is the same: all we need to do is change the data_source parameter. Hence, we do not need to learn the underlying API of each data provider, just the simple interface provided by Findatapy. We then use the fetch_market method to return a Pandas DataFrame, which is later printed.

if __name__ == '__main__':
  from findatapy.market import Market, MarketDataRequest, MarketDataGenerator
  market = Market(market_data_generator=MarketDataGenerator())
  md_request = MarketDataRequest(start_date='14 Jun 2016', finish_date='15 Jun 2016',
                                 fields=['bid'], vendor_fields=['bid'],
                                 freq='tick', data_source='dukascopy',
                                 tickers=['EURUSD'], vendor_tickers=['EURUSD'])
  df = market.fetch_market(md_request)
  print(df.tail(n=10))

Listing 01: Code for downloading FX tick data

Plotting a static FX volatility surface with different volatility libraries

In Listing 02, we show how to download data for the EUR/USD FX volatility surface using Bloomberg as our data source. We take volatilities from the last date in our sample. Next, we plot the data using Matplotlib as a graphics backend (Figure 01). Finally, we do the same using Plotly. Note that we just need to change a single keyword to change the plotting engine.

from findatapy.market import Market, MarketDataRequest, MarketDataGenerator, FXVolFactory

import datetime
from datetime import timedelta

def plot_live_vol_surface():
  market = Market(market_data_generator=MarketDataGenerator())

  currency_pair = 'EURUSD'

  # we can also download all the market data needed for pricing EURUSD options!
  md_request = MarketDataRequest(start_date=datetime.datetime.now().date() - timedelta(days=5),
                                 data_source='bloomberg', cut='BGN',
                                 category='fx-implied-vol', tickers=[currency_pair])

  df = market.fetch_market(md_request)

  # take the vol surface for the last date in our sample
  fxvf = FXVolFactory()
  df_vs = fxvf.extract_vol_surface_for_date(df, currency_pair, -1)

  from chartpy import Chart, Style
  style = Style(title=currency_pair + " vol surface", source="chartpy", color='Blues')
  style.file_output = 'volsurface_live.png'

  # Chart object is initialised with the dataframe and our chart style
  chart = Chart(df=df_vs, chart_type='surface', style=style)
  chart.plot(engine='matplotlib')
  chart.plot(engine='plotly')

if __name__ == '__main__':
  plot_live_vol_surface()

Listing 02: Code for downloading FX volatility surface and plotting

Figure 01: Static volatility surface for EUR/USD

Animating an FX volatility surface

As well as static volatility surfaces, we can also animate volatility surfaces over time. Using animation in this way can be particularly useful for understanding the changing dynamics of the volatility surface, which can be difficult to see when simply plotting certain segments of the surface (for example, a time series of ATM implied volatility).

Here, we again download market data using Findatapy. In Listing 03 we request FX implied volatility data for GBP/USD from Bloomberg and extract the volatility surface for each date in the sample. The surface is then animated using Chartpy's Chart object, enabled by changing a single flag.

from findatapy.market import Market, MarketDataRequest, MarketDataGenerator, FXVolFactory
from chartpy import Chart, Style

def plot_animated_vol_market():
  market = Market(market_data_generator=MarketDataGenerator())

  cross = ['GBPUSD']; start_date = '01 Jun 2016'; finish_date = '01 Aug 2016'; sampling = 'no'

  md_request = MarketDataRequest(start_date=start_date, finish_date=finish_date, data_source='bloomberg', cut='LDN', category='fx-implied-vol', tickers=cross, cache_algo='internet_load_return')

  df = market.fetch_market(md_request)
  if sampling != 'no': df = df.resample(sampling).mean()
  fxvf = FXVolFactory()
  df_vs = []

  # grab the vol surface for each date and create a dataframe for each date (could have used a panel)
  for i in range(0, len(df.index)): df_vs.append(fxvf.extract_vol_surface_for_date(df, cross[0], i))

  style = Style(title="FX vol surface of " + cross[0], source="chartpy", color='Blues', animate_figure=True, animate_titles=df.index, animate_frame_ms=500, normalize_colormap=False)

  # Chart object is initialised with the dataframe and our chart style
  Chart(df=df_vs, chart_type='surface', style=style).plot(engine='matplotlib')

if __name__ == '__main__':
  plot_animated_vol_market()

Listing 03: Code for downloading FX volatility surface over a month and animated plot

FX trend following model

The following example shows how to do a backtest of a simple FX trading strategy using a trend following rule. In Listing 04 the TradingModel abstract class defines a trading strategy. We implement this by extending the class to TradingModelFXTrend_Example.

First, let us define the imported modules (Listing 04-A). The modules from Findatapy are used to fetch the data, whilst those from Finmarketpy define the trading strategy. In the __init__ method we set the name of the strategy and the plotting engine we wish to use.

import datetime

from findatapy.market import Market, MarketDataGenerator, MarketDataRequest
from finmarketpy.backtest import TradingModel, BacktestRequest
from finmarketpy.economics import TechIndicator

class TradingModelFXTrend_Example(TradingModel):

  def __init__(self):
    super(TradingModelFXTrend_Example, self).__init__()

    ##### FILL IN WITH YOUR OWN PARAMETERS FOR display, dumping, TSF etc.
    self.market = Market(market_data_generator=MarketDataGenerator())
    self.DUMP_PATH = ''
    self.FINAL_STRATEGY = 'FX trend'
    self.SCALE_FACTOR = 1
    self.DEFAULT_PLOT_ENGINE = 'matplotlib'

    self.br = self.load_parameters()
    return

Listing 04-A: Code for trend-following model

Next, we define the parameters for a backtest (Listing 04-B), including the start and end dates. We set the parameters so that volatility weighting is applied to each asset. The idea is that leverage for each asset is adjusted for our volatility target of 10%. Leverage is also adjusted at the portfolio level.

  ###### Parameters and signal generations (need to be customised for every model)
  def load_parameters(self):

    ##### FILL IN WITH YOUR OWN BACKTESTING PARAMETERS
    br = BacktestRequest()

    # get all asset data
    br.start_date = "04 Jan 1989"
    br.finish_date = datetime.datetime.utcnow().date()
    br.spot_tc_bp = 0.5
    br.ann_factor = 252

    br.plot_start = "01 Apr 2015"
    br.calc_stats = True
    br.write_csv = False
    br.plot_interim = True
    br.include_benchmark = True

    # have vol target for each signal
    br.signal_vol_adjust = True
    br.signal_vol_target = 0.1
    br.signal_vol_max_leverage = 5
    br.signal_vol_periods = 20
    br.signal_vol_obs_in_year = 252
    br.signal_vol_rebalance_freq = 'BM'
    br.signal_vol_resample_freq = None

    # have vol target for portfolio
    br.portfolio_vol_adjust = True
    br.portfolio_vol_target = 0.1
    br.portfolio_vol_max_leverage = 5
    br.portfolio_vol_periods = 20
    br.portfolio_vol_obs_in_year = 252
    br.portfolio_vol_rebalance_freq = 'BM'
    br.portfolio_vol_resample_freq = None

    # tech params
    br.tech_params.sma_period = 200

    return br

Listing 04-B: Code for trend-following model (cont.)

We load the market data - spot FX time series from Quandl - which we want to use in our analysis. In practice, it would be more appropriate to use total return indices for FX, given that they include carry. However, this data is not available on Quandl. For users who have access to Bloomberg, the repositories listed in Table 01 include a version of the trend following model which uses Bloomberg's FX total return indices. Using spot rather than total return indices does not generally make a huge difference when computing returns for a strategy like trend following in developed market currency pairs, since the strategy is typically not persistently long or short. In this instance the risk-adjusted returns are just over 0.1 higher when using the total returns data compared with spot data.

  def load_assets(self):
    ##### FILL IN WITH YOUR ASSET DATA

    # for FX basket
    full_bkt    = ['EURUSD', 'USDJPY', 'GBPUSD', 'AUDUSD', 'USDCAD', 'NZDUSD', 'USDCHF', 'USDNOK', 'USDSEK']

    basket_dict = {}

    for i in range(0, len(full_bkt)):
      basket_dict[full_bkt[i]] = [full_bkt[i]]

    basket_dict['FX trend'] = full_bkt

    br = self.load_parameters()

    self.logger.info("Loading asset data...")

    vendor_tickers = ['FRED/DEXUSEU', 'FRED/DEXJPUS', 'FRED/DEXUSUK', 'FRED/DEXUSAL', 'FRED/DEXCAUS', 'FRED/DEXUSNZ', 'FRED/DEXSZUS', 'FRED/DEXNOUS', 'FRED/DEXSDUS']

    market_data_request = MarketDataRequest(
            start_date = br.start_date,                     # start date
            finish_date = br.finish_date,                   # finish date
            freq = 'daily',                                 # daily data
            data_source = 'quandl',                         # use Quandl as data source
            tickers = full_bkt,                             # ticker (findatapy)
            fields = ['close'],                             # which fields to download
            vendor_tickers = vendor_tickers,                # ticker (Quandl)
            vendor_fields = ['close'],                      # which Quandl fields to download
            cache_algo = 'internet_load_return')            # how to return data

    asset_df = self.market.fetch_market(market_data_request)

    # signalling variables
    spot_df = asset_df
    spot_df2 = None

    return asset_df, spot_df, spot_df2, basket_dict

Listing 04-C: Code for trend-following model (cont.)

We note that there is a significant difference for FX carry strategies if we compute returns with spot instead of total returns data. By construction, the model will be persistently long high yielding currencies whilst being short low yielding currencies (see Amen, 2013).

We compute the trading signal using a moving average (MA). First, we calculate a 200-day MA. If the closing spot price is above the 200-day MA, a buy signal is generated. If the closing spot price is below the MA, it results in a sell signal (Listing 04-D).

  def construct_signal(self, spot_df, spot_df2, tech_params, br):

    ##### FILL IN WITH YOUR OWN SIGNALS

    # use technical indicator to create signals
    # (we could obviously create whatever function we wanted for generating the signal dataframe)
    tech_ind = TechIndicator()
    tech_ind.create_tech_ind(spot_df, 'SMA', tech_params)
    signal_df = tech_ind.get_signal()

    return signal_df

Listing 04-D: Code for trend-following model (cont.)

We define a simple benchmark of long EUR/USD (Listing 04-E). In practice, for FX there is no obvious benchmark. Typically, we could use a benchmark of FX funds returns or a mix of generic FX carry and trend strategies to act as a proxy for beta in FX markets.

  def construct_strategy_benchmark(self):

    ###### FILL IN WITH YOUR OWN BENCHMARK

    tsr_indices = MarketDataRequest(
      start_date = self.br.start_date,                # start date
      finish_date = self.br.finish_date,              # finish date
      freq = 'daily',                                 # daily data
      data_source = 'quandl',                         # use Quandl as data source
      tickers = ["EURUSD"],                           # tickers to download
      vendor_tickers=['FRED/DEXUSEU'],
      fields = ['close'],                             # which fields to download
      vendor_fields = ['close'],
      cache_algo = 'cache_algo_return')               # how to return data

    df = self.market.fetch_market(tsr_indices)

    df.columns = [x.split(".")[0] for x in df.columns]

    return df

Listing 04-E: Code for trend-following model (cont.)

We can kick off the computation by instantiating our trading object and then constructing the strategy. Next, we can plot the P/L of the strategy and its subcomponents (Listing 04-F).

if __name__ == '__main__':
  # create a FX trend strategy then chart the returns, leverage over time

  model = TradingModelFXTrend_Example()

  model.construct_strategy()

  model.plot_strategy_pnl()                        # plot the final strategy
  model.plot_strategy_leverage()                   # plot the leverage of the portfolio
  model.plot_strategy_group_pnl_trades()           # plot the individual trade P&Ls
  model.plot_strategy_group_benchmark_pnl()        # plot all the cumulative P&Ls of each component
  model.plot_strategy_group_benchmark_pnl_ir()     # plot all the IR of individual components
  model.plot_strategy_group_leverage()             # plot all the individual leverages

Listing 04-F: Code for trend-following model (cont.)

We can also present a summary of our returns using the TradeAnalysis object, which will create a single webpage of various return statistics (Listing 04-G).

from finmarketpy.backtest import TradeAnalysis

ta = TradeAnalysis()

# create statistics for the model returns using finmarketpy
ta.run_strategy_returns_stats(model, engine='finmarketpy')

Listing 04-G: Code for trend-following model (cont.)

Using Redis to speed up data loading

Each of our earlier examples involved downloading market data. Here we do so again, this time from Google Finance. On the second download, we change the cache_algo flag to cache_algo_return, which makes our application check an internal Redis-based cache instead of the web. Whilst it typically takes a few seconds to fetch data directly from Google Finance, fetching the dataset from Redis takes only a few milliseconds.

if __name__ == '__main__':
  from findatapy.market import Market, MarketDataRequest, MarketDataGenerator
  from findatapy.util import LoggerManager

  market = Market(market_data_generator=MarketDataGenerator())
  logger = LoggerManager().getLogger(__name__)

  # in the config file, we can use keywords 'open', 'high', 'low', 'close' and 'volume' for Google finance data

  # download equities data from Google
  md_request = MarketDataRequest(
    start_date="01 Jan 2002",            # start date
    finish_date="05 Feb 2017",           # finish date
    data_source='google',                # use Google Finance as data source
    tickers=['Apple', 'Citigroup', 'Microsoft', 'Oracle', 'IBM', 'Walmart',
             'Amazon', 'UPS', 'Exxon'],  # ticker (findatapy)
    fields=['close'],                    # which fields to download
    vendor_tickers=['aapl', 'c', 'msft', 'orcl', 'ibm', 'wmt',
                    'amzn', 'ups', 'xom'],   # ticker (Google)
    vendor_fields=['Close'],             # which Google Finance fields to download
    cache_algo='internet_load_return')

  logger.info("Load data from Google directly")
  df = market.fetch_market(md_request)

  logger.info("Loaded data from Google directly, now try reading from Redis in-memory cache")
  md_request.cache_algo = 'cache_algo_return' # change flag to cache algo so won't attempt to download via web

  df = market.fetch_market(md_request)

  logger.info("Read from Redis cache.. that was a lot quicker!")

Listing 05: Code for downloading market data and reloading it with caching

Summary

We have discussed the relative merits of using various programming languages for analysing financial markets. We acknowledged that considerations about execution time versus development time can affect our choice of language. For high frequency traders, where short execution time is important, lower level languages are preferable. For lower frequency strategies, where execution time is less important, scripting languages might be better.

We discussed the merits of using Python, which can be viewed as a language of compromise that brings together many of the advantages of other languages. We noted that whilst Python might be slower than languages such as C++, it is still faster than, for example, R. We then covered techniques for speeding up Python code, including the most common method, vectorising code, as well as Cython, which enables users to write Python-like code that compiles to machine code, and Numba, which uses the LLVM compiler infrastructure to translate Python into machine code at runtime.

There are many large open-source libraries for data analysis in Python. These have encouraged the widespread adoption of Python by market participants. The most notable libraries in this space include NumPy for matrix algebra and Pandas for manipulating time series data.

Finally, we introduced Cuemacro's open-source libraries, Chartpy, Findatapy and Finmarketpy. We gave examples of how they can be used to create charts with a minimal amount of code and how they can transparently load market data from many sources with a consistent interface. We presented Python code for backtesting a simple trend following trading strategy.

The Python code samples described here are also available at the Cuemacro GitHub repositories referenced in Table 01, along with further examples. Links to the various Python libraries discussed are also provided in Table 01.

Further reading

Amen, S. (2013). "Beta'em up: What is market beta in FX?". ssrn.com/abstract=2439854

Boraan Aruoba, S. & Fernández-Villaverde, J. (2014). "A comparison of programming languages in economics". nber.org/papers/w20263

Bilokon, P.A. & Novotny, J. (2018, planned). "Kdb+ for electronic trading: Q, high frequency financial data and algorithmic trading". Wiley Finance Series, John Wiley & Sons

NumFOCUS (2017). "Julia benchmark times relative to C". julialang.org/benchmarks/

Explosion AI (2015). "spaCy Facts and Figures". spacy.io/docs/api/

Cuemacro libraries (contain the code examples from this article)
  Chartpy          Chart library                                           github.com/cuemacro/chartpy
  Finmarketpy      Backtesting library                                     github.com/cuemacro/finmarketpy
  Findatapy        Market data library                                     github.com/cuemacro/findatapy

The SciPy stack (scipy.org)
  NumPy            Matrix algebra                                          numpy.org
  SciPy library    Scientific computing                                    scipy.org/scipylib/index.html
  Matplotlib       Visualisation                                           matplotlib.org
  IPython          Interactive notebook                                    ipython.org
  SymPy            Symbolic mathematics                                    sympy.org
  Pandas           Time series                                             pandas.pydata.org

Machine learning and statistics libraries
  scikit-learn     Machine learning                                        scikit-learn.org
  TensorFlow       Machine learning                                        tensorflow.org
  TFLearn          Wrapper for TensorFlow                                  tflearn.org
  Theano           Multi-dimensional array computation                     deeplearning.net/software/theano
  PyTorch          Deep learning framework                                 pytorch.org
  PyMC3            Bayesian modelling and probabilistic machine learning   github.com/pymc-devs/pymc3
  QuantEcon        Quantitative economics                                  quantecon.org

Text and natural language processing libraries
  NLTK             Natural language processing                             nltk.org
  spaCy            Natural language processing                             spacy.io
  BeautifulSoup    Web scraping                                            crummy.com/software/BeautifulSoup

Visualisation libraries
  Bokeh            Interactive JavaScript-based visualisation              bokeh.pydata.org/en/latest
  Plotly           Interactive JavaScript-based visualisation              plot.ly
  VisPy            GPU-accelerated visualisation                           vispy.org

Market data and database libraries
  PyMongo          Python MongoDB wrapper                                  github.com/mongodb/mongo-python-driver
  qPython          kdb+ database wrapper                                   github.com/exxeleron/qPython
  redis-py         Key/value store wrapper for Redis                       github.com/andymccurdy/redis-py
  SQLAlchemy       High-level SQL access                                   sqlalchemy.org
  Blpapi           Bloomberg Python API                                    bloomberg.com/professional/support/api-library
  Quandl           Quandl Python API                                       quandl.com/tools/python

Table 01: Python libraries