In the previous post, a Python implementation of a symmetric matrix was described. This implementation saves approximately 50% of memory space. However, when compared with a matrix created via the
numpy module, the average access time for our implementation was much worse than the average access time for the
numpy matrix. In this post, we will speed up the access time by using Cython.
Motivation and Goal
As mentioned in the introduction, the main driving force to reimplement our symmetric matrix lies in its slow access time. In the following table, the average access times for writing to all elements of a matrix are shown for both matrix types. All input conditions, matrix size and processor, were the same.
|Matrix Type||Access Time|
|Symmetric Matrix (via
Hence, the main goal is to speed up the existing implementation while keeping the current usage and memory requirements untouched.
There are several tools and techniques that can be used to achieve this goal. Some of them are:
- Wrapping the C/C++ implementation using the Swig wrapper generator.
- Defining a new type in Python using Python C API.
- Using Cython.
Swig is one of the oldest and very powerful tool for building extension modules. The best start scenario for using Swig is having a large existing base of C or C++ source codes that are needed to be accessed from Python. In the current situation, a C or C++ implementation of a symmetric matrix needs to be provided in order to use Swig. Since a symmetric matrix would be implemented in C or C++, it is highly possible that speed would improve. However, this approach is not very straightforward because you would need to write the C/C++ version first.
Python C API provides a way to define new types in Python. Originally, I wanted to take this direction because I believed that it will deliver the most optimized implementation. However, I came across Cython and decided to give it a try. And I love it from the day one.
So, what is Cython? It is an extended Python language and Python compiler at the same time. The main advantage is that the source code written in Cython looks like Python and Cython is able to automatically translate it into C. Since Cython generates very efficient C code, you get C speed from inside Python. In comparison with using bare Python C API:
- you do not have to write C,
- you do not need to start from scratch if you already have a Python implementation – you edit the source code you already have,
- the resulting source code is much shorter than its C equivalent – therefore, it is also easier to maintain,
- writing code in Cython is less time-consuming and less error-prone.
Stefan Behnel, a long-time Cython core developer, told us in his lecture about writing C extensions using bare Python C API the following:
You can do that but it is unlikely that:
a) it is actually going to work
b) it is going to be correct
c) it is going to be as fast as what Cython generates anyway.
Okay? So, do not do that.
All right. Lets start with the implementation.
Firstly, we change a little bit the definition of the
SymmetricMatrix class according to the Cython requirements for creating an extension type by adding the
cdef statement before
cdef class SymmetricMatrix:
Then, we specify types of instance attributes outside the
__init__() method as is required by Cython:
cdef: int _size int _data_size object _data
Since the original interface should stay the same, the signature of
__init__() is unchanged. The rest stays same as in the Python implementation:
def __init__(self, size, create_storage=None): if size <= 0: raise ValueError('size has to be positive') self._data_size = (size + 1) * size // 2 if create_storage is None: create_storage = lambda n: [0 for i in range(n)] self._size = size self._data = create_storage(self._data_size)
The implementation of the
__len__() method stays completely same as in the Python version:
def __len__(self): return self._size
A few changes can be seen in the implementation of the
__setitem__() method. More specifically, type information and bounds checking were added:
def __setitem__(self, position, value): cdef int index index = self._get_index(position) if index < 0 or index >= self._data_size: raise IndexError('out of bounds') self._data[index] = value
The reason to insert bounds checking here instead of in the
_get_index() method is explained later. The
__getitem__() method can be written analogically to
def __getitem__(self, position): cdef int index index = self._get_index(position) if index < 0 or index >= self._data_size: raise IndexError('out of bounds') return self._data[index]
The implementation of the
_get_index() method uses also explicit type information for local variables. And on top of that, the
cdef statement is added before the definition of this method. Also, a return type for the method is specified. The
cdef statement is used to define C functions. These functions can be called within a Cython module but they cannot be called from outside of the module by interpreted Python code. Since
_get_index() is an internal method called only inside the class, this limitation does not cause any harm. However, bounds checking and raising the
IndexError exception need to be put after every call of
_get_index() to work properly. In return, a speed improvement will be achieved as we will see later.
cdef int _get_index(self, position): cdef int row, column, index row, column = position if column > row: row, column = column, row index = (row) * (row + 1) // 2 + column return index
To compile a Cython source file to a C or C++ source file, the
cythonize command from
Cython.Build is used. This command calls the Cython compiler. For the next step, compiling a C/C++ source into a Python extension module, the
distutils package is used. The behaviour of
distutils is usually controlled through a Python script, typically named
setup.py. The following
setup.py script compiles a
symmetric_matrix.pyx source file into a Python extension module:
from distutils.core import setup from Cython.Build import cythonize setup( name='Symmetric Matrix', ext_modules=cythonize('symmetric_matrix.pyx') )
distutils to build an extension module, the
build_ext argument is provided. The optional
-i flag puts the compiled extension next to its corresponding Cython source file:
$ python3 setup.py build_ext -i
To make building of an extension even easier, a
Makefile is provided together with the source file, unit tests, the setup script and the benchmark script.
The benchmark is aimed on comparing access times for writing to the entire matrix between the symmetric matrix implemented in Python, the symmetric matrix implemented in Cython and the matrix created via the
numpy module. This computation is performed five times and the average result is calculated. The experiments ran on an Intel Quad-Core i7-4700HQ (6M Cache, 2.40 GHz) processor. The benchmark script uses a \(4000 \times 4000\) matrix. When creating a symmetric matrix (Python or Cython version),
array.array() is used as the underlying storage. To create the
numpy.zeros() is called. The elements in both matrices are 64b integers.
From the following table, we can see that the average access time for the symmetric matrix implemented in Cython is much better than the average access time for the symmetric matrix implemented in Python. Indeed, it is very close to the average access time for the
|Matrix Type||Access Time|
|Symmetric Matrix (Python version)||11.70 sec|
|Symmetric Matrix (Cython version)||2.70 sec|
Overview of Changes
Only by changing the suffix from
.pyx and creating the
setup.py script for building an extension, the average access time improved by 1.70s. When converting the
SymmetricMatrix class to an extension type, the
cdef statement needed to be inserted before
class SymmetricMatrix. This conversion to an extension type and specifying types for instance attributes saves approximately 3.70s. Adding types to local variables shortened the average access time about more than 2s. Another great improvement, around 1.6s, was achieved by defining the internal
_get_index() method as a C function and specifying its return type. However, this enhancement brought also one disadvantage: a C function cannot raise exceptions. Therefore, bounds checking and raising the
IndexError exception has to be added after every call to
_get_index(), which may lead to duplicate code. In the pure Python implementation, we did not have to manually raise
IndexError because Python did it for us. In the end, I also tried specifying types in method signatures but this unexpectedly slowed down the execution.
The goal was to speed up the original Python implementation of a symmetric matrix introduced in the previous post. Cython was chosen as a tool to achieve this goal. According to the result from the benchmark script, it speeded it up without a need to write the whole implementation in C. The resulting code runs 4.3 times faster and it is understandable for Python programmers. In additon, low memory requirements were preserved. The only disadvantage lies in the need to install Cython to build an extension type.
The complete source code of the
SymmetricMatrix class implemented in Cython, alongside with unit tests, the setup script, Makefile and the benchmark script, is available on GitHub. It is known to work with Python 3.4 (CPython implementation), Cython 0.22 and GCC 4.9.