Speeding Up Symmetric Matrix With Cython

In the previous post, a Python implementation of a symmetric matrix was described. This implementation saves approximately 50% of memory space. However, when compared with a matrix created via the numpy module, the average access time for our implementation was much worse than the average access time for the numpy matrix. In this post, we will speed up the access time by using Cython.

Motivation and Goal

As mentioned in the introduction, the main driving force to reimplement our symmetric matrix lies in its slow access time. In the following table, the average access times for writing to all elements of a matrix are shown for both matrix types. All input conditions, matrix size and processor, were the same.

Matrix Type Access Time
Symmetric Matrix (via array) 11.26 sec
numpy Matrix 2.00 sec

Hence, the main goal is to speed up the existing implementation while keeping the current usage and memory requirements untouched.

There are several tools and techniques that can be used to achieve this goal. Some of them are:

  • Wrapping the C/C++ implementation using the Swig wrapper generator.
  • Defining a new type in Python using Python C API.
  • Using Cython.

Swig is one of the oldest and very powerful tool for building extension modules. The best start scenario for using Swig is having a large existing base of C or C++ source codes that are needed to be accessed from Python. In the current situation, a C or C++ implementation of a symmetric matrix needs to be provided in order to use Swig. Since a symmetric matrix would be implemented in C or C++, it is highly possible that speed would improve. However, this approach is not very straightforward because you would need to write the C/C++ version first.

Python C API provides a way to define new types in Python. Originally, I wanted to take this direction because I believed that it will deliver the most optimized implementation. However, I came across Cython and decided to give it a try. And I love it from the day one.

So, what is Cython? It is an extended Python language and Python compiler at the same time. The main advantage is that the source code written in Cython looks like Python and Cython is able to automatically translate it into C. Since Cython generates very efficient C code, you get C speed from inside Python. In comparison with using bare Python C API:

  • you do not have to write C,
  • you do not need to start from scratch if you already have a Python implementation – you edit the source code you already have,
  • the resulting source code is much shorter than its C equivalent – therefore, it is also easier to maintain,
  • writing code in Cython is less time-consuming and less error-prone.

Stefan Behnel, a long-time Cython core developer, told us in his lecture about writing C extensions using bare Python C API the following:

You can do that but it is unlikely that:
a) it is actually going to work
b) it is going to be correct
c) it is going to be as fast as what Cython generates anyway.

Okay? So, do not do that.

All right. Lets start with the implementation.

Implementation

Firstly, we change a little bit the definition of the SymmetricMatrix class according to the Cython requirements for creating an extension type by adding the cdef statement before class SymmetricMatrix:

cdef class SymmetricMatrix:

Then, we specify types of instance attributes outside the __init__() method as is required by Cython:

	cdef:
		int _size
		int _data_size
		object _data

Since the original interface should stay the same, the signature of __init__() is unchanged. The rest stays same as in the Python implementation:

	def __init__(self, size, create_storage=None):
		if size <= 0:
			raise ValueError('size has to be positive')

		self._data_size = (size + 1) * size // 2

		if create_storage is None:
			create_storage = lambda n: [0 for i in range(n)]

		self._size = size
		self._data = create_storage(self._data_size)

The implementation of the __len__() method stays completely same as in the Python version:

	def __len__(self):
		return self._size

A few changes can be seen in the implementation of the __setitem__() method. More specifically, type information and bounds checking were added:

	def __setitem__(self, position, value):
		cdef int index

		index = self._get_index(position)
		if index < 0 or index >= self._data_size:
			raise IndexError('out of bounds')

		self._data[index] = value

The reason to insert bounds checking here instead of in the _get_index() method is explained later. The __getitem__() method can be written analogically to __setitem__():

	def __getitem__(self, position):
		cdef int index

		index = self._get_index(position)
		if index < 0 or index >= self._data_size:
			raise IndexError('out of bounds')

		return self._data[index]

The implementation of the _get_index() method uses also explicit type information for local variables. And on top of that, the cdef statement is added before the definition of this method. Also, a return type for the method is specified. The cdef statement is used to define C functions. These functions can be called within a Cython module but they cannot be called from outside of the module by interpreted Python code. Since _get_index() is an internal method called only inside the class, this limitation does not cause any harm. However, bounds checking and raising the IndexError exception need to be put after every call of _get_index() to work properly. In return, a speed improvement will be achieved as we will see later.

	cdef int _get_index(self, position):
		cdef int row, column, index

		row, column = position
		if column > row:
			row, column = column, row

		index = (row) * (row + 1) // 2 + column

		return index

To compile a Cython source file to a C or C++ source file, the cythonize command from Cython.Build is used. This command calls the Cython compiler. For the next step, compiling a C/C++ source into a Python extension module, the distutils package is used. The behaviour of distutils is usually controlled through a Python script, typically named setup.py. The following setup.py script compiles a symmetric_matrix.pyx source file into a Python extension module:

from distutils.core import setup
from Cython.Build import cythonize

setup(
	name='Symmetric Matrix',
	ext_modules=cythonize('symmetric_matrix.pyx')
)

To instruct distutils to build an extension module, the build_ext argument is provided. The optional -i flag puts the compiled extension next to its corresponding Cython source file:

$ python3 setup.py build_ext -i

To make building of an extension even easier, a Makefile is provided together with the source file, unit tests, the setup script and the benchmark script.

Benchmark

The benchmark is aimed on comparing access times for writing to the entire matrix between the symmetric matrix implemented in Python, the symmetric matrix implemented in Cython and the matrix created via the numpy module. This computation is performed five times and the average result is calculated. The experiments ran on an Intel Quad-Core i7-4700HQ (6M Cache, 2.40 GHz) processor. The benchmark script uses a \(4000 \times 4000\) matrix. When creating a symmetric matrix (Python or Cython version), array.array() is used as the underlying storage. To create the numpy matrix, numpy.zeros() is called. The elements in both matrices are 64b integers.

From the following table, we can see that the average access time for the symmetric matrix implemented in Cython is much better than the average access time for the symmetric matrix implemented in Python. Indeed, it is very close to the average access time for the numpy matrix:

Matrix Type Access Time
Symmetric Matrix (Python version) 11.70 sec
Symmetric Matrix (Cython version) 2.70 sec
numpy Matrix 2.12 sec

Overview of Changes

Only by changing the suffix from .py to .pyx and creating the setup.py script for building an extension, the average access time improved by 1.70s. When converting the SymmetricMatrix class to an extension type, the cdef statement needed to be inserted before class SymmetricMatrix. This conversion to an extension type and specifying types for instance attributes saves approximately 3.70s. Adding types to local variables shortened the average access time about more than 2s. Another great improvement, around 1.6s, was achieved by defining the internal _get_index() method as a C function and specifying its return type. However, this enhancement brought also one disadvantage: a C function cannot raise exceptions. Therefore, bounds checking and raising the IndexError exception has to be added after every call to _get_index(), which may lead to duplicate code. In the pure Python implementation, we did not have to manually raise IndexError because Python did it for us. In the end, I also tried specifying types in method signatures but this unexpectedly slowed down the execution.

Evaluation

The goal was to speed up the original Python implementation of a symmetric matrix introduced in the previous post. Cython was chosen as a tool to achieve this goal. According to the result from the benchmark script, it speeded it up without a need to write the whole implementation in C. The resulting code runs 4.3 times faster and it is understandable for Python programmers. In additon, low memory requirements were preserved. The only disadvantage lies in the need to install Cython to build an extension type.

Source Code

The complete source code of the SymmetricMatrix class implemented in Cython, alongside with unit tests, the setup script, Makefile and the benchmark script, is available on GitHub. It is known to work with Python 3.4 (CPython implementation), Cython 0.22 and GCC 4.9.

Leave a Comment.