Numpy needs a BLAS library that has CBLAS C language wrappers.
Here is a list of the options that we know about.
The ATLAS libraries have been the default BLAS / LAPACK libraries for numpy binary installers on Windows to date (end of 2015).
ATLAS runs comprehensive tests of parameters on a particular machine to choose from a range of algorithms to optimize BLAS and some LAPACK routines. Modern versions (>= 3.9) perform reasonably well on BLAS benchmarks. Each ATLAS build is optimized for a particular machine (CPU capabilities, L1 / L2 cache size, memory speed), and ATLAS selects routines at build time rather than at runtime, meaning that a default ATLAS build can be badly optimized for a particular processor. The main developer of ATLAS is Clint Whaley. His main priority is optimizing for HPC machines, and he does not give much time to supporting Windows builds. Not surprisingly, ATLAS is difficult to build on Windows, and is not well optimized for 64-bit Windows.
Because there is no run-time adaptation to the CPU, an ATLAS built for a CPU with SSE3 instructions will likely crash on a CPU that does not have SSE3 instructions, and an ATLAS built for an SSE2 CPU will not be able to use SSE3 instructions. Therefore, numpy installers on Windows use the “superpack” format, where we build three ATLAS libraries: one with no SSE instructions, one with SSE2, and one with SSE3.
We make three Windows .exe installers, one for each of these ATLAS versions, and then build a “superpack” installer from these three installers. The superpack installer first checks the machine on which it is running to find which instructions the CPU supports, and then installs the matching numpy / ATLAS package.
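The superpack's install-time check can be sketched in Python via the same Win32 API. This is an illustration, not the superpack's actual installer code: the function name is ours, but `IsProcessorFeaturePresent` and the `PF_*` constant values are the real Win32 API.

```python
import ctypes


def cpu_instruction_level():
    """Return 'sse3', 'sse2', or 'nosse' for the running CPU.

    Hypothetical helper mirroring the superpack's install-time check.
    Uses the Win32 IsProcessorFeaturePresent call; on non-Windows
    platforms (where ctypes.windll does not exist) it falls back to
    'nosse'.
    """
    try:
        from ctypes import windll, wintypes
    except (ImportError, ValueError):
        return 'nosse'
    has_feature = windll.kernel32.IsProcessorFeaturePresent
    has_feature.argtypes = [wintypes.DWORD]
    PF_XMMI64_INSTRUCTIONS_AVAILABLE = 10  # SSE2
    PF_SSE3_INSTRUCTIONS_AVAILABLE = 13    # SSE3
    if has_feature(PF_SSE3_INSTRUCTIONS_AVAILABLE):
        return 'sse3'
    if has_feature(PF_XMMI64_INSTRUCTIONS_AVAILABLE):
        return 'sse2'
    return 'nosse'
```

An installer built around this would then select the no-SSE, SSE2, or SSE3 numpy / ATLAS package accordingly.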
There is no way of doing this when installing from binary wheels, because the wheel installation process consists of unpacking files to given destinations, and does not allow pre-install or post-install scripts.
One option would be to build a binary wheel with ATLAS that depends on SSE2 instructions. It seems that 99.5% of Windows machines have SSE2 (see: Windows versions). It is not technically difficult to put a check in the numpy __init__.py file to give a helpful error message and die when the CPU does not have SSE2:
```python
try:
    from ctypes import windll, wintypes
except (ImportError, ValueError):
    pass
else:
    has_feature = windll.kernel32.IsProcessorFeaturePresent
    has_feature.argtypes = [wintypes.DWORD]
    if not has_feature(10):  # 10 == PF_XMMI64_INSTRUCTIONS_AVAILABLE (SSE2)
        msg = ("This version of numpy needs a CPU capable of SSE2, "
               "but Windows says - not so.\n"
               "Please reinstall numpy using a superpack installer")
        raise RuntimeError(msg)
```
The Intel Math Kernel Library (MKL) is closed-source, but available for free under the Community licensing program.
The Intel license does allow us, the developers, to distribute copies of the MKL with our built binaries, but carries the following extra license terms:
DISTRIBUTION: Distribution of the Redistributables is also subject to the following limitations: [clauses i through iii omitted]
[You] (iv) will provide the Redistributables subject to a license agreement that prohibits disassembly and reverse engineering of the Redistributables except in cases when you provide Your Product subject to an open source license that is not an Excluded License, for example, the BSD license, or the MIT license, (v) will indemnify, hold harmless, and defend Intel and its suppliers from and against any claims or lawsuits, including attorney’s fees, that arise or result from Your modifications, derivative works or Your distribution of Your Product.
Clause (iv) does allow us numpy developers (as numpy uses the BSD license) to distribute wheels without asking users to agree to extra licensing terms. Unfortunately, clause (v) makes us, the developers, responsible for legal costs that could be very large.
The ACML was AMD’s equivalent to the MKL, with similar or moderately worse performance. At the time of writing (December 2015), AMD has marked the ACML as “end of life”, and suggests using the AMD compute libraries instead.
The ACML does not appear to contain a CBLAS interface.
Binaries linked against ACML have to conform to the ACML license which, as for the MKL, requires software linked to the ACML to subject users to the ACML license terms including:
2. Restrictions. The Software contains copyrighted and patented material, trade secrets and other proprietary material. In order to protect them, and except as permitted by applicable legislation, you may not:
a) decompile, reverse engineer, disassemble or otherwise reduce the Software to a human-perceivable form;
b) modify, network, rent, lend, loan, distribute or create derivative works based upon the Software in whole or in part [...]
AMD advertises the AMD compute libraries (ACL) as the successor to the ACML.
BLIS is “a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries.”
As of December 2015 the developer mailing list was fairly quiet, with only a few emails since August 2015.
libflame can also be built to include a full LAPACK implementation. It is a sister project to BLIS.
COBLAS is a “Reference BLAS library in C99”, BSD license. A quick look at the code in April 2014 suggested it used very straightforward implementations that are not highly optimized.
Eigen is “a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.”
Eigen is mostly covered by the Mozilla Public License 2.0, but some features are covered by the LGPL. The non-MPL2 features can be disabled.
It is technically possible to compile Eigen into a BLAS library, but there is currently no CBLAS interface.
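As a toy illustration of why the CBLAS layer matters: the Fortran BLAS interface assumes column-major storage, while numpy arrays are row-major by default; CBLAS adds an explicit storage-order flag so the caller can say which layout a flat buffer uses. The sketch below mimics that flag in pure Python (the names and structure are illustrative, not the real cblas API).

```python
def gemm(order, m, n, k, a, b):
    """Toy C = A @ B on flat buffers in row- or column-major order.

    `order` plays the role of CBLAS's CblasRowMajor / CblasColMajor
    flag; a and b are flat lists of length m*k and k*n.
    """
    def get(buf, i, j, rows, cols):
        # Element (i, j) of a rows x cols matrix stored flat.
        return buf[i * cols + j] if order == 'row' else buf[j * rows + i]

    def idx(i, j, rows, cols):
        return i * cols + j if order == 'row' else j * rows + i

    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += get(a, i, p, m, k) * get(b, p, j, k, n)
            c[idx(i, j, m, n)] = s
    return c
```

A Fortran-only BLAS sees every buffer as column-major; without the CBLAS wrapper, numpy would have to transpose or re-describe its row-major arrays on every call.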
GotoBLAS2 is the predecessor to OpenBLAS. It was a library written by Kazushige Goto, and released under a BSD license, but is no longer maintained. Goto now works for Intel. It was at or near the top of the benchmarks on which it was tested (e.g. the BLAS LAPACK review and the Eigen benchmarks). Like MKL and ACML, GotoBLAS2 chooses routines at runtime according to the processor, but it does not detect modern processors (those released after 2011).
OpenBLAS is a fork of GotoBLAS2 updated for newer processors. It uses the 3-clause BSD license.
Julia uses OpenBLAS by default.
See OpenBLAS on github for current code state. It appears to be actively merging pull requests. There have been some worries about bugs and lack of tests on the numpy mailing list and the octave list.
It appears to be fast on benchmarks.
OpenBLAS on Win32 seems to be quite stable. Some OpenBLAS issues on Win64 can be addressed by using a single-threaded build of the library.
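A runtime alternative to a special single-threaded build: OpenBLAS honours the `OPENBLAS_NUM_THREADS` environment variable, read when the library is loaded. A sketch, assuming an OpenBLAS-linked numpy (the variable must be set before numpy is first imported; OpenBLAS can also be compiled without threading via its `USE_THREAD=0` build option):

```python
import os

# OpenBLAS reads OPENBLAS_NUM_THREADS at load time, so set it before
# numpy (and hence OpenBLAS) is first imported in this process.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

# Safe to import now; BLAS calls will run single-threaded.
# import numpy as np
```

This only affects the current process and its children, so a launcher script or `sitecustomize.py` is a convenient place for it.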