Bundling binary tools in Python wheels

date: 2022-06-18 · Tags: #tools, #python

zig cc can serve as a smaller, easy-to-install, full-featured frontend for Clang. When gcc or clang is inconvenient to set up, all you need is pip install ziglang. Cross-platform builds benefit a lot from this.

zig cc is not the main purpose of the Zig project. It merely exposes the already-existing capabilities of the Zig compiler via a small frontend layer that parses C compiler options.

Two more cents:

  • Other tools can also be installed from pip, such as Node.js.
  • Zig's Linux tarballs are fully statically linked. It's also easy to build for a target with a different libc (e.g. -target x86_64-linux-gnu.2.28 or -target x86_64-linux-musl). It's great that host and target are decoupled.
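A minimal sketch of driving zig cc from Python for the two targets above. It assumes the ziglang wheel is installed and exposes the compiler as python -m ziglang, as its README describes; the helper zig_cc_command and the hello.c file are hypothetical, for illustration only.

```python
from __future__ import annotations

import importlib.util
import subprocess
import sys
from pathlib import Path


def zig_cc_command(target: str, sources: list[str], output: str) -> list[str]:
    """Build an argv that runs `zig cc` via the ziglang wheel (hypothetical helper)."""
    return [sys.executable, "-m", "ziglang", "cc", "-target", target, *sources, "-o", output]


def main() -> None:
    # Assumption: the toolchain comes from `pip install ziglang`; skip politely otherwise.
    if importlib.util.find_spec("ziglang") is None:
        print("ziglang not installed; try `pip install ziglang`")
        return
    src = Path("hello.c")
    src.write_text('#include <stdio.h>\nint main(void) { puts("hello"); return 0; }\n')
    # Same host, two different targets: glibc 2.28 and musl.
    for target in ("x86_64-linux-gnu.2.28", "x86_64-linux-musl"):
        subprocess.run(zig_cc_command(target, [str(src)], f"hello-{target}"), check=True)


if __name__ == "__main__":
    main()
```

Because the compiler ships inside a wheel, the same script works wherever pip does, with no system compiler required.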

From Python to Numpy

date: 2022-06-10 · Tags: #books, #python, #scientific-computing

Though I have gained plenty of NumPy experience from various learning materials, I'm still amazed by the detailed and elegant content of From Python to Numpy.

I strongly recommend it to everyone who uses NumPy for work or research, and I recommend the author Nicolas P. Rougier's other book, Scientific Visualization, just as highly. Their full titles are Scientific Python Volume I: From Python to Numpy and Scientific Python Volume II: Scientific Visualization.

From Python to Numpy

Vectorized and Performance-portable Quicksort

date: 2022-06-05 · Tags: #hpc, #news, #algorithm

Jeff Dean posted about a state-of-the-art vectorized quicksort developed by Google AI researcher Jan Wassenberg. According to their benchmarks, it sorts about ten times as fast as C++'s std::sort.

Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures.
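To give a feel for what "vectorized" means in a quicksort, here is a minimal Python sketch in which the partition step compares the whole array against the pivot at once and gathers each partition with a boolean mask. It is not vqsort: the real implementation partitions inside SIMD registers using Highway's portable intrinsics, while here NumPy masks play that role. The function name mask_quicksort is hypothetical, and the sketch assumes integer (NaN-free) input.

```python
import numpy as np


def mask_quicksort(a: np.ndarray) -> np.ndarray:
    """Quicksort whose partition step is whole-array (vectorized) work.

    Assumes a 1-D integer array; NaNs would be dropped by the comparisons.
    """
    if a.size <= 1:
        return a.copy()
    # Median-of-three pivot keeps recursion balanced on already-sorted input.
    pivot = sorted((a[0], a[a.size // 2], a[-1]))[1]
    # One vectorized comparison per partition, then a masked gather
    # (loosely analogous to a SIMD compress-store).
    less = a[a < pivot]
    equal = a[a == pivot]  # three-way split handles many duplicates
    greater = a[a > pivot]
    return np.concatenate((mask_quicksort(less), equal, mask_quicksort(greater)))
```

The three-way split guarantees progress even when every element equals the pivot, so inputs with heavy duplication still terminate quickly.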

References:

  1. Jeff Dean's status
  2. Vectorized and Performance-portable Quicksort
  3. google/highway
  4. [2205.05982] Vectorized and performance-portable Quicksort

The Python GIL: Past, Present, and Future

date: 2022-06-04 · Tags: #python, #good-reading

We know the past and present of the GIL in Python, but what about its future?

I'm really looking forward to

GIL-less CPython with minimal backward incompatibilities at both the Python and C layers

And

At a high level, the removal of the GIL is afforded by changes in three areas: the memory allocator, reference counting, and concurrent collection protections.

  1. Memory Allocators: mimalloc is a general-purpose, highly efficient, thread-safe memory allocator that is worth an in-depth look. The nogil project builds on mimalloc's internal structures to implement dictionaries and other collection types, minimizing the need for locks on non-mutating access.
  2. Reference Counting:
    • For immortal objects, nogil uses the least significant bits of the object's reference count field for bookkeeping, which lets it turn the refcounting macros into no-ops for these objects and avoid all cross-thread contention on those fields. nogil also uses a form of biased reference counting to split an object's refcount into two buckets, local and shared. The thread that owns the object can combine the local and shared refcounts for garbage collection purposes, and it can give up ownership when its local refcount drops to zero.
    • For objects that are typically owned by multiple threads and are not immortal, a deferred reference counting scheme is used. This technique is limited to objects that are only deallocated during garbage collection, since they are typically involved in reference cycles.
  3. Concurrent Collection Protections: The third high-level technique that nogil uses to enable concurrency is to implement an efficient algorithm for locking container objects, such as dictionaries and lists, when mutating them.
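The reference-counting ideas in item 2 can be sketched as a toy Python model of biased reference counting plus immortal objects. BiasedRefCount is a hypothetical class, not nogil's C implementation: a threading.Lock stands in for the atomic operations a real implementation would use on the shared count, and a boolean flag stands in for the reserved refcount bits that mark an object as immortal.

```python
import threading


class BiasedRefCount:
    """Toy model of biased refcounting (hypothetical; not nogil's actual code)."""

    def __init__(self, immortal: bool = False) -> None:
        self.owner = threading.get_ident()  # creating thread owns the object
        self.immortal = immortal            # stands in for reserved low refcount bits
        self.local = 1                      # only the owner touches this: no sync needed
        self.shared = 0                     # any other thread: must be synchronized
        self._lock = threading.Lock()       # stands in for atomic instructions

    def incref(self) -> None:
        if self.immortal:
            return  # no-op, so threads never contend on this object's count
        if threading.get_ident() == self.owner:
            self.local += 1  # fast, uncontended path
        else:
            with self._lock:
                self.shared += 1

    def decref(self) -> None:
        if self.immortal:
            return
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._lock:
                self.shared -= 1

    def total(self) -> int:
        # Combining both buckets gives the true reference count.
        with self._lock:
            return self.local + self.shared
```

The design point is that the owning thread, which performs most refcount operations in typical code, never pays for synchronization; only cross-thread traffic goes through the slower shared path.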

I hope that future arrives soon!

The Python GIL: Past, Present, and Future -- Barry Warsaw

Faster CPython, GitHub - faster-cpython

All About Libpas, Phil's Super Fast Malloc

date: 2022-06-02 · Tags: #system-programming

WebKit has enabled a new memory allocator called libpas to replace bmalloc.

Libpas is a fast and memory-efficient memory allocation toolkit capable of supporting many heaps at once, engineered with the hopes that someday it'll be used for comprehensive isoheaping of all malloc/new callsites in C/C++ programs.

All About Libpas, Phil's Super Fast Malloc

Compiling Black with `mypyc`

date: 2022-06-01 · Tags: #python, #good-reading

What is mypyc?

Mypyc compiles Python modules to C extensions. It uses standard Python type hints to generate fast code.
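As a minimal sketch of what "standard Python type hints" means in practice, here is a fully annotated module modeled on the small recursive benchmark used in mypyc's getting-started material. The file name fib.py is an assumption. The same file runs unchanged in interpreted mode; compiling it with mypyc fib.py should produce a C extension that python -c "import fib" then picks up instead of the .py source.

```python
# fib.py: plain, fully type-annotated Python; no Cython or C syntax needed.
# Interpreted mode: import and run as-is.
# Compiled mode (assumption: mypyc installed): `mypyc fib.py` builds a C extension.


def fib(n: int) -> int:
    """Naive recursive Fibonacci; the int annotations let mypyc emit native code."""
    if n <= 1:
        return n
    return fib(n - 2) + fib(n - 1)
```

The annotations also matter at runtime once compiled: interpreted fib(2.0) quietly returns 1.0, whereas the compiled module is expected to reject the float argument with a TypeError, which is the "strict runtime type checking" listed below.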

My first impressions:

  • Pros:
    • No other language (C/Cython/Rust) needed, just a gradually typed variant of Python.
    • Fast program startup via AOT (ahead-of-time) compilation to native code
    • Strict runtime type checking
    • Develop in interpreted mode for a quick edit-run cycle, then ship releases in compiled mode. Optionally, include a fallback interpreted version for platforms that mypyc doesn't support.
  • Cons:
    • Only supports the major primitive types and a subset of native operations
    • No documentation on how it works with concurrency (threading/asyncio)
    • No support for generator expressions or arbitrary descriptors
    • Adoption can be demanding when your code relies on untyped standard-library or third-party modules, which will slow down your efforts.

Hence, mypyc isn't an alternative to Cython when the goal is to improve concurrency or speed up numeric code (in fact, searching the mypyc docs for "concurrency" or "thread" turned up nothing). Still, it's worth watching as a way to speed up pure-Python codebases, or toolchains that ship as a single compiled binary.

References:

Here is an excellent example of how mypyc made Black about twice as fast:

Compiling Black with mypyc