ThreadedDenseSparseMul.jl

ThreadedDenseSparseMul.jl is a Julia package that provides a threaded implementation of dense-sparse matrix multiplication, built on top of Polyester.jl.
This package addresses the need for fast computation of C ← C + D × S, where D and S are dense and sparse matrices, respectively. It differs from existing solutions in the following ways:
- SparseArrays.jl doesn't support threaded multiplication.
- MKLSparse.jl doesn't support dense × sparse-CSC multiplication directly.
- ThreadedSparseCSR.jl and ThreadedSparseArrays.jl support only sparse × dense multiplication.
ThreadedDenseSparseMul.jl shows significant performance improvements over base Julia implementations, especially for large matrices.
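For reference, here is a minimal illustration of the target operation in plain (single-threaded) Base Julia, on toy sizes:

```julia
using SparseArrays

D = rand(4, 6); S = sprand(6, 8, 0.5); C = zeros(4, 8)
C .+= D * S  # the dense × sparse-CSC update this package accelerates
```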
Performance

`fastdensesparsemul_threaded!` outperforms MKLSparse by 2x:

`fastdensesparsemul_outer_threaded!` outperforms SparseArrays by 4x:
Usage

To use ThreadedDenseSparseMul.jl, simply install and import the package, and launch Julia with multiple threads (e.g., `julia --threads=auto`). Then you can use any of the following accelerated functions:
```julia
using ThreadedDenseSparseMul
# ^ calls ThreadedDenseSparseMul.set_num_threads(Threads.nthreads()) during `__init__`
using SparseArrays

A = rand(1_000, 2_000)
B = sprand(2_000, 30_000, 0.05)
buf = similar(A, size(A, 1), size(B, 2))  # preallocate the output

fastdensesparsemul!(buf, A, B, 1, 0)
fastdensesparsemul_threaded!(buf, A, B, 1, 0)
fastdensesparsemul_outer!(buf, @view(A[:, 1]), B[1, :], 1, 0)
fastdensesparsemul_outer_threaded!(buf, @view(A[:, 1]), B[1, :], 1, 0)
```
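As a quick sanity check (illustrative, not part of the package docs), the results agree with the equivalent Base expressions:

```julia
using LinearAlgebra

fastdensesparsemul_threaded!(buf, A, B, 1, 0)
@assert buf ≈ A * B  # matches the Base dense × sparse product

outbuf = zeros(size(A, 1), size(B, 2))
fastdensesparsemul_outer_threaded!(outbuf, @view(A[:, 1]), B[1, :], 1, 0)
@assert outbuf ≈ A[:, 1] * B[1, :]'  # matches the Base rank-1 product
```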
The interface is adapted from the 5-parameter definition used by `mul!` and BLAS. To instead override `mul!` directly so you can run `C = A*B`, see the next section.
Overriding mul! with Type Piracy ☠️

If you are aware of the consequences, you can also commit type piracy (harrrr!) and override `mul!` and `(*)` by calling
```julia
ThreadedDenseSparseMul.override_mul!()       # threaded
# or
ThreadedDenseSparseMul.override_mul!(false)  # non-threaded
```
Then, you can just run e.g.
```julia
D = rand(100, 200); S = sprand(200, 500, 0.1);
C = D * S
C .+= D[:, 1] * (S[1, :]')
```
and the two multiplication calls will dispatch to this library (through `mul!` and `(*)` dispatch, respectively). Since the outer product doesn't go through `mul!` by default though, you can still manually call `fastdensesparsemul_outer_threaded!` to avoid an extra allocation, for example:
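```julia
# Allocation-free equivalent of the rank-1 update above; β = 1 accumulates
# into the existing contents of C (5-argument signature from the API reference).
fastdensesparsemul_outer_threaded!(C, @view(D[:, 1]), S[1, :], 1, 1)
```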
API Reference
ThreadedDenseSparseMul.fastdensesparsemul! — Method

```julia
fastdensesparsemul!(C, A, B, α, β)
```

BLAS-like interface computing `C .= β*C + α*A*B`, but much faster than the Base implementation.

Also see `fastdensesparsemul_threaded!` for a multi-threaded version using Polyester.jl.
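For instance (an illustrative snippet, not from the package docs), setting β = 1 turns the call into an in-place accumulation:

```julia
# C .= 1*C .+ 2.0 .* (A * B): accumulate a scaled product into the existing C
fastdensesparsemul!(C, A, B, 2.0, 1.0)
```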
ThreadedDenseSparseMul.fastdensesparsemul_outer! — Method

```julia
fastdensesparsemul_outer!(C, a, b, α, β)
```

Fast outer-product update computing `C .= β*C + α * a*b'`, but much faster than the Base implementation. Here `a` is a dense vector (or view), `b` is a sparse vector, and `C` is a dense matrix (or view).

Also see `fastdensesparsemul_outer_threaded!` for a multi-threaded version using Polyester.jl.
ThreadedDenseSparseMul.fastdensesparsemul_outer_threaded! — Method

```julia
fastdensesparsemul_outer_threaded!(C, a, b, α, β)
```

Threaded fast outer-product update computing `C .= β*C + α * a*b'`, using Polyester.jl. Here `a` is a dense vector (or view), `b` is a sparse vector, and `C` is a dense matrix (or view).

Also see `fastdensesparsemul_outer!` for a single-threaded version.
ThreadedDenseSparseMul.fastdensesparsemul_threaded! — Method

```julia
fastdensesparsemul_threaded!(C, A, B, α, β)
```

Threaded BLAS-like interface computing `C .= β*C + α*A*B`, but much faster than the Base implementation.

Also see `fastdensesparsemul!` for a single-threaded version.
ThreadedDenseSparseMul.get_num_threads — Method

```julia
ThreadedDenseSparseMul.get_num_threads()
```

Gets the number of threads used by this package's threaded multiplication routines.

ThreadedDenseSparseMul.set_num_threads — Method

```julia
ThreadedDenseSparseMul.set_num_threads(n::Int)
```

Sets the number of threads used by this package's threaded multiplication routines.
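For example (a quick sketch, assuming Julia was started with multiple threads):

```julia
using ThreadedDenseSparseMul

ThreadedDenseSparseMul.get_num_threads()   # defaults to Threads.nthreads() via __init__
ThreadedDenseSparseMul.set_num_threads(4)  # restrict the threaded kernels to 4 threads
```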
Implementation

The core implementation is quite simple, leveraging Polyester.jl for threading: each column of the sparse matrix yields an independent update of one output column. The result is something similar to

```julia
using Polyester, SparseArrays

function fastdensesparsemul_threaded!(C::AbstractMatrix, A::AbstractMatrix, B::SparseMatrixCSC, α::Number, β::Number)
    @batch for j in axes(B, 2)           # threads split B's columns among themselves
        C[:, j] .*= β                    # scale the output column in place
        C[:, j] .+= A * (α .* B[:, j])   # dense matrix × sparse column
    end
    return C
end
```
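The outer-product kernel can be sketched analogously by iterating only over the stored entries of the sparse vector. The following is an assumption about the approach (with a hypothetical name), not the package's exact source:

```julia
using Polyester, SparseArrays

# Hypothetical sketch of a threaded rank-1 update C .= β*C + α * a*b'
function outer_update_sketch!(C::AbstractMatrix, a::AbstractVector, b::SparseVector, α::Number, β::Number)
    β != 1 && (C .*= β)                  # scale the whole target up front
    cols = SparseArrays.nonzeroinds(b)   # columns of C touched by b'
    vals = SparseArrays.nonzeros(b)
    @batch for k in eachindex(cols)      # one dense column update per stored entry
        C[:, cols[k]] .+= (α * vals[k]) .* a
    end
    return C
end
```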
Contributing
Contributions to ThreadedDenseSparseMul.jl are welcome! Please feel free to submit issues or pull requests on the GitHub repository.
License
ThreadedDenseSparseMul.jl is licensed under the MIT License. See the LICENSE file in the package repository for more details.