Dominik Bieber @dbeaver

**♡ Eva Winterschön ♡** @winterschon@bsd.cafe · Apr 12

FreeBSD CUDA drm-61-kmod

"Just going to test the current pkg driver, this will only take a second...", the old refrain goes. Surely, it will not punt away an hour or so of messing about in loader.conf on this EPYC system...

- Here are some notes for backing out of a botched driver install that crashes with a kernel panic.
- Standard stuff, nothing new over the years here with the loader prompt.
- A few directives are specific to this system, though they may provide a useful general reference.
- The server has an integrated GPU in addition to the NVIDIA PCIe card, so a module blacklist for the "amdgpu" driver is necessary (EPYC 4564P).

Step 1: during boot-up, "exit to loader prompt"
Step 2: set/unset the values as needed at the loader prompt

```
unset nvidia_load
unset nvidia_modeset_load
unset hw.nvidiadrm.modeset
set module_blacklist=amdgpu,nvidia,nvidia_modeset
set machdep.hyperthreading_intr_allowed=0
set verbose_loading=YES
set boot_verbose=YES
set acpi_dsdt_load=YES
set audit_event_load=YES
set kern.consmsgbuf_size=1048576
set loader_menu_title=waffenschwester
boot
```

Step 3: log in to a standard tty shell
Step 4: edit /boot/loader.conf (and maybe .local)
Step 5: edit /etc/rc.conf (and maybe .local)
Step 6: debug the vast output from kern.consmsgbuf logs

Screenshot of an IPMI terminal, showing the FreeBSD kernel loader prompt, with variables printed, and the discussed variables being set. After this stage, the boot process resumes.

#freebsd #nvidia #cuda

**GripNews** @GripNews@mastodon.social · Apr 12

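GitHub - Rust-GPU/Rust-CUDA: An ecosystem for writing and executing fast GPU code in Rust
➤ Building Rust's standing in GPU computing
✤ https://github.com/Rust-GPU/Rust-CUDA
Rust-CUDA is a project that aims to make Rust the language of choice for high-performance GPU computing with the CUDA Toolkit. It provides a set of libraries and tools for compiling Rust to fast PTX code and integrating with existing CUDA libraries. The project comprises several crates, including `rustc_codegen_nvvm` (a Rust compiler backend), `cuda_std` (GPU-side functionality), `cudnn` (deep neural network acceleration), `cust` (CPU-side CUDA functionality), `gpu_rand` (GPU random number generation) and `optix` (ray tracing), with the goal of covering the entire CUDA ecosystem. Although it is still in early development, Rust-CUDA aims to overcome the past difficulties of integrating Rust with CUDA and to take full advantage of Rust's strengths, such as performance
#開發工具 #GPU #Rust #CUDA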

GitHub · GitHub - Rust-GPU/Rust-CUDA: Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

**Hacker News 50** @hn50@social.lansky.name · Apr 11

Rust CUDA Project

Link: https://github.com/Rust-GPU/Rust-CUDA
Discussion: https://news.ycombinator.com/item?id=43654881

#rust #cuda

**Hacker News** @h4ckernews@mastodon.social · Apr 11

Rust CUDA Project

https://github.com/Rust-GPU/Rust-CUDA

#HackerNews #Rust #CUDA

**Hacker News 50** @hn50@social.lansky.name · Apr 4

Nvidia adds native Python support to CUDA

Link: https://thenewstack.io/nvidia-finally-adds-native-python-support-to-cuda/
Discussion: https://news.ycombinator.com/item?id=43581584

The New Stack · Apr 2 · NVIDIA Finally Adds Native Python Support to CUDA: For years, NVIDIA’s CUDA software toolkit for GPUs didn't have native Python support. But that’s now changed.

#cuda #python #nvidia

**Amartya** @amartya@fosstodon.org · Apr 4

My brain is absolutely fried.
Today is the last day of coursework submissions for this semester. What a hectic month.
DNN with PyTorch, brain model parallelisation with MPI, SYCL and OpenMP offloading of percolation models, hand-optimizing serial codes for performance.
Two submissions due today. Submitted one and finalising my report for the second one.
Definitely having a pint after this

#sycl #hpc #msc

**Hacker News 50** @hn50@social.lansky.name · Apr 1

Ask HN: Why hasn't AMD made a viable CUDA alternative?

Discussion: https://news.ycombinator.com/item?id=43547309

news.ycombinator.com · Ask HN: Why hasn’t AMD made a viable CUDA alternative? | Hacker News

#cuda

**Natasha Nox** @Natanox@chaos.social · Mar 29

ffs, why does their Docker image only support Navi 31 and not Navi 32?
https://hub.docker.com/r/rocm/pytorch

I just wish both #Nvidia and #AMD would stop with that whole licensing bullshit around #CUDA and #ROCm and just include that damn stuff in the default driver.
I just want to run #Codestral on my local machine so I can use it with non-public code. It will be troublesome enough to cram it into 16 GB of VRAM.
#computer #Linux #AI

**Alexandre Mutel** @xoofx@mastodon.social · Mar 28 *

Ported https://salykova.github.io/sgemm-gpu to Vulkan (nice article!)

It's 2x slower than CUDA. That one was tricky to port (e.g. I needed to alias the shared buffer to allow LDS/STS.128): half of it with AI, the second half going over the lines one by one.

Ported the Kernel 5 from https://seb-v.github.io/optimization/update/2025/01/20/Fast-GPU-Matrix-multiplication.html to Vulkan

It's only 15% slower than CUDA (and it works on AMD).

In both cases, it's quite difficult to reason about the SPIR-V ISA, though the AMD GPU Analyzer is helping!

Digging deeper

salykova blog · Jan 12 · Beating cuBLAS in Single-Precision General Matrix Multiplication: This blog post focuses on a GPU implementation of SGEMM (Single-precision GEneral Matrix Multiply) operation defined as `C := alpha*A*B + beta*C`. We’ll review the algorithm’s design and discuss optimization techniques such as inlined PTX, asynchronous memory copies, double-buffering, avoiding shared memory bank conflicts, and efficient coalesced storage through shared memory.

#vulkan #cuda
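For reference, the SGEMM update in the preview card, `C := alpha*A*B + beta*C`, can be pinned down with a small plain-Python sketch. This is not code from the linked articles or from the Vulkan port; the function names are hypothetical. The naive triple loop is the kind of correctness reference one might check an optimized CUDA or Vulkan kernel against, and the tiled variant only loosely mirrors how GPU kernels stage tiles of A and B through shared memory while accumulating in registers.

```python
from typing import List

Matrix = List[List[float]]


def sgemm_reference(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> None:
    """Naive SGEMM, updating c in place: C := alpha*A*B + beta*C."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "A must be m x k"
    assert len(c) == m and all(len(row) == n for row in c), "C must be m x n"
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]


def sgemm_tiled(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix, tile: int = 2) -> None:
    """Same update, computed one output tile at a time (hypothetical sketch)."""
    m, k, n = len(a), len(b), len(b[0])
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            rows = range(i0, min(i0 + tile, m))
            cols = range(j0, min(j0 + tile, n))
            # Per-tile accumulators (registers, in a real kernel).
            acc = {(i, j): 0.0 for i in rows for j in cols}
            for p0 in range(0, k, tile):
                ks = range(p0, min(p0 + tile, k))
                # Stage the current A and B tiles (shared memory, in a real kernel).
                a_tile = {(i, p): a[i][p] for i in rows for p in ks}
                b_tile = {(p, j): b[p][j] for p in ks for j in cols}
                for i in rows:
                    for j in cols:
                        for p in ks:
                            acc[(i, j)] += a_tile[(i, p)] * b_tile[(p, j)]
            # Epilogue: apply alpha and beta once per output element.
            for i in rows:
                for j in cols:
                    c[i][j] = alpha * acc[(i, j)] + beta * c[i][j]


if __name__ == "__main__":
    c = [[1.0, 1.0], [1.0, 1.0]]
    sgemm_reference(1.0, [[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], 2.0, c)
    print(c)  # [[21.0, 24.0], [45.0, 52.0]]
```

The tile size of 2 is only for illustration. Real kernels such as the ones discussed above pick tile shapes to fit shared memory and register budgets, which is where most of the porting effort goes.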

**wasmVision** @wasmvision@mastodon.social · Mar 28

wasmVision 0.3.0 is out! We have some exciting new features for you, such as MCP server support and experimental GPU acceleration for vision models. Performance and stability improvements too. Go get it right now!

#wasm #computervision #opencv #golang #tinygo #rust #clang #mcp #cuda

https://github.com/wasmvision/wasmvision/releases/tag/v0.3.0

What's Changed

- fix: add missing blurc processor from list of known processors by @deadprogram in #49
- modules: update wazero to version 1.9 by @deadprogram in #50
- all: update to use Go 1.24.1 by @d...

GitHub · Release 0.3.0 · wasmvision/wasmvision

**CEOTECH.IT** @ceotech@mastodon.social · Mar 24

NVIDIA GeForce RTX 5060 and 5060 Ti: arrival imminent
#Aprile2025 #Componenti #CUDA #Gaming #GDDR7 #GeForce #GPU #Hardware #Leak #Notizie #Novità #NVIDIA #NVIDIARTX #PC #RTX5060 #RTX5060Ti #Rumors #SchedeGrafiche #SchedeVideo #TechNews #Tecnologia

https://www.ceotech.it/nvidia-geforce-rtx-5060-e-5060-ti-arrivo-imminente/

**HGPU group** @hgpu@mast.hpc.social · Mar 23

The Shamrock code: I- Smoothed Particle Hydrodynamics on GPUs

#SYCL #ROCm #CUDA #PTX #OpenMP #MPI #Astrophysics #Physics #Package

https://hgpu.org/?p=29827

hgpu.org · Mar 23 · The Shamrock code: I- Smoothed Particle Hydrodynamics on GPUs: We present Shamrock, a performance portable framework developed in C++17 with the SYCL programming standard, tailored for numerical astrophysics on Exascale architectures. The core of Shamrock is a…

**Dave Tabb** @dtabb73@mastodon.africa · Mar 21

I had hoped that DIA-NN 2.02 could be accelerated by either ATI or nVidia GPUs, but I have tried with both types of cards to no avail.
#bioinformatic #proteomics #OpenCL #CUDA

**.:\dGh/:.** @darkghosthunter@mastodon.social · Mar 18

Is there any difference between computing AI workloads in Vulkan, OpenCL and CUDA?

I know that some people say NVIDIA doesn't support OpenCL or Vulkan very well, and that performance comes from using CUDA. But what is the story for other vendors (Intel, AMD, Qualcomm, Apple)?

#AI #Programming #AIProgramming

**NeussWave** @NeussWave@neuss.social · Mar 17

I'm wondering whether I really want to dig into #CUDA. Well, actually... but...

**Paul Houle** @UP8@mastodon.social · Mar 15

Just got my RSS reader YOShInOn building with uv and running under WSL2 with the CUDA libraries, despite a slight version mismatch... All I gotta do is switch it from ArangoDB (terrible license) to Postgres, and it might have a future... With sentence_transformers running under WSL2, I might even be able to deduplicate the million images in my Fraxinus image sorter.

Screenshot with two windows: on the left, a summary of the output of a recommendation engine; on the right, a list of the Python packages used by that same engine.

#python #programming #ai

**Hacker News 50** @hn50@social.lansky.name · Mar 12

Sorting algorithms with CUDA

Link: https://ashwanirathee.com/blog/2025/sort2/
Discussion: https://news.ycombinator.com/item?id=43338405

ashwanirathee.com · Sorting Algorithms with CUDA! | Ashwani Rathee

#cuda
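The post only links the article, so as an illustration rather than a summary of it: bitonic sort is one commonly used GPU-friendly sorting network, because every pass consists of independent compare-exchange operations. The sketch below simulates it sequentially in plain Python (the function name is hypothetical); on a GPU, each iteration of the inner `for i` loop would typically be one thread, with a barrier or a new kernel launch between passes. Whether the linked article covers bitonic sort is an assumption, not something the post states.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` (length must be a power of two),
    using a bitonic sorting network simulated on the CPU."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("bitonic_sort needs a power-of-two length")
    k = 2
    while k <= n:  # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:  # compare distance within this merge stage
            # One pass: each compare-exchange below is independent of the
            # others, so a GPU could run them all in parallel.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Sub-sequences alternate between ascending and descending.
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


if __name__ == "__main__":
    print(bitonic_sort([7, 3, 6, 2, 8, 1, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The power-of-two restriction is the usual price of a fixed sorting network; inputs of other lengths are commonly padded with sentinel values.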

Replied in thread

**Giuseppe Bilotta** @giuseppebilotta@fediscience.org · Mar 10

Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <https://doi.org/10.1002/cpe.8313>

This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <https://doi.org/10.1016/j.jcp.2022.111413>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and it's tightly integrated with the rest of it (including its support for multi-GPU), and refactoring to turn it into a library like cuBLAS is

a. too much effort
b. probably not worth it.

Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.
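To make "rolling your own" concrete: BiCGSTAB is a Krylov solver for nonsymmetric systems `A x = b`, built entirely from BLAS-1/BLAS-2 primitives (dot products, axpy, matrix-vector products). The sketch below is the textbook, unpreconditioned algorithm (van der Vorst's formulation) in dependency-free Python. It is *not* GPUSPH's improved variant from the cited paper, and the helper names are hypothetical stand-ins for hand-written BLAS routines.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (an unscaled BLAS-2 gemv)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha*x + y (BLAS-1 axpy, out of place)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def bicgstab(a: Matrix, b: Vector, x0: Optional[Vector] = None,
             tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Unpreconditioned BiCGSTAB; stops when ||r|| / ||b|| < tol."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    r = axpy(-1.0, matvec(a, x), b)       # r = b - A x
    r_hat = list(r)                       # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = dot(b, b) ** 0.5 or 1.0      # avoid dividing by zero when b == 0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 / b_norm < tol:
            break
        rho = dot(r_hat, r)
        if rho == 0.0 or omega == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown")
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(a, p)
        alpha = rho / dot(r_hat, v)
        x = axpy(alpha, p, x)             # first half-step
        s = axpy(-alpha, v, r)            # s = r - alpha v
        if dot(s, s) ** 0.5 / b_norm < tol:
            break
        t = matvec(a, s)
        omega = dot(t, s) / dot(t, t)     # minimise the residual along t
        x = axpy(omega, s, x)             # second half-step
        r = axpy(-omega, t, s)            # r = s - omega t
        rho_old = rho
    return x
```

For example, `bicgstab([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` should approximate the exact solution `[1/11, 7/11]`. On a GPU, each of `matvec`, `dot` and `axpy` would become a kernel (or a fused group of kernels), which is where a hand-rolled implementation can be tailored to the host code's data layout.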