
A Personal Note: And No, It’s Not The End!


It’s the week before Christmas. You’re probably feeling the pressure to live up to all the expectations that Christmas brings each year. I hope you can let a few of them slide and choose to have a good time for a few hours in between.

From the outset, I have treated my newsletter The Liquid Engineer as an experimental playground. I started out with lots of motivation for 3D printing, and I am still convinced it’s a hugely powerful technology that will be adopted much more widely in the coming years. Then, even more interesting things happened: I switched my focus to agentic coding. In recent weeks, I’ve also incorporated a lot of content on local LLMs and the necessary software and hardware for this. This is because I’m trying to build a business in this area with OnTree.

Thank you for following me on this journey so far!

I realized that I write my best posts when I write about topics that interest me. I will continue to write this newsletter, but I won’t be bound to any particular topics in the future. It might be about something that has just happened or something that I am currently working on. If you joined me for the AI content, I invite you to stick around over the next few weeks to see if you can still find value in the content. Since AI is keeping me busy, I expect there will still be plenty of AI-related content. If there’s a topic you’d like to read about, just hit reply and let me know!

For the past few months, I have also been publishing the newsletter posts right here, on my personal blog. If you prefer this method, feel free to dust off your old RSS reader and subscribe to my RSS feed!

Have a great Christmas with your loved ones!

The Future Of Computing Will Be Hyper-personalized: Why I Signed The Resonant Computing Manifesto


Last week, an exciting website appeared in my feeds: The Resonant Computing Manifesto.

The manifesto revolves around the idea that we are at a crossroads with AI. We can either double down on the direction our digital lives have already taken, a race to the bottom, or we can do something different. The goal of that race is to build the most appealing digital junk food to maximize ad consumption and profits. It turns the internet into a few big platforms that are noisy, attention-seeking, and don’t serve customer needs. The big tech companies behind TikTok, Facebook, and Instagram are pushing in this direction with full force, trying to use AI for hyper-personalization on their platforms.

The manifesto borrows the term “resonance” from the field of architecture (the architecture of real buildings, not software). It describes how we feel more at home and more human in certain environments: a quality without a name that is not strictly measurable but intuitively graspable.

The manifesto suggests that AI can advance the current state of the internet and open up new possibilities. The technology enables hyper-personalized experiences off the major platforms, because the technical constraints that once forced one-size-fits-all solutions are disappearing.

The manifesto centers on five principles that resonate with my vision for OnTree:

  1. A private experience on the Internet,
  2. that is dedicated exclusively to each customer,
  3. with no single entity controlling it (Plural),
  4. adaptable to the specific needs of the customer,
  5. and prosocial, making our offline lives better, too.

I love the whole piece. Of course, it’s idealistic and probably sounds naive at first. However, I believe this world could use much more idealistic and naive believers in a new internet. Without these dreamers, nothing will change.

The only shortcoming I see is that the website doesn’t address the consequences of people being “primary stewards of their own context.” To me, this is impossible without a mindset shift away from passively being monetized and toward actively funding the software we want to succeed. Without making it clear that we must put our money where our mouth is, I feel this manifesto is incomplete.

Kagi.com is the perfect example here. Google’s primary interest in search is always monetization. Therefore, it is logically impossible for them to want you to find what you’re searching for on page one, spot one. Kagi.com has a far superior, ad-free search engine, and their main slogan is “Humanize the Web.” With attractive family and duo plans, I find Kagi to be excellent value for the money: we pay less than four euros per family member per month.

To get a resonant internet, we have to pay the right companies the right amount of money.

(Source of the banner this time is resonantcomputing.org)

What Folding Laundry Taught Me About Working With AI


Yesterday evening I was folding laundry. It was one of those pesky loads: the basket was filled with socks. We’re a four-person household; in theory, that should make it easier to distinguish all the socks.

I made some space on the table to accommodate the individual socks. Laying them out flat helps find pairs. After folding one-third of the basket, I realized that the space I had assigned was way too small and was already overflowing. Since the rest of the table was full, there was no room left to allocate for more socks. A seemingly simple, mundane task had suddenly induced stress in me. Where would I put all these socks now?

Granted, the solution was quite easy in this case. I created some space by stowing some folded laundry, and I had enough room for the socks. What’s the connection to working with AI, you ask?

When AI became publicly available with the launch of ChatGPT, many people immediately recognized this technology’s potential. Recognizing that it’s a new technology with many unknowns, they created companies and planned generously, allowing the companies ample time to find product-market fit and generate revenue.

Stress occurs when plans and reality diverge. It’s the same mechanism, whether the question is space for your socks or runway for your company. Right now, we see many companies entering a stressful phase, especially the big ones. OpenAI, for example, issued a Code Red in an internal memo. Apple abruptly fired their AI chief, John Giannandrea.

Delivering value with AI is a lot harder than everyone thought; we underestimated the complexity. This has led investors to attempt crazy things. This TechCrunch article provides an absurd example: pumping $90 million into a business with annual recurring revenue of around $400,000, valuing it at $415 million. This strategy is called king making: declaring a winner in a market and hoping to convince customers to choose the “market leader.” It’s another symptom of the stress we’re seeing in the system right now.

This great article by Paul Ford brings it all together. He wishes for the bubble to burst, because then the frenzy for return on investment ends and we can focus on letting nerds do their best work.

Happy hacking!

Why You Should Buy an AMD Machine for Local LLM Inference in 2025


We’ve covered why NVIDIA consumer cards hit a 32GB wall and why Apple’s RAM pricing is prohibitive. Now let’s talk about the actual solution: AMD Ryzen AI Max+ 395 with 128GB unified memory.

This is the hardware I chose for my home LLM inference server. Here’s why.

It’s Open, Baby!

In contrast to its two big competitors, NVIDIA and Apple, AMD keeps a huge amount of its stack open source. What CUDA is for NVIDIA and MLX is for Apple, ROCm is for AMD. It’s fully open source, available on GitHub, and sees a huge amount of activity. This not only gives me a warm and fuzzy feeling, but also a lot of confidence that this stack will continue to go in the right direction.

The Hardware That Changes the Game

AMD Ryzen AI Max+ 395 offers something unique in the prosumer market:

  • 128GB of fast unified memory (96GB available to GPU)
  • Integrated GPU with discrete-class performance
  • Complete system cost: 2000-2500 Euro
  • Less than half the cost of the equivalent Mac Studio!

To make this more concrete: you can run a 70B model quantized to 4-bit (~38GB) and still have 50GB+ for context. That’s enough for 250K+ token contexts, legitimately long-document processing, extensive conversation history, and complex RAG workflows.
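
To sanity-check those numbers, here is a minimal back-of-the-envelope sketch in Python. The 96GB of GPU-visible memory and the ~38GB weight footprint come from the text above; the layer count, KV-head count, head dimension, and 8-bit KV cache are assumptions based on a Llama-3.1-70B-style architecture, so treat the result as a rough estimate rather than a guarantee.

```python
# Rough KV-cache budget for a 70B-class model on the Ryzen AI Max+ 395.
# All architecture numbers below are assumptions (Llama-3.1-70B-style geometry);
# check your model's config.json before relying on the result.
GIB = 1024**3

gpu_visible_gib  = 96    # GPU-accessible share of the 128GB unified memory
weights_gib      = 38    # 70B weights at ~4-bit quantization (from the text)
runtime_overhead = 6     # rough allowance for activations, buffers, OS share

n_layers   = 80          # assumption: Llama-3.1-70B geometry
n_kv_heads = 8           # grouped-query attention
head_dim   = 128
kv_bytes   = 1           # assumption: 8-bit quantized KV cache

bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes  # K and V
free_gib   = gpu_visible_gib - weights_gib - runtime_overhead
max_tokens = free_gib * GIB / bytes_per_token

print(f"KV cache: {bytes_per_token / 1024:.0f} KiB per token")
print(f"Roughly {max_tokens / 1000:.0f}K tokens of context fit into {free_gib} GiB")
```

With these assumptions the math lands comfortably above the 250K-token figure; a full-precision KV cache roughly halves it, which is why quantizing the cache matters on this class of hardware.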

Looking a bit into the future, it’s not hard to imagine AMD shipping this system with 256 gigabytes of RAM for a reasonable price. It’s very hard to imagine Apple shipping a 256-gigabyte machine at a reasonable price. It’s just how they make their money.

Comparison to the DGX Spark

The recently released DGX Spark is a valid competitor to AMD’s AI Max series. It also features 128GB of fast unified memory. From a pure hardware value perspective, the NVIDIA DGX Spark is the most compelling alternative on the market in October 2025. The street price is around 4500 Euro right now, almost double. You get a beautiful box with very comparable hardware and better driver support. You even get a good starting point for your first experiments, like downloading LLMs and training your own model. But everything you build on is closed source. You’re 100% dependent on NVIDIA staying on top of the game, on a machine that doesn’t make a lot of money for NVIDIA. I’m not that optimistic.

With the recent explosion in the pace of software development, helped along by coding agents, I’m not confident any company can stay on top of all of that. Especially not a company whose biggest profits come from elsewhere.

Also, the NVIDIA DGX Spark is Arm-based, which isn’t a problem for inference and training, but it is a problem for another use case that is becoming important.

Running Apps and LLMs Side by Side

If you are doing LLM inference on a local machine, the easiest setup is to also run the apps that need the inference on the same machine. Running two machines is possible but opens a huge can of worms. Even though it might not seem that way intuitively, such distributed systems are complex. Not twice as complex, more like exponentially complex. Here’s a golden question from 10 years ago on Stack Overflow, trying to explain it.

So running everything on one machine is much simpler. With AMD you’re staying on the most common CPU architecture available, x86-64. With the DGX Spark, you’re in Arm land. That architecture is gaining traction, but it is still a long way from being universally supported. If you’re planning to experiment with lots of small, open-source, Dockerized apps like I do, this is a big plus for the AMD route.

The Driver Reality

This is the real trade-off: AMD’s software support lags behind NVIDIA and Apple by 1-3 months for bleeding-edge models.

As we discussed in our Qwen3-Next case study:

  • vLLM doesn’t officially support gfx1151 (the Ryzen AI 395’s GPU architecture) yet
  • For architecturally novel models, you’re waiting on llama.cpp implementations
  • ROCm 7.0 works well for established models, but cutting-edge architectures take longer

Important context: This is about bleeding-edge model support, not general capability. I run Qwen3 32B, Llama 3.1 70B, DeepSeek, and multimodal models without issues. The hardware is capable; the ecosystem just needs time to catch up. Whether and when AMD really catches up is unknown. I just want to make clear that it’s a bet.
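
If you want to check what your own box currently sees, a quick sanity check with a ROCm build of PyTorch looks something like the sketch below. It is a generic check, not something specific to gfx1151, and it assumes you installed the ROCm wheel of PyTorch (the CUDA wheel will simply report no device).

```python
# Minimal sanity check: does this ROCm build of PyTorch see the integrated GPU?
# ROCm builds reuse the torch.cuda API, so the calls below work on AMD as well.
import torch

print("HIP/ROCm version:", getattr(torch.version, "hip", None))  # None on CUDA/CPU builds
print("GPU visible:", torch.cuda.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", torch.cuda.get_device_name(0))
    print("Memory visible to GPU:", round(props.total_memory / 1024**3), "GiB")
```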

Why Not Regular AMD GPUs?

Before we conclude, let’s address another obvious question: what about regular AMD GPUs?

AMD Radeon AI PRO R9700 (32GB) or similar:

  • Consumer price point (1400 Euro)
  • 32GB VRAM
  • Same problem as NVIDIA consumer cards, but cheaper

These cards face the same memory ceiling as NVIDIA consumer cards. Yes, driver support has improved significantly with ROCm 6.x and 7.0. But you’re still dealing with the fundamental limitation. They’re cheaper, so you can stack them together, like Level1Techs does.

Two reasons speak against this: First, you’re building a highly custom machine, with all sorts of compatibility issues. Second, with 300W each, this is a huge power draw.

Conclusion

The Ryzen AI Max+ 395 is special because it’s the only prosumer-priced hardware offering 128GB of unified memory accessible to the GPU, coming in a standardized package with decent energy efficiency.

Previously: Why you shouldn’t buy an NVIDIA GPU and Why you shouldn’t buy into the Apple ecosystem.

This concludes our three-part hardware series. The message is simple: 128GB unified memory at a reasonable price changes everything for local LLM inference, and right now, AMD is the only one delivering that.

Why you shouldn't buy into the Apple ecosystem for local LLM inference


Apple Silicon Macs are engineering marvels. The unified memory architecture works beautifully for AI workloads, MLX provides excellent framework support, and the hardware delivers impressive performance. As we saw in our deep dive on running Qwen3-Next-80B, Macs can run large models with excellent inference speed.

But here’s the hard truth: Apple’s RAM pricing model makes their hardware prohibitively expensive for local LLM inference.

This is the second in a three-part hardware series. Part one covered why NVIDIA consumer cards fall short. Now let’s talk about why Apple’s pricing ruins what would otherwise be a very good solution.

What Apple Gets Right

Let’s start with what makes Apple Silicon genuinely impressive for LLM inference:

Unified Memory Architecture

This is where Apple’s engineering truly shines. Unlike traditional systems where CPU and GPU have separate memory pools requiring constant data copying, Apple Silicon uses one unified memory pool accessible to everything.

Here’s why this matters for LLM inference:

No data copying overhead: When you load a model, it sits in memory once. The GPU doesn’t need to copy data from CPU RAM. The Neural Engine can access it directly. There’s no PCIe bottleneck, no memory duplication, just direct access.

Memory bandwidth: This is where Apple Silicon separates itself from the competition. The memory controllers are integrated directly into the SoC, enabling extremely high bandwidth:

M4 Pro (Mac Mini M4 Pro):

  • Up to 273 GB/s memory bandwidth
  • 64GB max configuration
  • 2 TB SSD to hold some LLM models
  • No option for a second SSD
  • Price: 3200 Euro
  • Excellent value for most LLM use cases

M4 Max:

  • Up to 546 GB/s memory bandwidth
  • 128GB max configuration
  • 2 TB SSD
  • Price: 4800 Euro
  • Highest bandwidth in the consumer space

Why bandwidth matters: LLM inference is memory-bound. You’re constantly reading model weights and KV cache from memory. With 546 GB/s on the M4 Max, you can feed the GPU fast enough to maintain high token generation speeds even with massive models. This is 2-3x the bandwidth of typical DDR5 systems and far exceeds what you get with discrete GPUs (which are limited by PCIe bandwidth for system RAM access).

For comparison, a typical high-end DDR5 system might deliver 100-150 GB/s. AMD’s Ryzen AI Max+ 395 delivers M4 Pro-level bandwidth of around 250 GB/s. Apple’s M4 Max at 546 GB/s is in a league of its own.

This bandwidth advantage is why Macs can achieve 50+ tokens/sec on 70B-80B models despite having less raw compute than some alternatives. With that much bandwidth, you’re no longer waiting on memory access; the bottleneck shifts toward compute rather than memory bandwidth.
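
To make the memory-bound argument a bit more concrete, here is a rough roofline sketch. The bandwidth figures come from the numbers above; the bytes-per-weight value and the roughly 3B active parameters for an MoE model like Qwen3-Next-80B are assumptions, so read the output as a theoretical ceiling, not a benchmark. It also shows why model architecture matters as much as raw bandwidth: a dense 70B is bounded far lower than an MoE model that only reads a fraction of its weights per token.

```python
# Memory-bandwidth roofline for token generation: each new token requires at
# least one full read of the active weights, so decode speed is bounded by
# bandwidth / active_weight_bytes. All model numbers below are assumptions.
GB = 1e9

def tokens_per_sec_ceiling(bandwidth_gb_s: float, active_params_billion: float,
                           bytes_per_weight: float) -> float:
    """Upper bound on decode speed when inference is purely memory-bound."""
    active_weight_bytes = active_params_billion * 1e9 * bytes_per_weight
    return bandwidth_gb_s * GB / active_weight_bytes

# Dense 70B model at ~4-bit (assumed ~0.55 bytes/weight including scales):
print(f"M4 Max, dense 70B:        {tokens_per_sec_ceiling(546, 70, 0.55):.0f} tok/s ceiling")
print(f"Ryzen AI Max+, dense 70B: {tokens_per_sec_ceiling(250, 70, 0.55):.0f} tok/s ceiling")

# An MoE model like Qwen3-Next-80B only reads its active experts (~3B params,
# assumed), which is why it can decode far faster than a dense 70B:
print(f"M4 Max, 80B MoE:          {tokens_per_sec_ceiling(546, 3, 0.55):.0f} tok/s ceiling")
```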

For LLM inference, this unified memory architecture with massive bandwidth is exactly what you want. It’s genuinely impressive engineering.

Excellent Driver Support

The MLX framework provides day-one support for novel model architectures. When Qwen3-Next-80B dropped with its new architecture, MLX had it running immediately. No waiting for driver updates, no compatibility hacks.

The Problems: Pricing and macOS

Here’s where it all falls apart: Apple’s RAM pricing is absurd. It has been like this for a long time. As Apple is a highly profitable company, I see zero chance of this changing in the near future. From Apple’s perspective, it is an understandable strategy to protect their high margins.

Even if you consider paying the premium to run on Apple hardware, I don’t recommend running Macs as servers. Apple’s macOS is an operating system built for personal computers. While headless usage has improved, it’s still an afterthought for Apple. Running a secure node is just not that easy and forces you to work against the OS. Running Linux is technically possible, but it’s also an edge case for those distributions. The last thing you want in a quickly developing hardware ecosystem is to be an edge case, as this usually leads to driver and other obscure problems.

The Takeaway

Apple Silicon is legitimately impressive hardware. The engineering is excellent, the performance is strong, and the software ecosystem is mature.

But Apple’s RAM pricing and macOS ruin what would otherwise be the perfect solution for local LLM inference.

For price-conscious builders who want maximum memory for local LLM inference, Apple simply doesn’t make financial sense.

Previously: Why you shouldn’t buy an NVIDIA GPU - the 32GB limitation problem.

Next: Why you should buy an AMD machine - the actual solution that gives you 128GB of fast RAM for half the price.