The Nostalgic Joy of Running Large Language Models on Modest Hardware
The tech community has been buzzing about DeepSeek's latest language model releases, and reading through the discussions brought back memories of my early computing days. Someone mentioned running a 671B-parameter model at 12 seconds per token by paging weights from an NVMe SSD, and while many scoffed at the impracticality, it struck a chord with me.
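The figure is less absurd than it sounds. Here's a rough back-of-envelope sketch (my own numbers, not from the discussion, and the throughput value is an assumption): DeepSeek's 671B model is a mixture-of-experts design that activates roughly 37B parameters per token, so a 4-bit quantized run only has to stream about 18.5 GB of weights from disk for each token.

```python
# Back-of-envelope estimate of seconds-per-token when paging model
# weights from an NVMe SSD. All numbers here are illustrative
# assumptions, not measurements from the original discussion.

ACTIVE_PARAMS = 37e9       # DeepSeek's MoE activates ~37B of 671B params per token
BYTES_PER_PARAM = 0.5      # 4-bit quantization -> half a byte per parameter
NVME_THROUGHPUT = 1.5e9    # bytes/s; a pessimistic effective rate, since
                           # expert weights are scattered rather than sequential

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM      # ~18.5 GB per token
seconds_per_token = bytes_per_token / NVME_THROUGHPUT  # ~12 s per token

print(f"~{bytes_per_token / 1e9:.1f} GB read per token")
print(f"~{seconds_per_token:.0f} s/token")
```

At roughly 1.5 GB/s of effective read bandwidth, that works out to about 12 seconds per token, which lines up neatly with the number people were scoffing at.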
Remember when waiting was just part of the computing experience? Back in the '80s, loading a simple game from cassette tape could take 10-15 minutes, and we'd sit there watching those hypnotic loading stripes, filled with anticipation. The thought of a machine that could answer complex questions in just a few hours would have seemed like science fiction back then.
The Rise of Artisanal AI: When Local Computing Became Cool Again
Remember when everyone was obsessed with mining cryptocurrency? Those makeshift rigs with multiple GPUs hanging precariously from metal frames, fans whirring away like mini jet engines? Well, history has a funny way of rhyming. The latest trend in tech circles isn't mining digital coins; it's running large language models locally.
The online discussions I’ve been following lately are filled with tech enthusiasts proudly showing off their homegrown AI setups. These aren’t your typical neat-and-tidy desktop computers; they’re magnificent contraptions of cooling systems, GPUs, and enough computing power to make any IT professional’s heart skip a beat. One particularly impressive build I spotted looked like a miniature apartment building, with GPUs occupying the “top floors” and an EPYC processor serving as the building’s superintendent.