I mean yeah, that’s why I said beater. I just wouldn’t expect someone young or looking for a first car to be buying a current model year with low miles anyway.
Is cash for a beater car no longer an option? I don’t want to be a “less avocado toast more bootstraps” person but a loan for a used car sounds wild to me. Maybe I’m out of touch. My vehicle is old enough to drink
So… containers that people log into…? Falls under containers
I just don’t get these for a bare metal system. Containers? Sounds great. Definitely on board. Bare metal? Debian, standard Fedora, or Gentoo is what makes sense to me
It’s a roundabout way of writing “it’s really shit for this use case, and people who actively try to use it that way quickly find that out”
Yup, it’s literally a bullshit machine.
You can overwrite the model by using the same name instead of creating one with a new name, if it bothers you. Either way, there is no duplication of the LLM model file
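For example (assuming the standard ollama CLI; the model name and Modelfile path are just placeholders), re-running create with the same name replaces the existing entry instead of adding a second one:

ollama create mymodel -f ./Modelfile
# edit ./Modelfile, then recreate under the same name to overwrite it
ollama create mymodel -f ./Modelfile

As far as I know the weight blobs are stored content-addressed, so variants built from the same base weights point at the same files on disk either way.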
What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Can you try setting num_ctx and num_predict using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
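Something like this, roughly (the FROM line is just an example base model):

FROM llama3.1
PARAMETER num_ctx 8192
PARAMETER num_predict 512

then ollama create mymodel -f Modelfile and run that instead of the stock tag.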
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vllm does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization
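A quick way to check what each side is actually running (the model names here are just examples):

# ollama: the model details include the quantization, usually Q4_K_M for default pulls
ollama show llama3.1:8b

# vllm: loads the checkpoint in its native dtype (fp16/bf16) unless you explicitly ask for a quantized one
vllm serve meta-llama/Llama-3.1-8B-Instruct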
I have no problems with changing num_ctx or num_predict
Models are computed sequentially (the output of each layer is the input to the next layer in the sequence), so for a single request more GPUs do not offer any kind of performance benefit
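Rough sketch of what I mean (plain Python, made-up numbers, nothing ollama-specific):

def layer(x, weight):
    # stand-in for one transformer layer
    return [v * weight for v in x]

hidden = [1.0, 2.0, 3.0]            # stand-in for the token embeddings
for weight in (0.5, 2.0, 1.5):      # imagine each "layer" on a different GPU
    hidden = layer(hidden, weight)  # each step waits for the previous layer's output
print(hidden)

Splitting layers across cards only changes where each step runs; a single forward pass still walks them one at a time, so the extra GPUs mostly help with fitting the model and batching more requests.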
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
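i.e. inside an interactive ollama run session, something like (the numbers are just examples):

/set parameter num_ctx 8192
/set parameter num_predict 512

That only applies to the current session; baking it into a Modelfile makes it stick.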
It’s not. I can run the 2.51-bit quant
Tell that to my home rig currently running the 671b model…
Hawley’s statement called DeepSeek “a data-harvesting, low-cost AI model that sparked international concern and sent American technology stocks plummeting.”
data-harvesting
???
It runs offline… using open-source software that provably does not collect or transmit any data…
It is low-cost and out-competes American technology, though, true
This is DeepSeek-V3. DeepSeek-R1 is the model that got all the media hype: https://huggingface.co/deepseek-ai/DeepSeek-R1
That’s great! Hopefully it shows up on F-Droid sometime soon
Trump certainly sounded like he wanted it to be that way with Ivanka
Older Subarus hold their value and age well. I would expect it to hold decent value, but it’s crazy for it to be worth more 5 years later