Small language models like AFM-4.5B and Arm-based CPUs are a great match.
My latest tutorial was just published on the Arm website. It walks you through setting up an AWS Graviton4 instance, downloading and optimizing the model, running inference with Llama.cpp, and evaluating performance and perplexity. You'll be surprised by the numbers!
➡️ Tutorial: "Deploy AFM-4.5B on Arm-based AWS Graviton4 with Llama.cpp"
https://learn.arm.com/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/
PS: A Google Cloud Axion version of this tutorial is coming soon.