3 cool Transformers demos

Jan 16, 2023

The battle of the image classifiers

The Hugging Face hub hosts 1,700+ image classification models. I’ve picked the top ones from Google, Meta, Microsoft, NVIDIA and Humphrey Shi. All of them have been fine-tuned on the ImageNet-1k dataset, which makes it easy to compare their performance on your images. Enjoy!

Space: https://huggingface.co/spaces/juliensimon/battle_of_image_classifiers
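
If you'd rather run a comparison like this locally, here's a minimal sketch using the Transformers `pipeline` API. The three checkpoints below are assumptions: they are public ImageNet-1k models, but not necessarily the ones used in the Space. `classify_with_all` and `majority_label` are hypothetical helper names, and label strings may differ slightly from one checkpoint to another.

```python
from collections import Counter

# Assumed checkpoints (not confirmed to be the ones in the Space).
MODEL_IDS = [
    "google/vit-base-patch16-224",
    "facebook/convnext-tiny-224",
    "microsoft/swin-tiny-patch4-window7-224",
]


def classify_with_all(image, model_ids=MODEL_IDS, top_k=3):
    """Run every checkpoint on the same image and collect its top-k predictions."""
    # Imported lazily so the module loads even without transformers installed.
    from transformers import pipeline

    results = {}
    for model_id in model_ids:
        classifier = pipeline("image-classification", model=model_id)
        # Each prediction is a dict with a "label" and a "score".
        results[model_id] = classifier(image, top_k=top_k)
    return results


def majority_label(results):
    """Return (label, votes) for the label most often ranked first."""
    top_labels = [predictions[0]["label"] for predictions in results.values()]
    return Counter(top_labels).most_common(1)[0]
```

Calling `classify_with_all("my_image.jpg")` (a hypothetical local file) downloads each model on first use, and `majority_label` then tells you which label most models agree on.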

Audio classification with the Audio Spectrogram Transformer

Multi-modal transformers are rising fast. A great example is the Audio Spectrogram Transformer, an audio classification model that was just added to the Hugging Face Transformers library. This model first creates a spectrogram image of an audio clip and then classifies the image with a Vision Transformer model. Amazing results!

✅ Spaces demo: https://huggingface.co/spaces/juliensimon/keyword-spotting
✅ Model: https://huggingface.co/MIT/ast-finetuned-speech-commands-v2
✅ Paper: https://arxiv.org/abs/2104.01778
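
To make the "audio becomes an image" idea concrete, here's a rough sketch. `log_spectrogram` is a simplified, hypothetical helper: it computes a plain log-power spectrogram with NumPy, whereas the model's own feature extractor uses mel filterbank features, so treat it as an illustration only. The 16 kHz sample rate and the 25 ms / 10 ms framing are assumptions. `classify_clip` (also a hypothetical name) runs the actual model linked above through the `pipeline` API.

```python
import numpy as np


def log_spectrogram(waveform, sample_rate=16000, win_ms=25, hop_ms=10):
    """Turn a 1-D waveform into a 2-D (frames x frequency bins) log-power image."""
    win = int(sample_rate * win_ms / 1000)  # samples per analysis window
    hop = int(sample_rate * hop_ms / 1000)  # samples between window starts
    window = np.hanning(win)
    n_frames = 1 + (len(waveform) - win) // hop
    frames = np.stack(
        [waveform[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Small constant avoids log(0) on silent frames.
    return np.log(power + 1e-10)


def classify_clip(audio_path, top_k=3):
    """Classify an audio file with the AST checkpoint linked above."""
    # Imported lazily: transformers (plus ffmpeg for file decoding) is only
    # needed when actually running the model.
    from transformers import pipeline

    classifier = pipeline(
        "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
    )
    return classifier(audio_path, top_k=top_k)
```

The 2-D array returned by `log_spectrogram` is the kind of "image" a Vision Transformer can split into patches. In practice you only need `classify_clip`, since the pipeline handles feature extraction for you.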

Question answering on HTML documents

What if you could ask questions about HTML documents, without having to convert them to plain text first? Well, that’s exactly the purpose of Microsoft’s MarkupLM model: just grab a page and ask a question.

I’ve built a Hugging Face Space to let you experiment with any live URL. I also implemented multithreading to speed things up on CPU. Give it a go and let me know what you think :)

✅ Space: https://huggingface.co/spaces/juliensimon/webpage_questions
✅ Model: https://huggingface.co/microsoft/markuplm-base-finetuned-websrc
✅ Paper: https://arxiv.org/abs/2110.08518
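
Here's a minimal sketch of what question answering on raw HTML can look like with this model. It is not the code behind the Space. In particular, the post doesn't say how multithreading was implemented, so the thread-pool approach in `best_answer` is an assumption, as are the helper names `load_markuplm`, `answer_html` and `best_answer`. Each HTML chunk is assumed to be short enough to fit the model's input limit, and the processor needs the `beautifulsoup4` package to parse HTML.

```python
from concurrent.futures import ThreadPoolExecutor

MODEL_ID = "microsoft/markuplm-base-finetuned-websrc"


def load_markuplm():
    """Load the processor and the question answering model once."""
    # Imported lazily: only needed when actually running the model.
    from transformers import AutoProcessor, MarkupLMForQuestionAnswering

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = MarkupLMForQuestionAnswering.from_pretrained(MODEL_ID)
    return processor, model


def answer_html(processor, model, html, question):
    """Return (answer_text, score) for one HTML string."""
    import torch

    encoding = processor(html, questions=question, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**encoding)
    start = int(outputs.start_logits.argmax())
    end = int(outputs.end_logits.argmax())
    # Sum of the best start and end logits, used as a rough confidence score.
    score = float(outputs.start_logits[0, start] + outputs.end_logits[0, end])
    tokens = encoding.input_ids[0, start : end + 1]
    return processor.decode(tokens).strip(), score


def best_answer(chunks, question, answer_fn, max_workers=4):
    """Score every chunk in a thread pool and keep the highest-scoring answer."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda chunk: answer_fn(chunk, question), chunks))
    return max(results, key=lambda pair: pair[1])
```

A typical call would be `best_answer(chunks, question, lambda html, q: answer_html(processor, model, html, q))`, where `chunks` is a list of HTML fragments from the page. How much threads help on CPU depends on your PyTorch build and core count, so measure before relying on it.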
