3 cool Transformers demos
The battle of the image classifiers
The Hugging Face hub hosts 1,700+ image classification models. I’ve picked the top ones from Google, Meta, Microsoft, NVIDIA and Humphrey Shi. All of them have been fine-tuned on the ImageNet-1k dataset, so they predict the same set of labels, which makes it easy to compare their performance on your images. Enjoy!
✅ Space: https://huggingface.co/spaces/juliensimon/battle_of_image_classifiers
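To try the same kind of comparison locally, here is a minimal sketch that runs one image through several ImageNet-1k classifiers with the Transformers image-classification pipeline. The three model IDs are an assumption picked for illustration (one checkpoint each from Google, Meta and Microsoft), not necessarily the ones used in the Space, and the helper names are hypothetical.

```python
# Assumed shortlist for illustration; the Space may use different checkpoints.
MODEL_IDS = (
    "google/vit-base-patch16-224",
    "facebook/convnext-tiny-224",
    "microsoft/swin-tiny-patch4-window7-224",
)


def load_classifiers(model_ids=MODEL_IDS):
    """Build one image-classification pipeline per model ID."""
    # Imported lazily so the comparison helpers work without the heavy dependency.
    from transformers import pipeline

    return {
        model_id: pipeline("image-classification", model=model_id)
        for model_id in model_ids
    }


def top_label(predictions):
    """Return (label, score) for the highest-scoring prediction."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def compare_classifiers(image, classifiers):
    """Run every classifier on the same image and keep each top prediction.

    `image` can be anything the pipeline accepts, such as a local path,
    a URL, or a PIL image. `classifiers` maps a name to a callable that
    returns a list of {"label": ..., "score": ...} dictionaries.
    """
    return {name: top_label(clf(image)) for name, clf in classifiers.items()}
```

With transformers, torch and Pillow installed, `compare_classifiers("my_image.jpg", load_classifiers())` should return a dictionary mapping each model ID to its top label and score. The first call downloads the checkpoints, so expect it to take a while.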
Audio classification with the Audio Spectrogram Transformer
Multi-modal transformers are rising fast. A great example is the Audio Spectrogram Transformer, an audio classification model that was just added to the Hugging Face Transformers library. This model first creates a spectrogram image of an audio clip and then classifies the image with a Vision Transformer model. Amazing results!
✅ Space: https://huggingface.co/spaces/juliensimon/keyword-spotting
✅ Model: https://huggingface.co/MIT/ast-finetuned-speech-commands-v2
✅ Paper: https://arxiv.org/abs/2104.01778
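If you'd like to run the model outside the Space, here is a minimal sketch based on the Transformers audio-classification pipeline, which takes care of turning the audio into spectrogram features before calling the model. The helper names and the `min_score` threshold are hypothetical, added only to show how a keyword-spotting decision could be made from the scores.

```python
MODEL_ID = "MIT/ast-finetuned-speech-commands-v2"


def load_keyword_spotter(model_id=MODEL_ID):
    """Build an audio-classification pipeline for the AST checkpoint."""
    # Imported lazily so spot_keyword can be used without the heavy dependency.
    from transformers import pipeline

    return pipeline("audio-classification", model=model_id)


def spot_keyword(audio, classifier, min_score=0.5):
    """Return the top predicted label, or None if its score is below min_score.

    `audio` can be anything the pipeline accepts, such as a path to an
    audio file. `min_score` is an illustrative threshold, not a value
    taken from the model card or the paper.
    """
    predictions = classifier(audio)
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None
```

For example, `spot_keyword("clip.wav", load_keyword_spotter())` should return the spoken command the model finds most likely. Reading audio files from a path relies on ffmpeg being installed.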
Question answering on HTML documents
What if you could ask questions about HTML documents, without having to convert them to plain text first? Well, that’s exactly the purpose of Microsoft’s MarkupLM model: just grab a page and ask a question.
I’ve built a Hugging Face Space to let you experiment with any live URL. I also implemented multithreading to speed things up on CPU. Give it a go and let me know what you think :)
✅ Space: https://huggingface.co/spaces/juliensimon/webpage_questions
✅ Model: https://huggingface.co/microsoft/markuplm-base-finetuned-websrc
✅ Paper: https://arxiv.org/abs/2110.08518
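Here is a minimal sketch of question answering on raw HTML with this checkpoint, following the usage pattern from the Transformers documentation for MarkupLM. The helper names are hypothetical. The thread pool at the end is only one possible way to use multithreading (answering several questions concurrently); this is an assumption, since the post does not describe how the Space does it. The sketch also does not handle pages longer than the model's maximum input length.

```python
from concurrent.futures import ThreadPoolExecutor

MODEL_ID = "microsoft/markuplm-base-finetuned-websrc"


def load_markuplm(model_id=MODEL_ID):
    """Load the processor and QA model (needs transformers, torch, beautifulsoup4)."""
    # Imported lazily so answer_many can be used without the heavy dependencies.
    from transformers import AutoProcessor, MarkupLMForQuestionAnswering

    processor = AutoProcessor.from_pretrained(model_id)
    model = MarkupLMForQuestionAnswering.from_pretrained(model_id)
    return processor, model


def answer_question(html, question, processor, model):
    """Return the answer span the model extracts from a raw HTML string."""
    import torch

    # The processor parses the HTML into text nodes and XPaths for the model.
    encoding = processor(html, questions=question, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**encoding)

    # Pick the most likely start and end tokens, then decode that span.
    start = int(outputs.start_logits.argmax())
    end = int(outputs.end_logits.argmax())
    tokens = encoding.input_ids[0, start : end + 1]
    return processor.decode(tokens).strip()


def answer_many(questions, answer_fn, max_workers=4):
    """Answer several questions concurrently; results keep the input order.

    Assumed approach for illustration, not necessarily what the Space does.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(answer_fn, questions))
```

A typical call would load the model once with `processor, model = load_markuplm()`, fetch the page HTML as a string, and then pass `lambda q: answer_question(html, q, processor, model)` as `answer_fn` to `answer_many` together with a list of questions.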



