Models.
Open-source ML models for multilingual semantic search, fine-tuned on the code-mixed text people actually write.
NLP • Semantic Search
Marathlish MiniLM
A bilingual semantic search model fine-tuned on mixed Marathi-English (Marathlish) text. Built for search applications where users naturally mix Marathi and English, the way people actually speak and type in Maharashtra.
- Fine-tuned on curated Marathi-English code-mixed dataset
- Based on MiniLM-L6-v2 architecture for fast inference
- Designed for semantic search, not just keyword matching
- Works with Sentence Transformers API out of the box
marathi, english, semantic-search, sentence-transformers, multilingual
NLP • Semantic Search
Hinglish MiniLM
Semantic search model for Hindi-English (Hinglish) code-mixed queries. Trained to understand the natural mixing of Hindi and English that 500M+ speakers use daily across India. Optimized for real-world search applications.
- Hindi-English bilingual semantic embeddings
- Handles Hindi written in both Devanagari and Latin (Romanized) scripts
- Lightweight MiniLM architecture that runs on CPU
- Cross-lingual retrieval support
hindi, english, hinglish, semantic-search, sentence-transformers
Quick Start
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("anuragwagh0/marathlish-minilm")
# Encode queries
queries = ["माझा phone कुठे आहे?", "best restaurants in Pune"]
embeddings = model.encode(queries)
# Use for semantic search, clustering, etc.
print(embeddings.shape) # (2, 384)
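
The "semantic search" step mentioned in the snippet above usually comes down to ranking documents by cosine similarity to a query embedding. The following is a minimal sketch of that idea, not the model author's method. The helper names `cosine_similarity` and `rank_documents` are hypothetical, and toy 3-dimensional vectors stand in for the 384-dimensional embeddings so that the example runs without downloading the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors (assumption): placeholders for real 384-dimensional embeddings
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # unrelated to the query
    [0.9, 0.1, 0.0],  # close to the query
    [0.5, 0.5, 0.0],  # partially related
]

results = rank_documents(query_vec, doc_vecs, top_k=2)
for index, score in results:
    print(index, round(score, 4))
```

With the real model, the vectors would come from `model.encode(...)` on the query and on the document texts. Sentence Transformers also ships `util.cos_sim`, which computes the same similarity on whole batches and is likely the better choice for anything beyond a small demo.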