Models.

Open-source ML models for multilingual semantic search, fine-tuned on the code-mixed text that people actually write and search with.

NLP • Semantic Search

Marathlish MiniLM

A bilingual semantic search model fine-tuned on mixed Marathi-English (Marathlish) text. Built for search applications where users naturally mix Marathi and English — the way people actually speak and type in Maharashtra.

  • Fine-tuned on a curated Marathi-English code-mixed dataset
  • Based on the MiniLM-L6-v2 architecture for fast inference
  • Designed for semantic search, not just keyword matching
  • Works with the Sentence Transformers API out of the box
marathi, english, semantic-search, sentence-transformers, multilingual
NLP • Semantic Search

Hinglish MiniLM

Semantic search model for Hindi-English (Hinglish) code-mixed queries. Trained to understand the natural mixing of Hindi and English that 500M+ speakers use daily across India. Optimized for real-world search applications.

  • Hindi-English bilingual semantic embeddings
  • Handles Hindi written in Devanagari as well as Romanized (Latin-script) Hindi
  • Lightweight MiniLM architecture that runs on CPU
  • Cross-lingual retrieval support
hindi, english, hinglish, semantic-search, sentence-transformers

Quick Start

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("anuragwagh0/marathlish-minilm")

# Encode queries
queries = ["माझा phone कुठे आहे?", "best restaurants in Pune"]
embeddings = model.encode(queries)

# Use for semantic search, clustering, etc.
print(embeddings.shape)  # (2, 384)
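
Once you have embeddings, semantic search comes down to ranking documents by how close their vectors are to the query vector. The sketch below shows one way to do that with cosine similarity in plain Python. The helper names `cosine_similarity` and `rank_documents` are hypothetical, not part of the model or of Sentence Transformers. The toy 3-dimensional vectors are made-up stand-ins for the 384-dimensional output of `model.encode`, so the example runs without downloading anything.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors (assumed for illustration). With the real model you would use:
#   doc_vecs = model.encode(documents)
#   query_vec = model.encode(query)
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
toy_query_vec = [1.0, 0.1, 0.0]

results = rank_documents(toy_query_vec, toy_doc_vecs, top_k=2)
for index, score in results:
    print(index, round(score, 3))
```

For real workloads, Sentence Transformers provides `util.cos_sim` and `util.semantic_search`, which do the same ranking on tensors and are likely a better fit than a hand-rolled loop.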