
Senior Data Scientist

Req #: 2026-13572
# of Openings: 1
Job Locations: IN-TG-Hitech City
Category: Engineering

Overview

We are looking for an end-to-end Data Scientist to design, build, and maintain ML-powered systems that solve core data quality and classification problems across the business. You will own the full lifecycle — from exploratory analysis and feature engineering through model training, deployment, and ongoing performance monitoring. The work spans entity resolution (identifying duplicate records across large datasets) and multi-class classification models that drive decision-making across a variety of business domains.

Responsibilities

What You'll Do

Own the end-to-end model lifecycle: problem framing, data exploration, feature engineering, model training, evaluation, deployment, and monitoring
Build and maintain entity resolution systems that detect duplicate records using supervised ML and string similarity techniques
Develop classification models that categorize unstructured or semi-structured data into meaningful business categories
Engineer features from messy, real-world text data — names, addresses, free-text fields — using string matching algorithms, phonetic encoding, n-grams, and other NLP techniques
Design candidate retrieval and indexing strategies to make models performant at scale
Tune thresholds, scoring logic, and rule-based overrides to balance precision and recall for production use cases
Maintain production model artifacts and data pipelines, ensuring models stay current as underlying data evolves
Collaborate with engineering and product teams to understand requirements and translate business problems into well-scoped modeling tasks

Qualifications

10+ years of experience building and deploying ML models end-to-end (not just notebooks)
Strong Python skills — pandas, NumPy, scikit-learn, XGBoost or similar gradient boosting frameworks
Hands-on experience with record linkage, entity resolution, or deduplication problems
Experience building classification models (binary and multi-class) on structured and semi-structured data
Deep familiarity with string similarity algorithms: edit distance, sequence matching, phonetic encoding, shingling
Strong feature engineering instincts — ability to extract signal from noisy, inconsistently formatted data
Comfort working with large serialized data structures and understanding memory/performance tradeoffs in production contexts
Experience with SQL and relational databases (PostgreSQL or similar)
Clear communication skills — ability to explain model behavior and tradeoffs to non-technical stakeholders

Nice to Have

Experience with blocking and indexing strategies for scalable record linkage
Background in NLP, text normalization, or information extraction
Familiarity with model serving in API contexts (Flask, FastAPI, or similar)
Experience in data quality, master data management, or marketplace domains
Exposure to deep learning frameworks (PyTorch, TensorFlow) for text classification


Stay connected and be the first to know about exciting future opportunities at RealPage

By joining our talent community, you'll be considered for positions that match your skills and interests. Sign up today and take the next step in your career journey.
