RealPage, Inc.

Senior Data Scientist

Posted Date: 1 week ago (5/21/2026 6:27 AM)
Req #: 2026-13572
# of Openings: 1
Job Locations: IN-TG-Hitech City
Category: Engineering

Overview

We are looking for an end-to-end Data Scientist to design, build, and maintain ML-powered systems that solve core data quality and classification problems across the business. You will own the full lifecycle — from exploratory analysis and feature engineering through model training, deployment, and ongoing performance monitoring. The work spans entity resolution (identifying duplicate records across large datasets) and multi-class classification models that drive decision-making across a variety of business domains.

Responsibilities

What You'll Do 

  • Own the end-to-end model lifecycle: problem framing, data exploration, feature engineering, model training, evaluation, deployment, and monitoring 
  • Build and maintain entity resolution systems that detect duplicate records using supervised ML and string similarity techniques 
  • Develop classification models that categorize unstructured or semi-structured data into meaningful business categories 
  • Engineer features from messy, real-world text data — names, addresses, free-text fields — using string matching algorithms, phonetic encoding, n-grams, and other NLP techniques 
  • Design candidate retrieval and indexing strategies to make models performant at scale 
  • Tune thresholds, scoring logic, and rule-based overrides to balance precision and recall for production use cases 
  • Maintain production model artifacts and data pipelines, ensuring models stay current as underlying data evolves 
  • Collaborate with engineering and product teams to understand requirements and translate business problems into well-scoped modeling tasks 

Qualifications

  • 10+ years of experience building and deploying ML models end-to-end (not just notebooks) 
  • Strong Python skills — pandas, NumPy, scikit-learn, XGBoost or similar gradient boosting frameworks 
  • Hands-on experience with record linkage, entity resolution, or deduplication problems 
  • Experience building classification models (binary and multi-class) on structured and semi-structured data 
  • Deep familiarity with string similarity algorithms: edit distance, sequence matching, phonetic encoding, shingling 
  • Strong feature engineering instincts — ability to extract signal from noisy, inconsistently formatted data 
  • Comfort working with large serialized data structures and understanding memory/performance tradeoffs in production contexts 
  • Experience with SQL and relational databases (PostgreSQL or similar) 
  • Clear communication skills — ability to explain model behavior and tradeoffs to non-technical stakeholders 

Nice to Have 

  • Experience with blocking and indexing strategies for scalable record linkage 
  • Background in NLP, text normalization, or information extraction 
  • Familiarity with model serving in API contexts (Flask, FastAPI, or similar) 
  • Experience in data quality, master data management, or marketplace domains 
  • Exposure to deep learning frameworks (PyTorch, TensorFlow) for text classification 