NLP Engineer
You will design and build end-to-end text extraction pipelines for policy, regulatory, fintech, and healthcare documents.
Responsibilities
- Design and build end-to-end text extraction pipelines for complex document types.
- Extract key entities and convert policy clauses and obligations into structured representations.
- Fine-tune BERT and RoBERTa for NER, text classification, and relation extraction tasks.
- Integrate LLM APIs, applying prompt engineering, tool/function calling, and structured output extraction.
- Build scalable Python pipelines for high-volume processing of PDF, DOCX, and HTML files.
- Define and enforce JSON schemas to ensure outputs are compatible with knowledge graph ingestion.
- Evaluate model performance and implement feedback loops to improve extraction quality.
Required Skills
- 5+ years of hands-on NLP engineering experience on production pipelines.
- Python.
- NLP (spaCy, NLTK).
- HuggingFace Transformers.
- Deep learning.
- LLM API integration.
- Data pipeline development.
- JSON Schema.
- Pydantic.
- Graduate degree in any discipline.
Preferred Skills
- Experience with legal, regulatory, or policy documents.
- Familiarity with knowledge graphs, graph databases such as Neo4j, or RDF-based data models.
- Experience with document parsing tools such as pdfplumber, Docling, or Apache Tika.
- Domain knowledge in fintech or healthcare NLP.
- Exposure to information extraction benchmarks such as CoNLL, DocRED, or SciERC.