[Hiring] Senior Data Engineer @The Hackett Group


  • Design, build, and optimize ETL pipelines using AWS Glue 3.0+ and PySpark (see the first sketch after this list).
  • Implement scalable and secure data lakes using Amazon S3, following bronze/silver/gold zoning.
  • Write performant SQL using AWS Athena (Presto) with CTEs, window functions, and aggregations (see the Athena SQL sketch after this list).
  • Take full ownership from ingestion → transformation → validation → metadata → documentation → dashboard-ready output.
  • Build pipelines that are not just performant, but audit-ready and metadata-rich from the first version.
  • Attach classification tags and ownership metadata to all columns using AWS Glue Data Catalog tagging conventions.
  • Ensure no pipeline moves to QA or the BI team without completed validation logs and field-level metadata.
  • Develop job orchestration workflows using AWS Step Functions integrated with EventBridge or CloudWatch (see the orchestration sketch after this list).
  • Manage schemas and metadata using AWS Glue Data Catalog.
  • Enforce data quality using Great Expectations, with checks for null %, ranges, and referential rules (see the validation sketch after this list).
  • Maintain data lineage with OpenMetadata or Amundsen and apply metadata classifications (e.g., PII, KPIs).
  • Collaborate with data scientists on ML pipelines, handling JSON/Parquet I/O and feature engineering.
  • Prepare flattened, filterable datasets for BI tools such as Sigma, Power BI, or Tableau.
  • Interpret business metrics such as forecasted revenue, margin trends, occupancy/utilization, and volatility.
  • Work with consultants, QA, and business teams to finalize KPIs and logic.
  • This is not just a coding role. We expect the candidate to think like a data architect within their module: designing pipelines that scale, handle exceptions, and align with evolving KPIs.
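
To make the expectations above concrete, here are a few minimal, illustrative sketches. All database names, table names, S3 paths, columns, and ARNs below are hypothetical placeholders, not references to any actual Hackett Group environment.

First, a bare-bones AWS Glue (PySpark) job that reads a bronze-zone table registered in the Glue Data Catalog, applies a simple cleanup, and writes partitioned Parquet to the silver zone:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate: resolve arguments and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Bronze zone: raw ingested data registered in the Glue Data Catalog (placeholder names).
bronze = glue_context.create_dynamic_frame.from_catalog(
    database="bronze_db", table_name="orders_raw"
).toDF()

# Silver zone: de-duplicated and filtered for downstream consumers.
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .filter("order_id IS NOT NULL AND amount >= 0")
)

# Write partitioned Parquet to the silver zone of the data lake.
silver.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-datalake/silver/orders/"
)

job.commit()
```

A production job would add error handling, job bookmarks, and the validation and metadata steps described above.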
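Next, the kind of Athena (Presto) SQL the role calls for, combining a CTE, an aggregation, and a window function, executed here via awswrangler (the AWS SDK for pandas); the schema is again hypothetical:

```python
import awswrangler as wr

# CTE + aggregation + window function: 7-day rolling revenue per region.
SQL = """
WITH daily AS (
    SELECT order_date, region, SUM(amount) AS revenue
    FROM orders_silver
    GROUP BY order_date, region
)
SELECT
    order_date,
    region,
    revenue,
    AVG(revenue) OVER (
        PARTITION BY region
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS revenue_7d_avg
FROM daily
ORDER BY region, order_date
"""

# Runs the query in Athena and returns a pandas DataFrame (database name is a placeholder).
df = wr.athena.read_sql_query(sql=SQL, database="silver_db")
print(df.head())
```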
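For orchestration, a sketch of a Step Functions state machine that runs a Glue job and waits for it to finish; in practice it would be triggered on a schedule or event through EventBridge, and the job name and role ARN are placeholders:

```python
import json

import boto3

# Amazon States Language definition: a single Task state that starts a Glue job
# and waits for completion via the .sync integration pattern.
state_machine_definition = {
    "Comment": "Nightly ETL: run the bronze-to-silver Glue job",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "orders-bronze-to-silver"},
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions")
response = sfn.create_state_machine(
    name="nightly-orders-etl",
    definition=json.dumps(state_machine_definition),
    roleArn="arn:aws:iam::123456789012:role/example-stepfunctions-role",  # placeholder
)
print(response["stateMachineArn"])
```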
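Finally, a validation sketch using the classic (pre-1.0) pandas-based Great Expectations API: a null-percentage check, a range check, and a set-membership check standing in for a referential rule. Column names, thresholds, and the S3 path are assumptions, and newer GX releases use a different API:

```python
import great_expectations as ge
import pandas as pd

# Load a silver-zone extract for validation (placeholder path; requires pyarrow/s3fs).
df = pd.read_parquet("s3://example-datalake/silver/orders/")
batch = ge.from_pandas(df)

# Null-percentage check: at least 99% of order IDs must be non-null.
nulls = batch.expect_column_values_to_not_be_null("order_id", mostly=0.99)

# Range check: order amounts must fall inside a plausible interval.
ranges = batch.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)

# Set-membership check as a simple stand-in for a referential rule.
refs = batch.expect_column_values_to_be_in_set("region", ["NA", "EMEA", "APAC", "LATAM"])

# Each expectation returns a result with a boolean `success` flag that can feed validation logs.
print(nulls.success, ranges.success, refs.success)
```

Results like these would feed the validation logs required before a pipeline moves to QA or the BI team.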


