
Senior Data Platform Engineer

Responsibilities:

  • Design & Optimization: Build and fine-tune data clusters to support both batch and streaming workloads, ensuring optimal performance and reliability.
  • Platform Development: Build and expand our Spark, Hadoop, Kubernetes, Trino, Delta Lake, and Druid ecosystems to meet evolving business needs, adding new integrations, data ingestion paths, and data transformations as needed.
  • Innovation: Introduce and scale new data platform solutions, iterating on our OLAP platforms and exploring next-generation data formats.
  • Collaboration: Work closely with cross-functional teams, including infrastructure engineers, to align platform capabilities with organizational goals.

Required qualifications:

  • Distributed Systems Expertise: Proven experience in scaling and tuning large deployments of Spark-on-Kubernetes and Spark-on-Hadoop.
  • Object Storage Solutions: Knowledge of open-source S3 alternatives, including Ceph and MinIO.
  • Storage Systems Knowledge: In-depth understanding of Hadoop and the HDFS protocol.
  • Performance Tuning: Skilled in designing and optimizing shuffle-heavy systems, utilizing YARN or Kubernetes with remote shuffle services.
  • Lakehouse Technologies: Hands-on experience with at least one lakehouse table format, such as Delta Lake, Apache Iceberg, or Apache Hudi.
  • OLAP Systems: Familiarity with OLAP technologies, including ClickHouse, Apache Druid, Apache Pinot, or Apache Doris.
  • Communication Skills: Strong ability to collaborate with diverse stakeholders and effectively communicate complex technical concepts.
  • Problem-Solving: Proven track record of troubleshooting and resolving issues in large-scale production environments.

Preferred qualifications:

  • Advanced Data Formats: Experience with next-generation and multi-modal data formats, such as LanceDB.
  • Self-Service Platforms: Background in building self-service stateful platforms.
  • Accelerated Runtimes: Familiarity with native or accelerated runtimes for Spark, such as Apache DataFusion Comet, Apache Gluten, or NVIDIA RAPIDS.

Solvd

  • To be negotiated