Martin REMY

Lead Data Engineer
Toulouse

Summary

Experienced and passionate Lead Data Engineer. Over the past 7 years I have worked for major corporate clients, for whom I delivered efficient data pipelines implementing the latest best practices. I love to challenge myself while collaborating with other open-minded people.

Overview

7 years of professional experience
5 years of post-secondary education
2 languages

Work History

Lead Data Engineer

Lynceus
Remote
04.2022 - Current

Lynceus is a startup providing predictive AI to specialized industries such as semiconductor production. As the lead data engineer, I work on diverse topics, notably:

Data Availability for the internal ML team

  • Standardized our data acquisition processes (Pydantic, Pandera, AWS S3)
  • Standardized all our storage (Apache Parquet, AWS S3)
  • Implemented a new automatic validation feature for data quality checks (Python, Spark, Dagster)
  • Developed & deployed automated export jobs to keep data always fresh (Python, Spark, Dagster)
  • Deployed a data catalog (AWS Glue)

Product Vision

  • Ownership of all data-related topics in the product creation process
  • Wrote Architectural Decision Records (ADRs) & Product Requirement Documents (PRDs)
  • Wrote 30-60-90 day roadmaps

Consulting

  • Developed & deployed data pipelines for our customers (Python, Spark, Dagster, Docker)
  • Helped our customers investigate their data lifecycle

Jack of all Trades

  • Created Infrastructure as Code (IaC) scripts for our cloud resources (Terraform)
  • Developed REST API endpoints (Python, Flask, SQLAlchemy) following Domain-Driven Design

Team management

  • Defined recruitment process for the data engineering team
  • Led interviews assessing both technical performance and human fit

Technology stack:
PySpark, Python, Pandas, Pandera, Pydantic, Pytest, Dagster, SQLAlchemy, Flask, Terraform, AWS (S3, SageMaker, IAM, EMR, EC2, Glue, Athena, Lambda), Docker

Big Data Technical Expert

CGI
Toulouse
11.2020 - 04.2022

Client: SOCIETE GENERALE, Tech Lead Big Data

Project LUCID (07/2021-04/2022)

  • Tech lead for a dozen developers
  • Design, support, DevOps, PoCs
  • Knowledge capture, tech talks, conferences...

Project DHR (05/2021-06/2021)

  • Assessment of the cost of migrating all HR applications to the new data lake, following a successful Request for Proposal

Project RCG (11/2020-05/2021)

  • Development of a banking reconciliation engine on Hortonworks, used by 10 data flows/projects.
  • Design and development of complex generic modules in PySpark & SparkSQL + Hive
  • Preparation & execution of production deployments: Bash scripting & application/cluster tuning.
  • Knowledge transfer with the offshore team of Société Générale: preparation & delivery of training sessions
  • Double-run analysis in production.

Technology Stack:

Hortonworks, Cloudera Data Platform, Spark & PySpark, Hive, HDFS, YARN, Control-M, Cron, Jenkins, Ansible, Java & Scala, SQL, IntelliJ

Big Data Engineer

Maltem
Montreal
12.2018 - 07.2020

Client: SOCIETE GENERALE (Canada) - Tech Lead Big Data

Member of the cross-functional team (25 people) in charge of the data architecture for the Montréal hub.

  • Spreading the data vision: move to cloud, best practices, technology choices, training, communication.
  • Tool development using Spark, Hive & RESTful APIs (centralized APIs returning the new cluster configuration created on the fly).
  • Change management: creating CI/CD pipelines for project teams.
  • Assisting project teams in their migration to the Azure Cloud.
  • Design & implementation of complex algorithms
  • Level 3 production support.

Technology Stack:

Microsoft Azure, Hadoop Ecosystem (Spark, Hive), Java, Scala, SQL, Elasticsearch, Jenkins, Azure Pipelines, Maven, Shell, IntelliJ, Visual Studio Code, Git, Jira

Big Data Engineer

Capgemini
Toulouse
04.2016 - 11.2018

Client: EDF - Data Engineer

  • Work organized in SAFe (Scaled Agile Framework)
  • Development of robust applications in Spark (Java & Scala) within a Hortonworks on-premise ecosystem
  • Automation of testing and deployments using Ansible playbooks
  • Design & creation of a new data visualization model using GraphQL and REST APIs

Technology Stack:

Hadoop Ecosystem (Spark, Kafka, Hive, HBase), Java, Scala, SQL, Elasticsearch, Jenkins, Maven, Shell, IntelliJ, Visual Studio Code, Git, Jira

Education

Master of Science - MIAGE

Universite Toulouse Capitole, Toulouse
09.2013 - 04.2016

Master in MIAGE (information technology applied to business management), formerly called IGSI

Skills

Apache Spark

Timeline

Lead Data Engineer - Lynceus
04.2022 - Current
Big Data Technical Expert - CGI
11.2020 - 04.2022
Big Data Engineer - Maltem
12.2018 - 07.2020
Big Data Engineer - Capgemini
04.2016 - 11.2018
Universite Toulouse Capitole - Master of Science, MIAGE
09.2013 - 04.2016