Martin REMY

Summary

Experienced and passionate Lead Data Engineer. Over the past 7 years I've worked for major corporate clients, for whom I delivered efficient data pipelines implementing the latest best practices. I love to challenge myself while collaborating with other open-minded people.

Overview

7 years of professional experience

5 years of post-secondary education

2 languages

Work History

Lead Data Engineer

Lynceus

Remote

04.2022 - Current

Lynceus is a startup providing predictive AI to specialized industries such as semiconductor production. As the lead data engineer, I work on diverse topics, notably:

Data Availability for the internal ML team

Standardized our data acquisition processes (Pydantic, Pandera, AWS S3)
Standardized all our storage (Apache Parquet, AWS S3)
Implemented a new automatic validation feature for data quality checks (Python, Spark, Dagster)
Developed & deployed automated export jobs to keep data always fresh (Python, Spark, Dagster)
Deployed a data catalog (AWS Glue)

Product Vision

Ownership of all data related topics in the product creation process
Wrote Architecture Decision Records (ADRs) & Product Requirements Documents (PRDs)
Wrote 30-60-90 day roadmaps

Consulting

Developed & deployed data pipelines for our customers (Python, Spark, Dagster, Docker)
Helped our customers investigate their data lifecycle

Jack of all Trades

Created Infrastructure as Code (IaC) scripts for our cloud resources (Terraform)
Developed REST API endpoints (Python, Flask, SQLAlchemy) following Domain-Driven Design

Team management

Defined recruitment process for the data engineering team
Led interviews assessing both technical performance and human fit

Technology stack:
PySpark, Python, Pandas, Pandera, Pydantic, Pytest, Dagster, SQLAlchemy, Flask, Terraform, AWS (S3, SageMaker, IAM, EMR, EC2, Glue, Athena, Lambda), Docker

Big Data Technical Expert

CGI

Toulouse

11.2020 - 04.2022

Client: SOCIETE GENERALE, Tech Lead Big Data

Project LUCID (07/2021-04/2022)

Tech lead for a dozen developers
Design, support, DevOps, PoC
Knowledge sharing, tech talks, conferences...

Project DHR (05/2021-06/2021)

Assessment of the migration cost for all HR applications towards the new data lake, as part of a won Request for Proposal

Project RCG (11/2020-05/2021)

Development of a banking reconciliation engine on Hortonworks, used by 10 dataflows/projects.
Design and development of complex generic modules in PySpark & SparkSQL + Hive
Preparation & execution of production deployments: Bash scripting & tuning of application/cluster.
Knowledge transfer with the offshore team of Société Générale: preparation & delivery of training sessions
Double-run analysis in production.

Technology Stack:

Hortonworks, Cloudera Data Platform, Spark & PySpark, Hive, HDFS, YARN, Control-M, Cron, Jenkins, Ansible, Java & Scala, SQL, IntelliJ

Big Data Engineer

Maltem

Montreal

12.2018 - 07.2020

Client: SOCIETE GENERALE (Canada) - Tech Lead Big Data

Member of the cross-functional team (25 people), in charge of the data architecture for the Montréal hub.

Spreading the data vision: move to cloud, best practices, technology choices, training, communication.
Tool development using Spark, Hive & RESTful APIs (centralized APIs returning the new cluster configuration created on the fly).
Change management: creating CI/CD pipelines for project teams.
Assisting project teams in their migration to the Azure Cloud.
Design & implementation of complex algorithms
Level 3 production support.

Technology Stack:

Microsoft Azure, Hadoop Ecosystem (Spark, Hive), Java, Scala, SQL, ElasticSearch, Jenkins, Azure Pipelines, Maven, Shell, IntelliJ, Visual Studio Code, Git, Jira

Big Data Engineer

Capgemini

Toulouse

04.2016 - 11.2018

Client: EDF - Data Engineer

Work organized in SAFe (Scaled Agile Framework)
Development of robust applications in Spark (Java & Scala) within a Hortonworks on-premise ecosystem
Automation of testing and deployments using Ansible playbooks
Design & creation of a new data visualization model using GraphQL and REST APIs

Technology Stack:

Hadoop Ecosystem (Spark, Kafka, Hive, HBase), Java, Scala, SQL, ElasticSearch, Jenkins, Maven, Shell, IntelliJ, Visual Studio Code, Git, Jira

Education

Master of Science - MIAGE

Universite Toulouse Capitole, Toulouse

09.2013 - 04.2016

Master in MIAGE (information technology applied to business management), formerly called IGSI

Skills

Apache Spark

Cloud Computing

SQL Proficiency

Scala & Java

Python

Timeline

Lead Data Engineer - Lynceus

04.2022 - Current

Big Data Technical Expert - CGI

11.2020 - 04.2022

Big Data Engineer - Maltem

12.2018 - 07.2020

Big Data Engineer - Capgemini

04.2016 - 11.2018

Universite Toulouse Capitole - Master of Science, MIAGE

09.2013 - 04.2016
