I am working with an exciting organisation that is looking to step up its technology and make major investments. They are hiring a data engineer to take responsibility for the development and maintenance of their data lake and ETL jobs, making their processes and operations robust, scalable, and efficient, and supporting self-service initiatives within the business.
Your focus will be on ensuring that all business-critical data is stored in the data lake, easily accessible, and available on time. This data is used for reporting, analytics, and other processes.
Whilst moving to the new target infrastructure, you will support the legacy data infrastructure, which mainly consists of SQL databases, with stored procedures and SSIS packages used to move data around.
- Work with new big data technologies for the data lake
- Maintain legacy data infrastructure
- Work with the business and communicate with people directly to ensure their needs are met
- Provide the business with clear and easy-to-understand data
- Validate data to ensure its quality meets high standards (a minimal sketch follows this list)
- Work with the team to provide pragmatic data solutions to business users
- Contribute to the tech stack, drawing on ideas and trends from the industry
- Review code and architecture, and lead the team in delivering the highest-quality solutions
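To give a flavour of the validation responsibility, here is a minimal sketch of the kind of data-quality check involved. The `orders` extract, its column names, and the individual rules are all hypothetical, and plain pandas assertions over a staged file are just one way such checks could be written:

```python
import pandas as pd

# Hypothetical example: validate a staged extract before loading it
# into the data lake. File, table, and column names are illustrative.
def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems found in the extract."""
    problems = []

    # Required columns must be present.
    required = {"order_id", "customer_id", "order_date", "amount"}
    missing = required - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # remaining checks depend on these columns

    # Primary key must be unique and non-null.
    if df["order_id"].isna().any():
        problems.append("null order_id values")
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")

    # Amounts should be non-negative.
    if (df["amount"] < 0).any():
        problems.append("negative amounts")

    return problems

if __name__ == "__main__":
    df = pd.read_csv("orders_extract.csv", parse_dates=["order_date"])
    for problem in validate_orders(df):
        print("DATA QUALITY:", problem)
```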
Technology stack includes:
- Python for ETL jobs into the Hadoop data lake
- Hive for easy access to the data in Hadoop
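As an illustration of how these pieces fit together, here is a minimal sketch of an ETL step in this style: land a file in HDFS with the `hdfs` client library, then expose it through a Hive external table via PyHive. All hostnames, paths, and table names are assumptions for illustration, not details of their actual setup:

```python
from hdfs import InsecureClient
from pyhive import hive

# Hypothetical hosts, paths, and names -- illustrative only.
HDFS_URL = "http://namenode.example.com:9870"
RAW_DIR = "/data/raw/orders"

# 1. Land the extracted file in HDFS.
client = InsecureClient(HDFS_URL, user="etl")
client.upload(f"{RAW_DIR}/orders_extract.csv", "orders_extract.csv",
              overwrite=True)

# 2. Expose the landed data through a Hive external table so that
#    analysts can query it with SQL. Assumes the CSV has no header row.
conn = hive.Connection(host="hiveserver.example.com", port=10000,
                       username="etl")
cursor = conn.cursor()
cursor.execute(f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_raw (
        order_id BIGINT,
        customer_id BIGINT,
        order_date STRING,
        amount DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '{RAW_DIR}'
""")
```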
They believe in picking the best tool for the job, so the stack consists of a variety of technologies chosen to best solve each problem. You will have a voice in using, reducing, or expanding that stack.
What will you need to apply for this role?
- Curiosity and an intrinsic desire to investigate and understand a problem
- A willingness to try out technologies that you may be unfamiliar with
- Good communication skills
- A good understanding of architecture and code
- Knowledge of big data technologies, for example (but not necessarily) Hadoop, Hive, NiFi, SQL, or Python