Must be able to write quality code and build secure, highly available systems.
Assemble large, complex data sets that meet functional / non-functional business requirements.
Identify, design, and implement internal process improvements with guidance: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
Monitor performance and advise on any necessary infrastructure changes.
Define data retention policies.
Implement the ETL process and an optimal data pipeline architecture.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
Create design documents that describe the functionality, capacity, architecture, and process.
Develop, test, and implement data solutions based on finalized design documents.
Work with data and analytics experts to strive for greater functionality in our data systems.
Proactively identify potential production issues and recommend and implement solutions.
Good understanding of optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
Proficient understanding of distributed computing principles
Experience working with batch-processing and real-time systems using open-source technologies such as NoSQL databases, Spark, Pig, Hive, and Apache Airflow.
Experience implementing complex projects handling considerable data volumes (petabyte scale).
Knowledge of optimization techniques (performance, scalability, monitoring, etc.).
Experience with integration of data from multiple data sources
Experience with NoSQL databases, such as HBase, Cassandra, MongoDB, etc.
Knowledge of various ETL techniques and frameworks, such as Flume
Experience with various messaging systems, such as Kafka or RabbitMQ
Good understanding of Lambda Architecture, along with its advantages and drawbacks
Experience creating DAGs for data engineering workflows.
Expert at Python/Scala programming, especially for data engineering and ETL purposes.
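To give a concrete sense of the ETL work described above, here is a minimal, illustrative sketch of an extract-transform-load step in Python. The table names, column names, and transformation are hypothetical examples (not from this posting), and a real pipeline would run against a production database or an orchestration tool such as Airflow rather than an in-memory SQLite instance.

```python
import sqlite3

def run_etl(conn: sqlite3.Connection) -> int:
    """Extract raw order rows, normalize dollar amounts to integer cents,
    and load the cleaned rows into a target table (all names hypothetical)."""
    cur = conn.cursor()
    # Extract: read raw rows from the source table.
    rows = cur.execute("SELECT id, amount_dollars FROM raw_orders").fetchall()
    # Transform: convert floating-point dollar amounts to integer cents.
    cleaned = [(oid, round(amount * 100)) for oid, amount in rows]
    # Load: write the transformed rows into the target table.
    cur.executemany(
        "INSERT INTO orders_clean (id, amount_cents) VALUES (?, ?)", cleaned
    )
    conn.commit()
    return len(cleaned)

# Self-contained setup: an in-memory database stands in for real source/target systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_dollars REAL)")
conn.execute("CREATE TABLE orders_clean (id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 19.99), (2, 5.00)])
loaded = run_etl(conn)
```

The same extract/transform/load structure scales up to the distributed tools named in the qualifications (Spark jobs, Airflow DAG tasks); only the I/O endpoints and execution engine change.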