We are looking for a highly motivated Senior C++ Software Engineer for Data Engines. You'll have the opportunity to work directly on Theseus, the accelerator-native data processing engine built for composability. You will work closely with Voltron Data development teams to build, optimize, and maintain our data execution framework: adding new features, making it faster and more scalable, and contributing to new core architectural components that will enable the engine to run at petabyte scale.
Why work at Voltron Data?
- We are Going for Impact: We are a Series A, venture-backed startup assembling a global team to build a new foundation for data analytics with Apache Arrow. This foundation will usher in a wave of innovation in data processing that can take full advantage of the speed and efficiency offered by modern hardware.
- We are Committed to Bridging Open Source Communities: We are a collection of open source maintainers who have been driving open source ecosystems over the last 15 years, particularly in the C++, Python, and R programming ecosystems.
- We are Building a Diverse, Inclusive Company: We are creating a representative, equitable, and respectful workplace that prioritizes employee growth. Everyone at Voltron Data is bought into the company's success; all voices are critical to shaping the organization's future.
Timeline:
Below is a rough timeline of where you can expect to be at different points during your career path starting in this position.
Upon Joining:
- Spending time learning about Apache Arrow, the compute primitives we use in Theseus, the query parser and optimizer, and other foundational components.
- Diving into the data processing engine architecture, how the different components interact with each other, and how data flows through the compute graph.
- Understanding memory management mechanics, including spilling memory from GPU to host and disk.
- Learning and embracing the software development culture at Voltron Data.
Within a month:
- Profiling single-node and distributed query executions and analyzing the engine telemetry to better understand how the engine works and how to solve distributed engine issues.
- Diving deep into the various distributed relational algebra algorithms to understand how they work and how they can be improved.
- Working with the team on fixing bugs, implementing simple optimizations, or code refactoring projects.
Within 6 months:
- Building new relational algebra components to expand SQL coverage or DataFrame functionality coverage.
- Making small improvements to more sophisticated engine components such as resource management, task scheduling, and fault tolerance.
Within 12 months:
- Proposing and implementing core architecture improvements to the engine.
- Working on challenging tasks such as language-agnostic user-defined functions, multi-query concurrency, and multi-tenancy.
- Integrating the engine with other components and features developed by other teams in the company to provide enterprise-grade customer experiences.
Previous experience that could be helpful:
- Experience with data processing engines or frameworks
- Experience in distributed and multi-threaded systems
- Experience in hardware resource management, including memory and thread pools
- Working with SQL and non-SQL systems and their computational abstractions
- Developing in C++, especially using modern C++
- Developing for multiple types of hardware (e.g., CPU, GPU)
US Compensation – The salary range for this role is between $171,000.00 and $210,000.00. We have a global market-based pay structure which varies by location. Please note that the base pay range is a guideline; for candidates who receive an offer, the exact base pay will vary based on factors such as actual work location and the candidate's skills and experience. This position is also eligible for additional incentives such as equity awards.