This month - Remote Apache-Spark jobs
  • BHE
    Must be located: United States of America.

    BHE’s mission is to improve healthcare through innovative analytics. Our team has built a next-generation analytics platform, Instant Health Data (IHD), to bring researchers together to generate insights for improving population health, quality of care, and cost management.

    Our engineering team is looking for talented individuals who want to bring our platform to the next level and expand it into new markets. In undertaking this challenge, our new engineers will work at the forefront of the latest technologies, learn about large-scale new and emerging data sources, and help BHE maintain its leadership position.

    Our engineers work in a fast-paced, rapid-learning environment with leaders in software development, massive data sets, and healthcare data analytics. In our environment, everyone is encouraged to make a difference, free of the fixed ways of doing business found in larger, bureaucratic organizations.

    Job Description

    • Design, build, and maintain a highly scalable web analytics platform
    • Ensure that the platform meets business requirements and industry practices for security and privacy
    • Integrate new technologies and software engineering tools into existing platform
    • Mentor other software engineers
    • Provide software architecture support

    Minimum qualifications

    • Bachelor’s degree in Computer Science, Engineering, Math, or related technical/science field
    • 5 years of full-stack development experience
    • 5 years of experience working in a Linux environment

    Preferred qualifications

    • Significant experience with a NoSQL database
    • Significant experience with Apache Spark
    • Significant experience with Python and Java

    Why be a part of BHE's Team?

    • Leading healthcare data analytics/big data company
    • Work on a team of talented and pragmatic engineers/researchers
    • Great mentorship and growth opportunities
  • Doximity
    Must be located: United States of America or North America.

    Doximity is transforming the healthcare industry. Our mission is to help doctors be more productive, informed, and connected. As a software engineer focused on our data stack, you'll work within cross-functional delivery teams alongside other engineers, designers, and product managers in building software to help improve healthcare. 

    Our team brings a diverse set of technical and cultural backgrounds and we like to think pragmatically in choosing the tools most appropriate for the job at hand.  

    About Us

    • We rely heavily on Python, Airflow, Spark, MySQL and Snowflake for most of our data pipelines (see the sketch after this list)
    • We have over 350 private repositories on GitHub containing our pipelines, our own internal multi-functional tools, and open-source projects
    • We have worked as a distributed team for a long time; we're currently about 65% distributed
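
    As a hedged illustration of that stack (not code from the posting), a minimal Airflow DAG that schedules a daily PySpark job via spark-submit might look like the sketch below; the DAG id, job path, and Airflow 2.x-style arguments are all assumptions.

        from datetime import datetime, timedelta

        from airflow import DAG
        from airflow.operators.bash import BashOperator

        default_args = {
            "owner": "data-eng",  # hypothetical owner
            "retries": 1,
            "retry_delay": timedelta(minutes=5),
        }

        with DAG(
            dag_id="daily_rollup",            # hypothetical DAG id
            start_date=datetime(2024, 1, 1),
            schedule_interval="@daily",       # Airflow 2.x style
            default_args=default_args,
            catchup=False,
        ) as dag:
            # spark-submit keeps the sketch free of provider-specific operators
            run_rollup = BashOperator(
                task_id="run_rollup",
                bash_command="spark-submit /jobs/rollup.py {{ ds }}",
            )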

    Find out more on the Doximity engineering blog.

    Here's How You Will Make an Impact

    • Collaborate with product managers, data analysts, and data scientists to develop pipelines and ETL tasks that make it easier to extract insights from data.
    • Build, maintain, and scale data pipelines that empower Doximity’s products.
    • Establish data architecture processes and practices that can be scheduled, automated, replicated and serve as standards for other teams to leverage.
    • Spearhead, plan, and carry out the implementation of solutions while self-managing.

    About you

    • You have at least three years of professional experience developing data processing, enrichment, transformation, and integration solutions
    • You are fluent in Python, an expert in SQL, and can script your way around Linux systems with bash
    • You are no stranger to data warehousing and designing data models
    • Bonus: You have experience building data pipelines with Apache Spark in a multi-database ecosystem
    • You are foremost an engineer, passionate about high code quality, automated testing, and other engineering best practices
    • You have the ability to self-manage, prioritize, and deliver functional solutions
    • You possess advanced knowledge of Unix, Git, and AWS tooling
    • You agree that concise and effective written and verbal communication is a must for a successful team
    • You are able to maintain a minimum of 5 hours of overlap with 9:30 AM to 5:30 PM Pacific time
    • You can dedicate about 18 days per year for travel to company events

    Benefits

    Doximity has industry-leading benefits. For an updated list, see our careers page.

    More info on Doximity

    We’re thrilled to be named the Fastest Growing Company in the Bay Area, and one of Fast Company’s Most Innovative Companies. Joining Doximity means being part of an incredibly talented and humble team. We work on amazing products that over 70% of US doctors (and over one million healthcare professionals) use to make their busy lives a little easier. We’re driven by the goal of fixing inefficiencies in our $3.5 trillion U.S. healthcare system and love creating technology that has a real, meaningful impact on people’s lives. To learn more about our team, culture, and users, check out our careers page, company blog, and engineering blog. We’re growing steadily, and there are plenty of opportunities for you to make an impact.

    Doximity is proud to be an equal opportunity employer, and committed to providing employment opportunities regardless of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, pregnancy, childbirth and breastfeeding, age, sexual orientation, military or veteran status, or any other protected classification. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law.

    Location

    • US-only
  • Nagarro
    PROBABLY NO LONGER AVAILABLE.

    Required experience and skills: 

    • Expertise in Java or Scala
    • Familiarity with cluster computing technologies such as Apache Spark or Hadoop MapReduce
    • Familiarity with relational and big data stores such as Postgres, HDFS, Apache Kudu, and similar technologies
    • Strong skills in analytic computing and algorithms
    • Strong mathematical background, including statistics and numerical analysis
    • Knowledge of advanced programming concepts such as memory management, files & handles, multi-threading and operating systems.
    • Passion for finding and solving problems
    • Excellent communication skills, proven ability to convey complex ideas to others in a concise and clear manner 

    Desirable experience and skills: 

    • Familiarity with scripting languages such as Python or R
    • Experience in performance measurement, bottleneck analysis, and resource usage monitoring
    • Familiarity with probabilistic and stochastic computational techniques
    • Experience with data access and computing in highly distributed cloud systems
    • Prior history with agile development

Older - Remote Apache-Spark jobs
  • phData
    PROBABLY NO LONGER AVAILABLE. Preferred timezone: UTC -6

    Are you inspired by innovation, hard work and a passion for data?    

    If so, this may be the ideal opportunity to leverage your Software Engineering, Data Engineering or Data Analytics experience to design, develop and innovate big data solutions for a diverse set of clients.  

    At phData, our proven success has skyrocketed the demand for our services, resulting in quality growth and an expanded presence at our company headquarters conveniently located in Downtown Minneapolis (Fueled Collective).

    As the world’s largest pure-play Big Data, Machine Learning and Data Science services firm, our team includes Apache committers, Machine Learning experts and the most knowledgeable Scala development team in the industry. phData has earned the trust of customers by demonstrating our mastery of Big Data and Machine Learning services and our commitment to excellence.

    In addition to a phenomenal growth and learning opportunity, we offer competitive compensation and excellent perks including base salary, annual bonus, extensive training, paid Cloudera certifications - in addition to generous PTO and employee equity.

    As a Machine Learning Engineer, your responsibilities include:

    • Convert proof of concepts to production-grade solutions that can scale for hundreds of thousands of users

    • Create and manage machine learning pipelines on a Hadoop cluster to support any kind of model deployment on streaming or batch data (see the sketch after this list)

    • Tackle challenging problems, such as developing web services and ETL pipeline components, to productize and evaluate machine learning models

    • Write production code and collaborate with Solutions Architects and Data Scientists to implement algorithms in production

    • Design, conduct, and analyze experiments to validate proposed ML modeling approaches as well as improvements to existing ML pipelines
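
    To make the pipeline work above concrete, here is a hedged PySpark ML sketch; the paths, feature columns, and model choice are illustrative assumptions, not phData specifics. It trains a pipeline on cluster-resident data and saves the model as a deployable artifact.

        from pyspark.sql import SparkSession
        from pyspark.ml import Pipeline
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.classification import LogisticRegression

        spark = SparkSession.builder.appName("ml-pipeline-sketch").getOrCreate()

        # Hypothetical training data already landed on the cluster
        train = spark.read.parquet("hdfs:///data/train")

        assembler = VectorAssembler(
            inputCols=["f1", "f2", "f3"],  # illustrative feature columns
            outputCol="features",
        )
        lr = LogisticRegression(featuresCol="features", labelCol="label")

        # Fit the whole pipeline and persist it; a batch or streaming
        # scoring job can later reload it with PipelineModel.load(...)
        model = Pipeline(stages=[assembler, lr]).fit(train)
        model.write().overwrite().save("hdfs:///models/lr")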

    Qualifications

    • Previous experience as a Software Engineer, Data Engineer or Data Scientist (with hands-on engineering experience)
    • Solid programming experience in Python, Java, Scala, or other statically typed programming language
    • Hands-on experience in one or more big data ecosystem products/languages such as Spark, Impala, Solr, Kudu, etc.
    • Experience working with Data Science/Machine Learning software and libraries such as h2o, TensorFlow, Keras, scikit-learn, etc.
    • Strong working knowledge of SQL and the ability to write, debug, and optimize distributed SQL queries
    • Excellent communication skills; previous experience working with internal or external customers
    • Strong analytical abilities; ability to translate business requirements and use cases into a Hadoop solution, including ingestion of many data sources, ETL processing, data access, and consumption, as well as custom analytics
    • Four-year Bachelor's degree in Computer Science or a related field, or equivalent years of professional working experience.

    Keywords: Hive, Apache Spark, Java, Apache Kafka, Big Data, Spark, Solution Architecture, Cloudera, Apache Pig, Hadoop, NoSQL, Cloudera Impala, Scala, Python, Data Engineering, Big Data Analytics, Large Scale Data Analysis, ETL, Linux, Kudu, Pandas, TensorFlow, h2o, R, Keras, PyTorch, scikit-learn, Machine Learning, Machine Learning Engineering, Data Science, PySpark, NLP

  • Sonatype
    PROBABLY NO LONGER AVAILABLE. Preferred timezone: UTC -5

    Sonatype’s mission is to enable organizations to better manage their software supply chain. We offer a series of products and services including the Nexus Repository Manager and Nexus Lifecycle Manager. We are a remote, talented product development group, and we work in small autonomous teams to create high-quality products. Thousands of organizations and millions of developers use our software. If you have a passion for challenging problems, software craftsmanship, and having an impact, then Sonatype is the right place for you.

    We are expanding our Data team, which is responsible for unlocking insight from vast amounts of software component data and powering our suite of products, enabling our customers to make informed and automated decisions in managing their software supply chain. As a Backend Engineer, you will lead or contribute to the design, development, and monitoring of systems and solutions for collecting, storing, processing, and analyzing large data sets. You will work in a team made up of Data Scientists and other Software Engineers.

    No one is going to tell you when to get up in the morning, or dole out a bunch of small tasks for you to do every single day. Members of Sonatype's Product organization have the internal drive and initiative to make the product vision a reality. Flow should be the predominant state of mind.

    Requirements:

    • Deep software engineering experience; we primarily use Java.
    • Database and data manipulation skills working with relational or non-relational models.
    • Strong ability to select and integrate appropriate tools, frameworks, systems to build great solutions.
    • Deep curiosity for how things work and desire to make them better.
    • Legally authorized to work (without sponsorship) in Canada, Colombia, or the United States of America and are currently residing in the corresponding country.

    Nice To Haves:

    • Degree in Computer Science, Engineering, or another quantitative field.
    • Knowledge and experience with non-relational databases (e.g., HBase, MongoDB, Cassandra).
    • Knowledge and experience with large-scale data tools and techniques (e.g., MapReduce, Hadoop, Hive, Spark).
    • Knowledge and experience with AWS Big Data services (e.g., EMR, Elasticsearch).
    • Experience working in a highly distributed environment, using modern collaboration tools to facilitate team communication.
  • source{d}
    PROBABLY NO LONGER AVAILABLE. €49,000.00 - €53,000.00. Preferred timezone: UTC -20 to UTC +4

     At source{d} we are building the technology stack for the next generation of Machine Learning powered developer tools. We are an open-core company built around our Open Source projects.

    We have raised over ten million USD so far, and we are currently growing our team.

    This is a remote position; however, it can also be based out of our Madrid office.

    Role:

    The Data Retrieval team is developing source{d}'s high-level code analysis applications for running scalable data retrieval pipelines that process and manipulate any number of code repositories for source code analysis. Written mostly in Go, it aims to be robust, friendly, flexible, and capable of running on large-scale distributed clusters over petabytes of data.

    We at source{d} seek to be at the heart of any project related to source code. This core tool will thus be used both in-house, for building source{d}'s unique global-scale open dataset of 60M+ code repositories for cutting-edge Machine Learning research, and externally, empowering a wide community of developers, researchers, and companies worldwide doing pioneering research or building the next generation of developer tools and experiences.

    • Good knowledge of distributed computing and parallel processing is important.
    • You will be expected to have strong backend coding skills in at least two languages and very good algorithmic skills.
    • Scala coding skills and knowledge of Apache Spark aren't required but will be highly appreciated. Go, on the other hand, is not a strict requirement; we strongly believe it can be learned easily by any skilled developer, and we care a lot more about our team's mindset and prior experience than about any specific skills.

    Culture

    • source{d} is a company for developers by developers. We firmly believe in always doing what's best for the individual developer in the community. Our team consists of members who are passionate about programming. To understand our culture better, read more about it here
    • At the moment, we are 35+ people from 10 different countries working closely together from our office in Madrid. We are more than happy to sponsor you a visa and guide you and your family through the whole process if you decide to come work from our office, but you may also choose to work remotely. Currently, we have remote team members in the USA, Portugal, Ireland, France, Belgium, Poland, Estonia, and Russia.
    • For those wanting to work from one of our offices, we fully support the visa and moving process for you and your family. 
    • At source{d}, we have a transparent salary policy, which we feel strongly about. Your seniority level will be determined during the last round of on-site interviews.
    • At source{d} all of the projects we work on are public on GitHub and the vast majority are open-source under licenses such as Apache 2.0 or GPL3.
    • We don't just believe in open source; we also believe in radical transparency as an organization, so we publish everything about the company at github.com/src-d/guide.

    Perks

    • We go to conferences and other developer events!
    • Open Source Days: every second Monday, you are encouraged to work on any OSS project you choose.
    • Flexible hours, set your own schedule that fits you.
    • Free books. We will buy any books that help you learn & grow.
    • If you choose to work from one of our offices, you will enjoy a comfortable and spacious environment.
    • Annual summer and winter (Christmas) parties and a hackathon retreat are held in Madrid, and all team members are flown over for them.
    • We also have our own Open Source craft beers.

    Other

    • We offer visa and relocation support for those wanting to work in the Madrid office.
    • The local timezone of developers who want to work remotely should be between San Francisco and Moscow.
  • phData
    PROBABLY NO LONGER AVAILABLE.

    If you're inspired by innovation, hard work and a passion for data, this may be the ideal opportunity to leverage your background in Big Data and Software Engineering, Data Engineering or Data Analytics experience to design, develop and innovate big data solutions for a diverse set of global and enterprise clients.  

    At phData, our proven success has skyrocketed the demand for our services, resulting in quality growth at our company headquarters conveniently located in Downtown Minneapolis and expanding throughout the US. Notably we've also been voted Best Company to Work For in Minneapolis for the last 2 years.   

    As the world’s largest pure-play Big Data services firm, our team includes Apache committers, Spark experts and the most knowledgeable Scala development team in the industry. phData has earned the trust of customers by demonstrating our mastery of Hadoop services and our commitment to excellence.

    In addition to a phenomenal growth and learning opportunity, we offer competitive compensation and excellent perks including base salary, annual bonus, extensive training, paid Cloudera certifications - in addition to generous PTO and employee equity. 

    As a Solution Architect on our Big Data Consulting Team, your responsibilities will include:

    • Design, develop, and deliver innovative Hadoop solutions; partner with our internal Infrastructure Architects and Data Engineers to build creative solutions to tough big data problems.

    • Determine the technical project road map, select the best tools, assign tasks and priorities, and assume general project management oversight for performance, data integration, ecosystem integration, and security of big data solutions.  Mentor and coach Developers and Data Engineers. Provide guidance with project creation, application structure, automation, code style, testing, and code reviews

    • Work across a broad range of technologies – from infrastructure to applications – to ensure the ideal Hadoop solution is implemented and optimized

    • Integrate data from a variety of data sources (data warehouse, data marts) utilizing on-prem or cloud-based data structures (AWS); determine new and existing data sources

    • Design and implement streaming, data lake, and analytics big data solutions

    • Create and direct testing strategies including unit, integration, and full end-to-end tests of data pipelines

    • Select the right storage solution for a project - comparing Kudu, HBase, HDFS, and relational databases based on their strengths

    • Utilize ETL processes to build data repositories; integrate data into the Hadoop data lake using Sqoop (batch ingest), Kafka (streaming), and Spark, Hive, or Impala (transformation); see the sketch after this list

    • Partner with our Managed Services team to design and install on prem or cloud based infrastructure including networking, virtual machines, containers, and software

    • Determine and select best tools to ensure optimized data performance; perform Data Analysis utilizing Spark, Hive, and Impala

    • Local candidates work between the client site and our Minneapolis office. Remote US candidates must be willing to travel 20% for training and project kick-off.
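
    As a hedged sketch of the Kafka (streaming) ingest path mentioned above, a Spark Structured Streaming job landing raw events in the lake might look like this; the broker, topic, and paths are assumptions, and the spark-sql-kafka package must be on the classpath.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

        events = (
            spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
            .option("subscribe", "events")                     # hypothetical topic
            .load()
            .selectExpr("CAST(value AS STRING) AS value", "timestamp")
        )

        # Land raw events as Parquet; Hive or Impala can then transform
        # and query the resulting table.
        query = (
            events.writeStream.format("parquet")
            .option("path", "hdfs:///lake/raw/events")
            .option("checkpointLocation", "hdfs:///lake/_checkpoints/events")
            .trigger(processingTime="1 minute")
            .start()
        )
        query.awaitTermination()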

    Technical Leadership Qualifications

    • 5+ years of previous experience as a Software Engineer, Data Engineer, or Data Analyst

    • Expertise in core Hadoop technologies including HDFS, Hive and YARN.  

    • Deep experience in one or more ecosystem products/languages such as HBase, Spark, Impala, Solr, Kudu, etc.

    • Expert programming experience in Java, Scala, or other statically typed programming language

    • Ability to learn new technologies in a quickly changing field

    • Strong working knowledge of SQL and the ability to write, debug, and optimize distributed SQL queries

    • Excellent communication skills including proven experience working with key stakeholders and customers

    Leadership

    • Ability to translate “big picture” business requirements and use cases into a Hadoop solution, including ingestion of many data sources, ETL processing, data access and consumption, as well as custom analytics

    • Experience scoping activities on large scale, complex technology infrastructure projects

    • Customer relationship management including project escalations, and participating in executive steering meetings

    • Coaching and mentoring data or software engineers

  • Sonatype
    PROBABLY NO LONGER AVAILABLE. $30,000.00 - $50,000.00. Preferred timezone: UTC -5

    Sonatype’s mission is to enable organizations to better manage their software supply chain. We offer a series of products and services including the Nexus Repository Manager and Nexus Lifecycle Manager. We are a remote, talented product development group, and we work in small autonomous teams to create high-quality products. Thousands of organizations and millions of developers use our software. If you have a passion for challenging problems, software craftsmanship, and having an impact, then Sonatype is the right place for you.

    We are expanding our Data team, which is responsible for unlocking insight from vast amounts of software component data and powering our suite of products, enabling our customers to make informed and automated decisions in managing their software supply chain. As a Backend Engineer, you will lead or contribute to the design, development, and monitoring of systems and solutions for collecting, storing, processing, and analyzing large data sets. You will work in a team made up of Data Scientists and other Software Engineers.

    No one is going to tell you when to get up in the morning, or dole out a bunch of small tasks for you to do every single day. Members of Sonatype's Product organization have the internal drive and initiative to make the product vision a reality. Flow should be the predominant state of mind.

    Requirements:

    • Deep software engineering experience; we primarily use Java.
    • Database and data manipulation skills working with relational or non-relational models.
    • Strong ability to select and integrate appropriate tools, frameworks, systems to build great solutions.
    • Deep curiosity for how things work and desire to make them better.
    • Legally authorized to work (without sponsorship) in Canada, Colombia, or the United States of America and are currently residing in the corresponding country.

    Nice To Haves:

    • Degree in Computer Science, Engineering, or another quantitative field.
    • Knowledge and experience with non-relational databases (e.g., HBase, MongoDB, Cassandra).
    • Knowledge and experience with large-scale data tools and techniques (e.g., MapReduce, Hadoop, Hive, Spark).
    • Knowledge and experience with AWS Big Data services (e.g., EMR, Elasticsearch).
    • Experience working in a highly distributed environment, using modern collaboration tools to facilitate team communication.
  • Knock.com
    PROBABLY NO LONGER AVAILABLE. $130,000.00 - $160,000.00. Preferred timezone: UTC -8 to UTC -4

    Our homes are our most valuable asset and also the most difficult to buy and sell. Knock is on a mission to make trading in your house as simple and certain as trading in your car. Started by founding team members of Trulia.com (NYSE: TRLA, acquired by Zillow for $3.5B), Knock is an online home trade-in platform that uses data science to price homes accurately, technology to sell them quickly and a dedicated team of professionals to guide you every step of the way. We share the same top-tier investors as iconic brands like Netflix, Tivo, Match, HomeAway and Houzz.

    We are seeking an experienced Cloud Infrastructure Engineer to help us design, build and monitor our AWS cloud Infrastructure. You will provide infrastructure architectural direction and implementation to support engineering efforts for both Knock’s internal and customer-facing products. We are looking for someone who is passionate about creating great products to help millions of home buyers and sellers buy or sell a home without risk, stress, and uncertainty.

    Responsibilities:

    • Work with engineering teams to understand infrastructure requirements, provide insight and direction to achieve a balance between strategic design and tactical needs.
    • Set up and maintain dev/staging/production environments in AWS using Terraform configurations (infrastructure as code).
    • Establish and ensure infrastructure standards (security, reliability, availability, and scalability) and procedures are met by our engineering teams.
    • Design and implement AWS cloud infrastructure tools.
    • Design and implement proactive monitoring solutions to ensure service SLAs and other metrics are met.
    • Identify opportunities for infrastructure optimization and cost reduction.
    • Some on-call required.

    Requirements:

    • Must be U.S. based.
    • Minimum of 5 years of experience in a DevOps or infrastructure architect role.
    • Minimum of 2 years of full lifecycle software development experience including coding, testing, troubleshooting, and deployment.
    • Experience in running and maintaining automated production infrastructure in the AWS cloud.
    • Strong knowledge of Linux OS (CentOS, RedHat, Ubuntu) and bash/shell scripting.
    • Strong knowledge of AWS cloud products, security, and networking, including VPCs/ACLs/subnets/NAT/VPN, IAM, ELB/ALB, Route53, etc.
    • Experience in infrastructure provisioning products, Terraform (preferred) or CloudFormation.
    • Experience running container management systems (ECS, Kubernetes, Mesosphere) in production.
    • Programming proficiency in Go or Python.

    Bonus points for knowledge of:

    • Node.js
    • Apache Spark
    • ElasticSearch

    What we can offer you:

    • An amazing opportunity to be an integral part of building the next multi-billion dollar consumer brand around the single largest purchase of our lives.
    • Talented, passionate and mission-driven peers disrupting the status quo.
    • Competitive cash, full medical, dental, vision benefits, 401k, flexible work schedule, unlimited vacation (2 weeks mandatory) and sick time.
    • Flexibility to live and work anywhere within the United States. As we are a distributed company and engineering team, we are open to any U.S. location for this role.

    We have offices in New York, San Francisco, Atlanta, Charlotte, Raleigh, Dallas-Fort Worth, Phoenix, and Denver with more on the way. In fact, we are proud to be a distributed company with employees in 18 different states. This is an amazing opportunity to be an integral part of building a multi-billion dollar consumer brand in an industry that is long overdue for a new way of doing things. You will be working with a passionate, mission-driven team that is disrupting the status quo. Knock is an Equal Opportunity Employer. Individuals seeking employment at Knock are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, or sexual orientation. This role is not eligible for visa sponsorship. Please no recruitment firm or agency inquiries, you will not receive a reply from us.

  • APEX Expert Solutions
    PROBABLY NO LONGER AVAILABLE. $110,000.00 - $140,000.00. Preferred timezone: UTC -9 to UTC -1

    Duties and Responsibilities

    • Work in a fast-paced agile development environment, architecting and developing Hadoop applications
    • Provide technology recommendations for potential product application development
    • Gather and analyze requirements from product owners, ensuring products meet business requirements
    • Collaborate with other software engineers and team leads in designing and developing software solutions which meet high quality standards
    • Quickly prototype and develop Python/Java/Scala applications in diverse operating environments capable of interfacing with NoSQL datastores such as Accumulo and HBase
    • Write efficient code to extract, transform, load, and query very large datasets to include both structured and unstructured datasets
    • Develop standards and new design patterns for Big Data applications and master the tools and technology components within the Hadoop and Cloudera environments
    • Design and implement REST API applications that provide web application connectivity to backend datastores (see the sketch after this list)
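
    As a hedged sketch of that last duty (not APEX's actual code), a minimal Flask REST endpoint serving rows from HBase through happybase might look like this; the host, table, and route are invented for illustration.

        import happybase
        from flask import Flask, abort, jsonify

        app = Flask(__name__)

        @app.route("/records/<row_key>")
        def get_record(row_key):
            # One connection per request keeps the sketch simple; in practice
            # a happybase.ConnectionPool would be reused across requests.
            connection = happybase.Connection("hbase-host")  # hypothetical host
            try:
                row = connection.table("records").row(row_key.encode())
            finally:
                connection.close()
            if not row:
                abort(404)
            # HBase returns bytes for keys and values; decode for JSON.
            return jsonify({k.decode(): v.decode() for k, v in row.items()})

        if __name__ == "__main__":
            app.run(port=8080)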

    Skills & Requirements

    • 3 years of experience building Java applications, including framework experience (J2EE, Spring, etc.)
    • 1 year of experience building and coding applications using Hadoop components – HDFS, HBase, Hive, Sqoop, Flume, Spark, etc.
    • 3 years of experience with Spark
    • 1 year of experience with GeoMesa
    • 1 year of experience with SparkSQL
    • Experience building and maintaining Cloudera-based clusters
    • Experience using traditional ETL tools & RDBMS
    • Experience developing REST web services
    • Demonstrated effective and successful verbal and written communication skills
    • Bachelor's degree in Computer Science or a related technical field
    • U.S. citizen

    Desired Qualifications

    • Full life cycle software application development experience
    • Front end web development with experience in JQuery, Polymer, web components, Bootstrap, Node.js, etc
    • Demonstrated ability to quickly learn and apply new technologies
    • Experience with unstructured datasets, such as log files, email, text
    • Experience with geospatial datasets and datastores
  • Sonatype
    PROBABLY NO LONGER AVAILABLE.

    Sonatype’s mission is to enable organizations to better manage their software supply chain. We offer a series of products and services including the Nexus Repository Manager and Nexus Lifecycle Manager. We are a remote, talented product development group, and we work in small autonomous teams to create high-quality products. Thousands of organizations and millions of developers use our software. If you have a passion for challenging problems, software craftsmanship, and having an impact, then Sonatype is the right place for you.

    We are expanding our Data team, which is responsible for unlocking insight from vast amounts of software component data and powering our suite of products, enabling our customers to make informed and automated decisions in managing their software supply chain. As a Backend Engineer, you will lead or contribute to the design, development, and monitoring of systems and solutions for collecting, storing, processing, and analyzing large data sets. You will work in a team made up of Data Scientists and other Software Engineers.

    No one is going to tell you when to get up in the morning, or dole out a bunch of small tasks for you to do every single day. Members of Sonatype's Product organization have the internal drive and initiative to make the product vision a reality. Flow should be the predominant state of mind.

    Requirements:

    • Deep software engineering experience; we primarily use Java.
    • Database and data manipulation skills working with relational or non-relational models.
    • Strong ability to select and integrate appropriate tools, frameworks, systems to build great solutions.
    • Deep curiosity for how things work and desire to make them better.
    • Legally authorized to work (without sponsorship) in Canada, Colombia, or the United States of America and are currently residing in the corresponding country.

    Nice To Haves:

    • Degree in Computer Science, Engineering, or another quantitative field.
    • Knowledge and experience with non-relational databases (e.g., HBase, MongoDB, Cassandra).
    • Knowledge and experience with large-scale data tools and techniques (e.g., MapReduce, Hadoop, Hive, Spark).
    • Knowledge and experience with AWS Big Data services (e.g., EMR, Elasticsearch).
    • Experience working in a highly distributed environment, using modern collaboration tools to facilitate team communication.
  • amplified ai
    PROBABLY NO LONGER AVAILABLE. $80,000.00 - $120,000.00. Preferred timezone: UTC +11

    Data is the foundation of everything we do. As a senior data engineer, you’ll have the authority (and responsibility) to design and maintain infrastructure for collecting, storing, processing, and analyzing terabyte-scale data sets, including large document corpora, machine learning results, metadata, and application data.

    We are looking for a Data Engineer to collect, manage, and deploy massive sets of global patent data and more. Your role will be to ensure that our dataset of over 100 million patents is readily available for our web application and data science teams. You’ll also be responsible for identifying and integrating additional data sets which allow us to expand our product features and AI capabilities.

    This is an exciting opportunity to engage with cutting-edge technology and work on a real-world problem at global scale. In addition to competitive compensation and benefits there is also room for the right person to take on increased responsibilities. And it’s a lot of fun (although fast-paced and even chaotic at times) working as part of a small, passionate team.

    Responsibilities:

    • Take ownership of understanding, acquiring, and managing innovation- and technology-related datasets, starting with global patents
    • Write and automate pipelines for data cleansing, ingestion of machine learning results, ingestion of raw data from multiple sources, aggregation, and more (see the sketch after this list)
    • Architect and manage data infrastructure optimized for machine learning and large-scale data exploration
    • Ensure fast and reliable access to the clean data our client-facing web application depends on
    • Seek out and integrate new sources of data related to our core business
    • Communicate data extent and performance to internal consumers
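
    As an illustrative, hedged example of such a cleansing pipeline (the schema, paths, and dedup key are assumptions, not amplified ai's actual data model):

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("patent-cleansing-sketch").getOrCreate()

        # Hypothetical raw patent records landed from an external source
        raw = spark.read.json("s3://bucket/raw/patents/")

        clean = (
            raw.dropDuplicates(["publication_number"])  # assumed unique key
            .withColumn("title", F.trim(F.col("title")))
            .filter(F.col("publication_date").isNotNull())
        )

        # Write a columnar copy for the web application and data science teams.
        clean.write.mode("overwrite").parquet("s3://bucket/clean/patents/")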

    Minimum Qualifications and Education Requirements:

    • BSc/BEng degree in computer science or equivalent
    • Strong relational database experience, preferably with Postgres
    • The ability to communicate high-level information about datasets, preferably using data visualization
    • Experience writing performant data pipelines at scale, e.g. with Spark or Airflow
    • The ability to use a modern language with a strong concurrency model for fast data processing such as Elixir, Rust or Go

    Preferred Qualifications:

    • MSc/MEng degree in computer science or equivalent
    • Passion for AI and excitement about new developments
    • Contributions to open source projects
    • Experience with machine learning
    • Experience with data visualization
  • SecurityTrails
    PROBABLY NO LONGER AVAILABLE.

    We are looking for a Lead Data Scientist to build a technical team and help us gain useful insight out of raw data as well as automate the creation and retrieval of the data.

    Your ultimate goal will be to help improve our products and business decisions by making the most out of our data, finding creative ways to improve and obtain new data, and helping to build out our incredible data team.

    Your responsibilities:

    •    Manage a team of data scientists, machine learning engineers, and big data specialists

    •    Lead data mining and collection procedures

    •    Ensure data quality and integrity

    •    Interpret and analyze data problems

    •    Conceive, plan and prioritize data projects

    •    Build analytic systems

    •    Visualize data and create reports

    •    Experiment with new models and techniques

    •    Align data projects with organizational goals

    You are skilled in:

    •    Apache Kafka

    •    Apache Spark

    •    BigQuery

    •    Elasticsearch

    •    or similar technologies (see the sketch after this list)
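
    As a hedged example of that tooling in use (the project, dataset, and table names are invented), an aggregate query through the google-cloud-bigquery client might look like:

        from google.cloud import bigquery

        client = bigquery.Client()  # uses ambient GCP credentials

        query = """
            SELECT tld, COUNT(*) AS n
            FROM `project.dataset.domains`  -- hypothetical table
            GROUP BY tld
            ORDER BY n DESC
            LIMIT 10
        """

        for row in client.query(query).result():
            print(row.tld, row.n)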

    You should have a strong problem-solving ability and a knack for statistical analysis. If you are also able to align our data products with our business goals, we would like to meet you.

    Your benefits:

    •    working full-time remotely

    •    trips to team meet-ups are paid

    •    teammates from countries all over the world

    For further information about our remote work culture visit us on https://securitytrails.com/blog/working-remotely

    Our mission

    SecurityTrails strives to make the biggest treasure-trove of cyber intelligence data readily available in an instant. We work relentlessly to empower experts so they can thwart future attacks with up-to-date data, proprietary tools, and custom solutions.

    A Security Beast Built Bit By Bit

    We started because we were tired. Tired of combing through domain lists and forensic data manually, tired of searching through numerous sites for all the data we needed. We patiently waited for the perfect tool, but it never came. Our solution had to be vast, fast, and able to update daily — so we assembled a talented team and built it from scratch.

    SecurityTrails was founded in June 2017, and from the very start it was decided that it would be a fully remote team. What began as a team of three people based in the US has grown into a team that currently counts 19 individuals. And that’s not even taking into account the number of contractors we work with, living and working across the entire globe.

  • Doximity
    PROBABLY NO LONGER AVAILABLE. Must be located: North America.

    Doximity is transforming the healthcare industry. Our mission is to help doctors be more productive, informed, and connected. As a software engineer focused on our data stack, you'll work within cross-functional delivery teams alongside other engineers, designers, and product managers in building software to help improve healthcare. 

    Our team brings a diverse set of technical and cultural backgrounds and we like to think pragmatically in choosing the tools most appropriate for the job at hand.  

    Here's How You Will Make an Impact

    • Collaborate with product managers, data analysts, and data scientists to develop pipelines and ETL tasks that make it easier to extract insights from data.
    • Build, maintain, and scale data pipelines that empower Doximity’s products.
    • Establish data architecture processes and practices that can be scheduled, automated, replicated and serve as standards for other teams to leverage.
    • Spearhead, plan, and carry out the implementation of solutions while self-managing.

    About you

    • You have at least three years of professional experience developing data processing, enrichment, transformation, and integration solutions
    • You are fluent in Python, an expert in SQL, and can script your way around Linux systems with bash
    • You are no stranger to data warehousing and designing data models
    • Bonus: You have experience building data pipelines with Apache Spark in a multi-database ecosystem
    • You are foremost an engineer, passionate about high code quality, automated testing, and other engineering best practices
    • You have the ability to self-manage, prioritize, and deliver functional solutions
    • You possess advanced knowledge of Unix, Git, and AWS tooling
    • You agree that concise and effective written and verbal communication is a must for a successful team
    • You are able to maintain a minimum of 5 hours of overlap with 9:30 AM to 5:30 PM Pacific time
    • You can dedicate about 18 days per year for travel to company events

    Benefits

    • Doximity has industry-leading benefits. For an updated list, see our careers page

    More info on Doximity

    We’re thrilled to be named the Fastest Growing Company in the Bay Area, and one of Fast Company’s Most Innovative Companies. Joining Doximity means being part of an incredibly talented and humble team. We work on amazing products that over 70% of US doctors (and over one million healthcare professionals) use to make their busy lives a little easier. We’re driven by the goal of fixing inefficiencies in our $3.5 trillion U.S. healthcare system and love creating technology that has a real, meaningful impact on people’s lives. To learn more about our team, culture, and users, check out our careers page, company blog, and engineering blog. We’re growing steadily, and there are plenty of opportunities for you to make an impact.

    Doximity is proud to be an equal opportunity employer, and committed to providing employment opportunities regardless of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, pregnancy, childbirth and breastfeeding, age, sexual orientation, military or veteran status, or any other protected classification. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. 

  • Signifyd
    PROBABLY NO LONGER AVAILABLE.

    At Signifyd we’re creating a new market. We’re constantly improving and expanding the technology that has changed what fraud protection for e-commerce looks like. So we don’t have time for office politics. We understand that different people have different work styles and we thrive on variety while learning from each other. We’re all Signifyers, so we know that what needs to get done will get done.

    Signifyd is a place where no one is going to tell you how to do your job. If you want help, you'll get it — from all quarters. But we pretty much figure out what needs to be done, who's in the best position to do it, and then let that person roll up her or his sleeves and have at it. We're protecting retailers from online fraud in a way that's never been done before, and we could use your help if you're someone:

    - Who believes challenges are best overcome by thinking differently.

    - Who knows his or her role, but isn’t confined by it.

    - Whose greatest satisfaction comes from helping customers succeed and achieve their dreams.

    - Who isn’t afraid to disagree, convincingly, civilly and honestly.

    - Who will stop and hold the door for a colleague, even if you’re running late.

    We have a healthy remote-working culture within our Engineering team; check out the blog post by one of our Engineering Managers on our team dynamics: https://www.signifyd.com/blog/2018/10/18/far-apart-working-close-together/

    Please check out our Engineering blog here https://www.signifyd.com/blog/category/engineering/

    Oh, and a few particulars for this role:

    Excellent programming skills in Java or similar language

    Experience with scripting languages and SQL

    Analytical problem solving skills

    Bonus Skills:

    Experience building large-scale high-performance systems

    Startup experience

    Payments or risk experience

    Machine learning or related knowledge

    Typical Education:

    B.S., M.S., or Ph.D. in Computer Science, Electrical Engineering, Math, or other technical field

    Our stack:

    Java, Python, Cassandra, MySQL, Solr, Apache Spark, Play! framework, Linux, Docker, AWS

    All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability, protected veteran status, or any other characteristic protected by law.

    Posted positions are not open to third party recruiters/agencies and unsolicited resume submissions will be considered free referrals.