Onehouse: The Universal Data Platform

Ingest and transform data in minutes, at a fraction of the cost. Store your data in an auto-optimized, open source data lakehouse to power all of your analytics and AI use cases. Query anywhere.

Try It Free

Continuous Data Ingestion

The fastest ingestion of all your data into Apache Hudi, Apache Iceberg, and Delta Lake tables in your cloud storage. Onehouse ingestion delivers industry-leading performance at a fraction of the cost, thanks to incremental processing.

Event streams

Ingest data directly from any flavor of Apache Kafka.
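
For a sense of what such a managed pipeline automates, here is a minimal hand-rolled sketch using Spark Structured Streaming and Apache Hudi. The broker address, topic, and S3 paths are hypothetical; the managed service adds autoscaling, schema handling, and monitoring on top.

```python
# Minimal sketch of a Kafka-to-lakehouse pipeline with Spark Structured
# Streaming and Apache Hudi (requires the spark-sql-kafka and hudi-spark
# bundles on the classpath). All names and paths below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hudi-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "clickstream")                # hypothetical topic
    .load()
    .selectExpr("CAST(key AS STRING) AS event_key",
                "CAST(value AS STRING) AS payload",
                "timestamp")
)

(
    events.writeStream.format("hudi")
    .option("hoodie.table.name", "clickstream_raw")
    .option("hoodie.datasource.write.recordkey.field", "event_key")
    .option("hoodie.datasource.write.precombine.field", "timestamp")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/clickstream")
    .outputMode("append")
    .start("s3://my-bucket/lakehouse/clickstream_raw")
)
```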

Database change data capture

Replicate operational databases such as PostgreSQL, MySQL, SQL Server, and MongoDB to the data lakehouse and materialize all updates, deletes, and merges for near real-time analytics.
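
As a rough illustration of the pattern a managed CDC pipeline automates, the sketch below splits a Debezium-style changelog into upserts and deletes and applies each to a Hudi table with Spark. The changelog location, table, and field names are hypothetical.

```python
# Illustrative only: materializing a CDC changelog into a Hudi table by hand.
# A managed pipeline automates scheduling, retries, and schema changes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical Debezium-style changelog landed as JSON, with an "op" column:
# c = create, r = snapshot read, u = update, d = delete.
changes = spark.read.format("json").load("s3://my-bucket/cdc/users/")

hudi_opts = {
    "hoodie.table.name": "users",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
}

# Inserts and updates become Hudi upserts...
(changes.filter("op IN ('c', 'r', 'u')")
 .write.format("hudi").options(**hudi_opts)
 .option("hoodie.datasource.write.operation", "upsert")
 .mode("append").save("s3://my-bucket/lakehouse/users"))

# ...and deletes are materialized as Hudi deletes.
(changes.filter("op = 'd'")
 .write.format("hudi").options(**hudi_opts)
 .option("hoodie.datasource.write.operation", "delete")
 .mode("append").save("s3://my-bucket/lakehouse/users"))
```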

Files on cloud storage

Monitor your cloud storage buckets for any file changes, and rapidly ingest all of your Avro, CSV, JSON, Parquet, Proto, XML, and more into optimized lakehouse tables.
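
The underlying pattern resembles Spark's file-stream source, which discovers new files as they land under a bucket prefix. A sketch follows, with a hypothetical schema and path.

```python
# Sketch of bucket monitoring with Spark's file-stream source: new JSON files
# under the prefix are discovered and processed as they arrive.
# Schema and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

new_files = (
    spark.readStream.format("json")
    .schema(schema)
    .load("s3://my-bucket/landing/orders/")
)

# Downstream, the stream would be written to an optimized lakehouse table,
# as in the Kafka sketch above.
```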

Deliver Powerful Performance at a Fraction of the Cost

Fully-managed ingestion pipelines

Fast ingestion to keep up with source systems in near real-time.

Serverless autoscaling

Automatically scale compute to handle bursts in data ingestion without having to provision and monitor clusters.

Cost-optimized infrastructure

Run on low-cost compute optimized for ETL/ELT with simple usage-based pricing.

Achieve Quality that Counts

Data quality validation

Set and enforce expectations of data quality. Quarantine bad records in-flight.
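
Conceptually, in-flight quarantining looks like the sketch below: rows that fail an expectation are diverted to a quarantine location instead of the main table. The rule, table, and paths are hypothetical.

```python
# Minimal sketch of in-flight quality gating. Rows failing the expectation
# are quarantined; only valid rows land in the lakehouse table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

batch = spark.read.parquet("s3://my-bucket/staging/orders/")  # hypothetical input

expectation = F.col("amount").isNotNull() & (F.col("amount") >= 0)  # hypothetical rule

(batch.filter(expectation)
 .write.format("hudi")
 .option("hoodie.table.name", "orders")
 .option("hoodie.datasource.write.recordkey.field", "order_id")
 .option("hoodie.datasource.write.precombine.field", "updated_at")
 .mode("append").save("s3://my-bucket/lakehouse/orders"))

(batch.filter(~expectation)
 .write.parquet("s3://my-bucket/quarantine/orders", mode="append"))
```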

Auto data discovery

Never miss a byte. Onehouse continuously monitors your data sources for new data and seamlessly handles schema changes.

Automated schema evolution

Adapt to schema changes as data is ingested, so upstream sources don’t disrupt the delivery of high-quality data.

Support All Major Data Lakehouse Formats

Eliminate data lakehouse format friction. Work seamlessly across Apache Hudi, Apache Iceberg, and Delta Lake.

Table format interoperability

Leverage interoperability between Apache Hudi, Apache Iceberg, and Delta Lake.

Multi-catalog sync

Avoid lock-in with the most robust set of data lakehouse format and data catalog integrations.

Single copy of data

Keep a single copy of your data, with abstractions and tools that translate table format metadata across lakehouse formats.

Incrementally Transform Your Data

Transform data with industry-leading incremental processing technology for ETL/ELT to deliver data in near real-time.

Incremental processing

Efficiently process only data that has changed to reduce compute costs and processing time.
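
The core idea can be seen in Hudi's incremental query: read only the records committed after a checkpoint instant, rather than rescanning the full table. A sketch, with a hypothetical path and instant:

```python
# Sketch of incremental processing with a Hudi incremental query: only
# records committed after the given instant are read. Path and instant
# are hypothetical; in practice the checkpoint is tracked automatically.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

changed = (
    spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20240101000000")
    .load("s3://my-bucket/lakehouse/users")
)

# Downstream transformations now touch only the changed rows.
changed.groupBy("country").count().show()  # hypothetical column
```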

Low code pipelines

Build data pipelines and transformations in a no-code or low-code UI, complete with schema previews.

Bring your own transformations

Leverage pre-built transformations or bring your own to securely transform data in your own VPC.

Hassle-Free Table Optimizations and Management

No maintenance required. Onehouse manages everything your pipelines need, from core infrastructure to automatic optimizations.

Automatic table optimizations

Automate services such as compaction and clustering to optimize query performance, write efficiency, and storage usage.
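
For a feel of what is being automated, here are the kinds of knobs involved when configuring Hudi's inline compaction and clustering by hand. The table name, fields, thresholds, and paths are illustrative.

```python
# Sketch of hand-configured Hudi table services that a managed platform
# tunes and runs automatically. All names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://my-bucket/staging/users/")  # hypothetical input

opts = {
    "hoodie.table.name": "users",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.compact.inline": "true",                 # merge log files into base files
    "hoodie.compact.inline.max.delta.commits": "5",  # ...after every 5 delta commits
    "hoodie.clustering.inline": "true",              # co-locate data for faster scans
    "hoodie.clustering.inline.max.commits": "4",
}

(df.write.format("hudi").options(**opts)
 .mode("append").save("s3://my-bucket/lakehouse/users"))
```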

Time travel history

Create savepoints for point-in-time snapshots of your tables, and look back with time-travel queries.
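
With Hudi, for example, a point-in-time read is a one-option query. The path and timestamp below are hypothetical.

```python
# Sketch of a time-travel query against a Hudi table: "as.of.instant" pins
# the read to a historical snapshot. Path and timestamp are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

snapshot = (
    spark.read.format("hudi")
    .option("as.of.instant", "2024-01-01 00:00:00")
    .load("s3://my-bucket/lakehouse/users")
)
snapshot.show()
```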

Pipeline observability

Monitor pipeline stats and data quality with prebuilt dashboards.

Query With Any Engine

Query anywhere. With your data stored in open table formats in your cloud account, Onehouse frees you to use any catalog and query engine of your choice.

Use all data lakehouse formats

Store data in your choice of Apache Hudi, Apache Iceberg, and Delta Lake, or use all three simultaneously.

Integrate across catalogs

Automatically sync your tables to multiple catalogs, such as AWS Glue, DataHub, Hive Metastore, Databricks Unity Catalog, Snowflake Iceberg tables, and more.

Query anywhere

Write once and query anywhere: Amazon Athena, Amazon EMR, Amazon Redshift, Databricks, Dremio, Google BigQuery, Snowflake, Starburst, StarRocks, and many others.

How Customers Use Onehouse Today

Onehouse works with a variety of customers, from large enterprises to startups just beginning their data journey, across verticals including Technology, Finance, Healthcare, Retail, and beyond. See what customers are doing with Onehouse today:

Full Change Data Capture

A Onehouse customer with large deployments of MySQL has many transactional datasets. With Onehouse, they extract changelogs and run low-latency CDC pipelines that deliver analytics-ready Hudi tables on S3.

Real-time machine learning pipelines

An insurance company uses Onehouse to help generate real-time quotes for customers on its website. Onehouse helped the company tap previously unused datasets and reduced the time to generate an insurance quote from days or weeks to under an hour.

Replace long batch processing time

A large tech SaaS company used Onehouse's technology to reduce its batch processing times from 3+ hours to under 15 minutes, all while saving ~40% on infrastructure costs. By replacing DIY Spark jobs with a managed service, the company can now operate its platform with a single engineer.

Ingest Clickstream Data

A talent marketplace company uses Onehouse to ingest all clickstream events from their mobile apps. They run multi-stage incremental transformation pipelines through Onehouse and query the resulting Hudi tables with BigQuery, Presto, and other analytics tools.

Do you have a similar story?

Meet With Us

How Does Onehouse Fit In?

You have questions, we have answers

What is a data lakehouse?

A data lakehouse is an architectural pattern that combines the best capabilities of a data lake and a data warehouse. Data lakes built on cloud storage such as Amazon S3 are the cheapest and most flexible way to store and process your data, but they are challenging to build and operate. Data warehouses are turn-key solutions, offering capabilities traditionally not possible on a lake, such as transaction support, schema enforcement, and advanced performance optimizations around clustering, indexing, and more.

Now with the emergence of data lakehouse technologies, you can unlock the power of a warehouse directly on the lake for orders of magnitude cost savings.

Is Onehouse an enterprise Hudi company?

While born from the roots of Apache Hudi and founded by its original creator, Onehouse is not an enterprise fork of Hudi. The Onehouse product builds upon OSS Hudi to offer a data lake platform similar to what companies such as Uber have built, while adding interoperability with Apache Iceberg and Delta Lake environments, along with many other advanced features, to deliver the leading universal data lakehouse.

Does Onehouse aim to replace other tools in my stack such as Databricks or Snowflake?

Onehouse offers services that are complementary to Databricks, Snowflake, and many other popular data warehouse and data lake query engines. Our mission is to accelerate your adoption of a lakehouse architecture while reducing your total infrastructure cost and enabling you to work with near real-time data. We focus on the foundational data infrastructure that is typically left as a DIY struggle in today's data lake ecosystem. If you plan to use Databricks, Snowflake, Amazon EMR, Google BigQuery, Amazon Athena, Starburst, or many other popular data infrastructure tools, we can help accelerate and simplify your adoption of these services. Onehouse interoperates with Apache Iceberg and Delta Lake to better support Databricks and Snowflake queries, respectively.

Where does Onehouse store my data and is it secure?

Onehouse delivers its management services on a data plane inside your cloud account. This ensures no data ever leaves the trust boundary of your private networks, and sensitive production databases are never externally exposed. You maintain ownership of all your data in your own Amazon S3, Google Cloud Storage, or other cloud storage buckets. Our commitment to openness ensures your data is future-proof. As of this writing, Onehouse is SOC 2 Type I and Type II compliant. We are also available on multiple clouds.

When would I consider using Onehouse?

If you have data in relational databases, event streams, cloud storage, or even data lost inside data swamps, Onehouse can help you ingest, transform, manage, and make all of your data available in a fully managed lakehouse. We don’t have our own query engine, so we don’t play favorites. We focus simply on making your underlying data performant and interoperable for any and all query engines.

If you are considering a data lakehouse architecture, whether to offload costs from a cloud data warehouse or to unlock data science and machine learning, Onehouse can standardize how you build your data ingestion pipelines, leveraging battle-tested, industry-leading technologies to achieve your goals while dramatically reducing cost and effort.

What is Onehouse pricing?

Onehouse measures the compute-hours used to deliver its services and charges an hourly compute cost based on usage. Connect with our account team to dive deeper into your use case, and we can provide a total cost of ownership estimate for you. We have a proven record of cutting the cost of alternative approaches by 50% or more.

How can Onehouse help existing Hudi users?

If you have large Apache Hudi environments and want help operating them better, Onehouse can offer a limited one-time technical advisory service. Onehouse engineers and developer advocates are always active in the Hudi community on Slack and GitHub to answer questions on a best-effort basis.

Sign up for a free trial to receive $1,000 in free credit for 30 days

Try It Free
We are hiring diverse, world-class talent — join us in building the future.