Reverse ETL – A Definitive Guide
Reverse ETL has been creating a buzz in the modern data stack industry. Is the flutter of activity around this technology a paradigm shift or a passing trend? In this article we explore what is meant by Reverse ETL and how it is different from standard ETL or ELT. Most importantly how organizations can unlock value by activating the data they accumulate using Reverse ETL.
What is Reverse ETL?
“Reverse ETL is the process of taking cleaned and processed data from a source of truth like a data warehouse. Ingesting this data into a system of action like a CRM, advertising platform, or other SaaS app to operationalize or activate this data and make it deliver insights of actionable value to the business users.”
The rise of the modern data stack and pervasive availability of cloud infrastructure has brought data warehousing into the mainstream and made it affordable for companies of all sizes. This coupled with the steep rise in adoption of SaaS tools leading to an explosion in the data being generated meant that more and more companies are now reliant on some underlying system to integrate this cross platform data. Use of ELT/ETL tools like DataChannel is now a norm to bring data into a cloud data warehouse like Google Bigquery , Redshift or Snowflake and create a single source of truth for their organizations. The typical utilization of this data was to power analytical dashboards built using BI tools like Tableau, Power BI or Looker or to do more predictive analytics to establish forecasts and trends.
But there remains a gap between this data and its real users, the business teams, the frontline army of executives who rely on a different set of tools that they use for their everyday work. As per a report by IDC & Seagate, “Only 32% of data available to enterprises is put to work. The remaining 68% goes unleveraged.” For example, for a member of the sales team, the customer data enriched with product usage metrics, lead scores and forecasts of predictions around the likelihood of deal closure is futile sitting in a data warehouse or on a dashboard. Unless it is made available in the CRM they log into every day to drive their business decisions it may as well not exist. This “last-mile” gap in analytics is what Reverse ETL intends to plug in the modern data stack. In essence, Reverse ETL means getting relevant data from the data warehouse into business tools and applications your teams use for their daily work.
How is Reverse ETL different from ELT/ETL?
This is the first question which comes to our mind when we think of reverse ETL. After all ETL tools have been around for ages and although since the advent of modern cloud data warehouses, ELT has taken over (Read the difference between ETL & ELT here), they all involve moving data from one source to another. So what is special about reverse ETL? Other than the obvious difference that the data warehouse is the source and not the destination in this case, there are other differences which necessitated the separate categorization of Reverse ETL. These are as under:-
- Reverse ETL process needs to constantly monitor the data in the warehouse for any changes. Once it detects a change, it needs to be able to diff the data and sync the changes to the destination. This needs to be done with speed and without putting unnecessary repetitive query load on your data warehouse.
- While building conventional ELT, you typically do not need in-depth business understanding. Whatever data the source makes available you take and dump in the warehouse for processing. However, in reverse ETL you need to understand not just the destination API, but also the business use case to identify the field mapping and rules around it so that the correct data points are sent to the downstream tools.
- Data in the warehouse is typically in a format suitable for analytics, simply querying it and sending it to downstream tools is not enough. It would almost always require some last mile transformation to be suitable form useful in the downstream tool.
Why Do I Need Reverse ETL?
“Moving beyond just dashboards and reports and making actionable data easily accessible after it has been processed in the data warehouse, covers the last mile gap in analytics. This can truly democratize actionable insights”
Organizations are spending a lot of time, resources and money in setting up their data infrastructure. The ultimate aim and purpose of this effort is to get actionable insights from this data. BI tools like Tableau, Power BI & Looker etc with their dashboards and reports are the most common way of consuming this data. But are they enough? While these dashboards are great for high level decision making, monitoring, gut checking plans and forecasts they may not be enough for day to day decisions and effective functioning of the business teams.
Consider a SaaS company collating the product usage data for its customers in a data warehouse, creating customer cohorts and identifying churn risk customers or actively engaged customer segments. Having these numbers on a Looker dashboard is great for the top management, but for a sales representative speaking to the customer, it is much more important that this data is made available in her CRM software which she uses every day.
Reverse ETL can also eliminate the need to share ad-hoc dumps of data in the form of CSV files and excel sheets which business teams continuously keep requesting from the data engineers. These are not only cumbersome and time consuming, but also a security nightmare as keeping a track of these is nearly impossible.
Customer Data Platform (CDP) vs Reverse ETL
“I already have a Customer Data Platform or CDP, so I do not need reverse ETL as part of my data stack”. This is one of the most common mis-conceptions prevalent. While CDPs offer some great advantages with built-in data ingestion, identity resolution, audience segmentation and management, they also have some severe disadvantages.
- First one being that it just adds another data silo instead of building up a single source of truth as your data is not in your data warehouse, but inside the CDP platform. You just have no control over it. Also, at some point you will need to use an ETL pipeline to bring this data into your warehouse for analysis and for joining with other data sets anyways.
- Most CDPs will force you to adopt their data model and structure, whereas you would have requirements unique to your organization. Having your customer data in your data warehouse gives you complete flexibility in how you want to store and use this data.
- Most of these platforms have very limited abilities when it comes to data transformation. Having this data in your data warehouse gives you the unlimited power of using SQL to transform the data the way you want to.
- Most good CDP platforms are prohibitively expensive with long implementation times and learning curves.
Instead of getting boxed in with overpriced CDP platforms, if you were to leverage your existing investments in your data warehouse and employ that as your CDP, you can then easily use reverse ETL to activate / operationalize this data.
Some use cases for Reverse ETL
Across all business functions there can be numerous use cases for reverse ETL, some of them are listed below:-
- Sync segmented customer data to audiences in marketing tools like Google Ads, Facebook Ads, Tiktok Ads etc to run targeted ad campaigns.
- Sync audience segments to email marketing and marketing automation tools to run hyper personalized campaigns.
- Enrich customer data in the CRMs with usage statistics, lead scores etc.
- Get user behavioral data like engagement, tickets created etc from the data warehouse into the CRM tools.
- Analyze usage data in the data warehouse, create customer segments on the basis of user activity and sync these segments to the production databases to create personalized in-product experiences.
- Analyze feature adoption data and create user segments based on that, sync these segments to email automation tools to run campaigns to increase adoption.
- Create notifications on top of the data warehouse to get notified on important triggers like drop in usage or activation of new signups etc
Customer Success & Support Teams
- Use enriched customer data to prioritize support tickets
- Identify upsell opportunities and churn risk customers using user behavior data
- By enabling data to be readily available to the business users, data teams can be freed from this responsibility and can concentrate on higher value tasks while retaining complete control over the entire data pipeline.
- By enabling notifications to be sent on various triggers from the data warehouse a lot of monitoring efforts can be reduced.
How can I get started with Reverse ETL?
While the answer to this question will depend on the maturity of the organization in terms of their data stack, the good news is that with affordable and reliable tools like DataChannel, teams of all sizes can get started fairly easily with Reverse ETL and begin their modern data stack journey.
The steps involved in this would be as under.
- Set up a data warehouse. (Learn how DataChannel’s managed data warehouse offering can get you started in minutes)
- Use an ETL tool to set up and schedule data ingestion into your data warehouse.
- Set up and schedule transformations on your data to make it ready for downstream analysis.
- Use a reverse ETL tool or write custom code to send data from the data warehouse to your business applications and schedule this.
You can easily see that these steps are sequential building up on each other and would not be possible or useful in isolation.
Reverse ETL – Build vs Buy?
By now hopefully you are sold on the idea that you definitely need Reverse ETL in your data ecosystem. Next logical question is why not build one in-house. After all, you do have software engineers and data engineers within your team. If you have ever considered or bought enterprise software, you have wrestled with this question before. (We do have a nice writeup about the “Build vs Buy” decision for ETL tools here). While the technical skills needed to connect to one API end-point to send data may not be that high, maintaining a set of pipelines sending incremental data updates reliably to a number of data destinations starts to become very complex very fast. It is not only insanely resource intensive (imagine continuously watching your data warehouses for change, batching those changes, sending them to downstream APIs each with their own authentication requirements, rate limits, data formats, request formats and rules, on a schedule without missing a row or a field), but also requires constant engineering bandwidth. The destination APIs are always evolving and business teams will always keep changing where and what data they need. Even with all the best intentions, you will still not be able to implement all the ETL best practices.
Examples of an end to end pipeline using Reverse ETL
As we discussed above, there are a vast range of use cases for reverse ETL, however, we are showcasing a few examples where we have seen our customers utilize it for instant and exponential gains in productivity. Operational analytics can truly revolutionize the way a business consumes data.You can read more such case studies here.
- A beauty & skin care brand selling on their D2C store on shopify was using DataChannel to get their customer and order data into their data warehouse. Since this was already set up employing ETL best practices, their data teams then created customer cohorts using Machine Learning algorithms to identify customers with higher propensity to buy or customers at risk of churning. The marketing team used Reverse ETL to run personalized campaigns by syncing this data with audiences on their marketing automation tools while the product team synced these scores back to their production databases to run tailored in-app campaigns and promotions.
- An IoT product brand again selling D2C was acquiring customers through paid campaigns and was running more than a 1000 campaigns. Monitoring all of them for performance was being done through dashboards built on top of data in their data warehouse. But, they wanted greater control in the hands of people actually running the campaigns. They set up alerts directly on top of their data warehouse to be notified when their campaigns were performing sub-optimally. Like when the CPC went beyond a certain value or when the CTR dropped below a threshold. They got these alerts directly in their slack and email inboxes using Reverse ETL allowing them to take immediate action.
B2B SaaS Product
- A B2B Saas product company wanted to ensure their product usage data was made available to their sales teams along with data from their support desk systems. This data was available in the data warehouse, but was not accessible to the sales reps. Using reverse ETL this was synced into their CRM to be used by sales reps for having a more informed conversation with their customers.
- A large marketing agency managing the performance marketing budgets for more than 100 customers was struggling with uploading offline conversions to Facebook and Google for all their customers. Even though this data was available to them in Google Sheets. They used DataChannel to first bring this data into their data warehouse automatically and then used Reverse ETL to sync it to the respective ad platforms. This not only saved them hundreds of man hours of dreary error prone manual labor, but brought their client reporting and satisfaction to a whole new level.
Conclusion – Why investing in Reverse ETL now is the right decision
If it is data driven efficiency in functioning that you desire and you want to maximize the return on investment in your data teams, then there is no escaping the fact that you will need to bring in the missing piece of Reverse ETL in your data stack. Your competitors are most probably already embarking on this path and unless you do so too, you will miss out on the exponential growth it can bring. If you are scared of and are holding back because of the insane costs some of the platforms in this space are charging, give DataChannel a look.
DataChannel – An integrated ETL & Reverse ETL solution
- 100+ Data Sources. DataChannel’s ever-expanding list of supported data sources includes all popular advertising, marketing, CRM, financial, and eCommerce platforms and apps along with support for ad-hoc files, google sheets, cloud storages, relational databases, and ingestion of real-time data using webhooks. If we do not have the integration you need, reach out to our team and we will build it for you for free.
- Powerful scheduling and orchestration features with granular control over scheduling down to the exact minute.
- Granular control over what data to move. Unlike most tools which are highly opinionated and dictate what data they would move, we allow you the ability to choose down to field level what data you need. If you need to add another dimension or metric down the line, our easy to use UI lets you do that in a single click without any breaking changes to your downstream process.
- Extensive Logging, fault tolerance and automated recovery allows for dependable and reliable pipelines. If we are unable to recover, the extensive notifications will alert you via slack, app and email for taking appropriate action.
- Built to scale at an affordable cost. Our best in class platform is built with all ETL best practices built to handle billions of rows of data and will scale with your business when you need them to, while allowing you to only pay for what you use today.
- Get started in minutes. Get started in minutes with our self-serve UI or with the help of our on-boarding experts who can guide you through the process. We provide extensive documentation support and content to guide you all the way.
- Managed Data Warehouse. While cloud data warehouses offer immense flexibility and opportunity, managing them can be a hassle without the right team and resources. If you do not want the trouble of managing them in-house, use our managed warehouse offering and get started today. Whenever you feel you are ready to do it in-house, simply configure your own warehouse and direct pipelines to it.
- Activate your data with Reverse ETL. Be future-ready and don’t let your data sit idle in the data warehouse or stay limited to your BI dashboards. The unidimensional approach toward data management is now undergoing a paradigm change. Instead, use DataChannel’s reverse ETL offering to send data to the tools your business teams use every day. Set up alerts & notifications on top of your data warehouse and sync customer data across all platforms converting your data warehouse into a powerful CDP (Customer Data Platform). You can even preview the data without ever leaving the platform.