What makes for good data analytics? A wealth of data sources, timely and reliable replication to a data warehouse, and, most of all, data organized into a schema that is easy for analysts to access and use. Fivetran focuses on the data pipeline, but we’ve worked with a multitude of customers with different use cases, and we’ve learned some lessons about database schema design best practices.

What Is a Database Schema?

Every database can be described by a data model: a picture of all the elements in all the entities represented in the databases. A schema is a document that lays out the logical structure of a database and translates the data model into specific tables, columns, keys and interrelations. You can think of the schema as a blueprint for the tables and relations of a data set. It defines the logical database design, and to some extent, depending on the database, the physical design too. Good database schema design is essential to making your data tractable so that you can make sense of it and build the dashboards and reports you need.

Since data warehouses are relational databases, data stored in a data warehouse is described by a schema too. When a data pipeline extracts data from SaaS or database sources and loads it into a data warehouse, it can perform preprocessing, such as cleaning and normalization, to make the data consistent and legible, and then populate the tables described by the schema at the destination.

The Importance of Database Schema Design

A schema organizes data into tables with appropriate attributes, shows the interrelationships between tables and columns, and imposes constraints such as data types. A well-designed schema in a data warehouse makes life easier for analysts by

- removing cleaning and other preprocessing from the analyst’s workflow.
- absolving analysts from having to reverse-engineer the underlying data model.
- providing analysts with a clear, easily understood starting point for analytics.

In other words, a well-designed schema clears the way to faster and easier creation of reports and dashboards.

By contrast, a flawed schema requires data analysts to do extra modeling, and forces every analytics query to take more time and system resources, increasing an organization’s costs and irritating everyone who wants their analytics right away.

In the data analytics world, both data sources and data warehouses use schemas to define data elements. But the schemas for data sources - whether they’re databases such as MySQL, PostgreSQL, or Microsoft SQL Server or SaaS applications such as Salesforce, Facebook Ads, or Zuora - aren’t designed with analytics in mind. The SaaS apps in particular may provide some general analytics functionality, but they cover only the data from that single application. And users don’t get to tweak SaaS schemas - they’re defined by each application’s developers.

Enterprise data is more valuable when it’s replicated to a data warehouse and joined with data from other applications - and organizations do get to design these data structures.

The first and most important step to leveraging data from an application is to understand the underlying data model. Every SaaS app implicitly contains a representation of the world, inasmuch as the world consists of organizations, people, transactions and other common business concepts. Understanding what data columns correspond to what real-world equivalents is essential to making sense of the data.

When you’re working with an in-house database, chances are you have developers and data engineers who can explain the model. With SaaS applications, you have to rely on the vendor’s documentation and APIs - but the quality of both may be inconsistent. You can get some help from Fivetran’s application documentation pages, many of which contain setup guides for the respective applications.

There are several ways to build an understanding of source data. During schema development at Fivetran, our developers:

- speak with regular users of the application.
- speak with the application’s developers.
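
To make the idea of a schema as a blueprint more concrete, here is a minimal sketch of a schema that defines tables, columns, keys, interrelations, and constraints such as data types. Everything in it is an assumption chosen for illustration: the `customer` and `invoice` tables, their column names, and the use of SQLite through Python's standard `sqlite3` module. It is not a schema that Fivetran ships, and a real data warehouse would use its own SQL dialect.

```python
import sqlite3

# Hypothetical two-table schema. The table and column names are
# illustrative assumptions, not taken from any real application.
SCHEMA = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount_usd  REAL NOT NULL CHECK (amount_usd >= 0),
    issued_on   TEXT NOT NULL
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database and apply the schema to it."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def total_billed_by_customer(conn: sqlite3.Connection) -> list:
    """Join the two tables through the key that the schema declares."""
    return conn.execute(
        """
        SELECT c.name, SUM(i.amount_usd)
        FROM customer AS c
        JOIN invoice AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()


def main() -> None:
    conn = build_database()
    conn.execute(
        "INSERT INTO customer (id, name, email) VALUES (?, ?, ?)",
        (1, "Acme", "billing@acme.example"),
    )
    conn.executemany(
        "INSERT INTO invoice (customer_id, amount_usd, issued_on) VALUES (?, ?, ?)",
        [(1, 120.0, "2024-01-15"), (1, 80.0, "2024-02-15")],
    )
    print(total_billed_by_customer(conn))

    # The schema rejects an invoice that points at a customer that does not exist.
    try:
        conn.execute(
            "INSERT INTO invoice (customer_id, amount_usd, issued_on) VALUES (?, ?, ?)",
            (99, 10.0, "2024-03-01"),
        )
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)


if __name__ == "__main__":
    main()
```

Because the keys and constraints live in the schema itself, an analyst querying these tables can join them directly and does not have to check for orphaned or negative invoices first, which is the kind of preprocessing a well-designed schema is meant to remove from the analyst's workflow.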