Data sampling for comfortable SSAS 2012 Tabular modelling

My initial goal to work in tabular was to have a comfortable environment to work with. To do this i wanted to work only with a sample of the warehouse data while at the same time take advantage of the tabular environment “bells and whistles”.

Working with SQL queries instead of mapping objects (tables, views) directly has the big disadvantage that you cannot use later the visual interface to pick columns and use filters with the GUI. On the other hand, using partitioning on tables, when partitioning itself is also a SQL query has the same effect. As soon as you want to add a column graphically, the SQL partitioning is reset.

So the data sampling filter must be part of the data source.

A recommendation which seems to be commonly accepted is to try to work with views instead of mapping directly tables from the SQL source. This isolates the Tabular model from the eventual data source changes and allows also to enable/disable sampling inside the view (we suppose that we have control over the data source and can add our views to it). Enabling/disabling sampling via modifying the views is easier than doing it by Filters in Tabular GUI and can be automated in a more simple and safe way.

With this in mind first i chose the sampling unit, which in my case having Clients, Contracts and Contract Facts are the Clients. The idea is to design a sampling based on Clients and that then this sampling propagates following the DBs relational logic to the rest of the DB.

To store the sampling itself i use a “helper” table in some corner of the DB where it does not bother, and there i apply some sampling criteria, in this case a random sample receipt.

CREATE TABLE TEMP_DimClientSample (
   ClientId nvarchar(100)
)

insert into TEMP_DimClientSample
SELECT ClientId FROM DimClient
WHERE (ABS(CAST(
(BINARY_CHECKSUM
(ClientId, NEWID())) as int))
% 100) < 2 -- put here the sample percentage

And then create the views to consume this with selects like:

SELECT 
    DimClient.*
from 
    DimClient
    -- remove next 2 lines when going into production
    join TEMP_DimClientSample
        on DimClient.ClientId = TEMP_DimClientSample.ClientId

or

SELECT 
    DimContract.*
from 
    DimContract
    join DimClient on DimContract.ClientId = DimClient.ClientId
    -- remove next 2 lines when going into production
    join TEMP_DimClientSample
        on DimClient.ClientId = TEMP_DimClientSample.ClientId

Related posts: