
I defined what SCD means in a Enterprise Intelligence answer in my earlier submit. We additionally mentioned that whereas we don’t anticipate to deal with SCD2 in a Energy BI implementation, we are able to deal with situations much like SCD1. On this submit, I clarify how to take action.
Situation
We now have a retail firm promoting merchandise. The corporate releases the listing of merchandise in Excel format, together with listing worth and supplier worth, yearly. The product listing is launched on the primary day of July when the monetary yr begins. We now have to implement a Energy BI answer that retains the most recent product knowledge to analyse the gross sales transactions. The next picture reveals the Product listing for 2013:

So every year, we obtain the same Excel file to the above picture. The information are saved on a SharePoint On-line web site.
Situation Defined
Because the earlier submit explains, an SCD1 at all times retains the present knowledge by updating the previous knowledge with the brand new knowledge. So an ETL course of reads the info from the supply, identifies the present knowledge within the vacation spot desk, inserts the brand new rows to the vacation spot, updates the present rows, and deletes the eliminated rows.
Right here is why our situation is much like SCD1, with one exception:
- We don’t truly replace the info within the Excel information and don’t create an ETL course of to learn the info from the Excel information, determine the adjustments and apply the adjustments to an middleman Excel file
- We should learn the info from the supply Excel information, maintain the most recent knowledge whereas filtering out the previous ones and cargo the info into the info mannequin.
As you see, whereas we’re taking a really totally different implementation strategy, the outcomes are very related with an exception: we don’t delete any rows.
Implementation
Here’s what we should always do to attain the purpose:
- We get the info in Energy Question Editor utilizing the SharePoint Folder connector
- We combite the information
- We use the ProductNumber column to determine the duplicated merchandise
- We use the Reporting Date column to determine the most recent dates
- We solely maintain the most recent rows
Getting Knowledge from SharePoint On-line Folder
As we get the info from a number of information saved on SharePoint On-line, we now have to make use of the SharePoint Folder connector. Comply with these steps:
- Login to SharePoint On-line and navigate to the location holding the Product listing Excel information and replica the location URL from the browser
- From the Get Knowledge within the Energy BI Desktop, choose the SharePoint Folder connector
- Click on Join
- Paste the Website URL copied on step 1
- Click on OK
- Click on Remodel Knowledge
- Click on the filter dropdown on the Folder Path column
- Discover the Merchandise Checklist folder internet hosting the Excel information and choose it
- Click on OK
- Rename the question to Product
To this point, we’re related to the SharePoint On-line Folder in Energy Question Editor. The subsequent step is to mix the Excel information.
Combining Recordsdata
We now have a number of choices to mix binary information in a desk from the Energy Question Editor. On this submit, we use probably the most simple methodology:
- Click on the Mix Recordsdata button from the Content material column
- Choose the ProductList desk
- Click on OK

The above course of creates a few queries grouped in separate folders, as proven within the following picture:
To this point, we now have efficiently mixed the Excel information. The subsequent step is to maintain the most recent knowledge solely.
Protecting the Newest Knowledge
Within the subsequent few steps, we glance nearer on the knowledge, and we implement a mechanism to determine the most recent knowledge, maintain them and cargo them into the info mannequin.
- Wanting on the outcomes of the mixed knowledge reveals a Supply.Identify column that we do now require to maintain, so we take away it by deciding on it and clicking the Take away Columns button from the Residence tab
To this point, we now have related to the SharePoint On-line Folder and mixed the contained Excel information. Let’s have a look at the info and see what we’ve bought. I sorted the info by ProductNumber to higher perceive the info adjustments. The next picture reveals the outcomes:
Because the above picture reveals, there are a number of merchandise showing in a number of lists. That’s precisely what we anticipated to see. The purpose is to maintain the most recent product knowledge solely based mostly on the Reporting Date. So we should always get the ProductNumber and the utmost of the Reporting Date. To realize this, we use the Group By performance in Energy Question Editor. Utilizing the Group By from the UI within the Energy Question Editor makes use of the Desk.Group()
operate in Energy Question. Because the Group By course of doesn’t want the info to be sorted we take away the Sorted Rows step. With that, let’s get the job carried out.
- Choose the ProductNumber column
- Click on the Group By column from the Remodel tab
- Enter Reporting Date for the New column title
- Choose Max from the Operation dropdown
- Choose the Reporting Date from the Column dropdown
- Click on OK
The next picture reveals the outcomes:
We now have all product numbers with their newest reporting dates. The one remaining piece of the puzzle is to hitch the outcomes of the Grouped Rows step with the info of its earlier step. For that we use the Merge Queries performance which runs the Desk.NestedJoin()
operate in Energy Question.
- Choose the Grouped Rows step from the Utilized Steps listing from the Question Settings pane
- Click on the Merge Queries button from the Residence tab
- Choose the Product (Present) desk from the dropdown. Word that we’re deciding on the present question (Product)
- On the highest desk, press the Ctrl button in your keyboard and choose the ProductNumber and the Reporting Date columns sequentially
- Do the identical for the underside desk. Word that the sequence of choosing the columns is vital
- Be certain that the Be a part of Form is Left Outer (all from first, matching from second)
- Click on OK
As talked about earlier, the merge operation makes use of the Desk.NestedJoin()
operate, which accepts two tables (highlighted in yellow within the expression under), an inventory of their key columns to make use of on the be a part of (highlighted in crimson within the expression under), a reputation for the brand new column of kind desk and the be a part of type. Within the above operation, because the Grouped Rows is the final transformation step, we joined the outcomes of the Grouped Rows transformation step by itself. Right here is the code generated by Energy Question Editor after going via the step 21 to 27:
Desk.NestedJoin(#"Grouped Rows", {"ProductNumber", "Reporting Date"}, #"Grouped Rows", {"ProductNumber", "Reporting Date"}, "Grouped Rows", JoinKind.LeftOuter)
However that’s not what we would like, we don’t want to hitch the outcomes of the Grouped Rows transformation step by itself. We have to be a part of the outcomes of the Grouped Rows transformation step by the outcomes of the Eliminated Columns step. Subsequently, we now have to change the above expression as follows:
Desk.NestedJoin(#"Grouped Rows", {"ProductNumber", "Reporting Date"}, #"Eliminated Columns", {"ProductNumber", "Reporting Date"}, "Grouped Rows", JoinKind.LeftOuter)
The next picture reveals the modification made within the expression and the outcomes:

- Click on the Broaden button on the Grouped Rows column
- Deselect the ProductNumber and Reporting Date columns to maintain the opposite columns chosen
- Untick the Use authentic column title as prefix possibility
- Click on OK
All carried out! The next picture reveals the ultimate outcomes:
We will now apply the adjustments to load the info into the info mannequin. With this system, when a brand new Excel file (a brand new Product Checklist) lands in SharePoint On-line, Energy BI goes via the above transformation steps to make sure we at all times have the most recent Product knowledge loaded into the info mannequin, which is similar to the behaviour of an SCD1.
Have you ever used this methodology earlier than? Do you could have a greater method to deal with the same situation? You may share your ideas within the feedback part under.