Monday, October 12, 2009

Empowering the Data Warehouse with External data

By Rick Durham

To build a powerful data warehouse you must include as much relevant data from internal and external sources as possible to optimize the decision processes that managers and “C level” executives are called to make each day.

As an example, retailers have current/historic sales data along with pricing information, but this will only provide partial insight into the determinants that are driving sales. Information such as weather, income tax distribution periods, regional or local population growth, household demography, may also play a key factor in driving sales and must be taken into consideration.

Where do you go to get data that will complement your internal data sources to provide a much richer data warehouse and BI experience?

The government and many other organizations capture and deliver this data and distribute it free or for a nominal fee. Here are some examples:

Weather: Yahoo offers an RSS fed that can be called using an http request as follows:
http://weather.yahooapis.com/forecastrss?p=48161

The parameters to the request are the following:
Parameter, Description, Examples

p, US zip code or Location ID, p=95089 or p=USCA1116
u, Units for temperature (case sensitive), f: Fahrenheit or c: Celsius


The RSS response from this request includes the following information:
· Geographic latitude/longitude
· Weather Conditions (48 distinct codes)
· Temperature (F,C)
· Forecast (Condition, High Temperature, Low Temperature)


Importing this data daily or even hourly using SSIS packages along with Sales data goes a long way in understanding if weather is a factor in why certain items were purchased at a particular time/date and determining longer term sales trends.

Using Demographic Data along with internally generated data can go a long way to enhance the data warehouse. The following are examples of where this data can be obtained:
http://www.geolytics.com/?gclid=CMeliJqrqJ0CFU1M5QodekqHkA

With limited data (address or lat/long information) you can get 60 demographic attributes for that address that include factors such as income, average number of people per home, average age, education…

Likewise another good site for demographic data and data validation is:
http://www.melissadata.com/dqt/index.htm

This site offers validation against address, phone numbers, email and perform name parsing via Web Services calls which can help accelerate the ETL development process, provided you do not have to develop the code and maintain large demographic databases onsite. Additionally, this site offers demographic data on income, media locations, reverse phone and mailing lists.

Finally, the Federal government maintains thousands of databases with data gathered from various agencies that contain information that can be coupled with internal data to make your data warehouse far more powerful. For example:
http://research.stlouisfed.org/fred2/
http://www.data.gov/catalog
http://www.census.gov/

These sites contain historic economic and demographic data the government has collected regarding income, population, interest rates, commodity prices, housing sales and the downloads are free.

The goal of data warehouse development should be to provide the tools and data for optimal decision making. To assure this goal is achieved, make sure external source are also included in the initial and ongoing data warehouse implementation.


Monday, October 5, 2009

Formatting Linked Server MDX Query Results Using CTEs

By Dan Meyers

In my previous post I described how to export MDX query results in SSIS without having to use a linked server. Even though a linked server is not required and probably not advised the majority of the time, there might be times when it is actually the best solution to your problem.

One of the biggest headaches with this approach it that the data types of all of the fields returned are either nText or nVarchar. This means that you have to deal with converting and formatting the results before you can use them in a report. There are a couple of ways to go about this.

CTEs (
Common Table Expressions) are the most elegant technique to employ here. Plus, they are a bit more readable than their SQL Server 2000 equivalent, Derived Tables. CTEs are only available in SQL Server 2005 and later. If you are using SQL Server 2000 then you have to use a derived table in place of the CTE.

Below is an example of using a CTE to rename and format the results of an MDX query executed against an Analysis Services cube via a linked server:

WITH MDXResults (FiscalYear, FiscalSemester, FiscalQuarter, SalesAmount, GrossProfit)
AS
(
SELECT *
FROM OPENQUERY(SSAS_AW2008,
'SELECT
{[Measures].[Sales Amount], [Measures].[Gross Profit]} ON COLUMNS,
{[Date].[Fiscal].[Fiscal Quarter].Members} ON ROWS
FROM
[Adventure Works]'
)
)
SELECT
CONVERT(VARCHAR(255),FiscalYear) AS FiscalYear,
CONVERT(VARCHAR(255),FiscalSemester) AS FiscalSemester,
CONVERT(VARCHAR(255),FiscalQuarter) AS FiscalQuarter,
CONVERT(FLOAT,SalesAmount) AS SalesAmount,
CONVERT(FLOAT, GrossProfit) AS GrossProfit
FROM
MDXResults




If you are using SQL Server 2000 then you have to use a derived table instead. Below is an example using the same MDX query that is used in the example above:

SELECT
CONVERT(VARCHAR(255),FiscalYear) AS FiscalYear,
CONVERT(VARCHAR(255),FiscalSemester) AS FiscalSemester,
CONVERT(VARCHAR(255),FiscalQuarter) AS FiscalQuarter,
CONVERT(FLOAT,SalesAmount) AS SalesAmount,
CONVERT(FLOAT, GrossProfit) AS GrossProfit
FROM
(
SELECT *
FROM OPENQUERY(SSAS_AW2008,
'SELECT
{[Measures].[Sales Amount], [Measures].[Gross Profit]} ON COLUMNS,
{[Date].[Fiscal].[Fiscal Quarter].Members} ON ROWS
FROM
[Adventure Works]'
)
) AS MDXResults (FiscalYear, FiscalSemester, FiscalQuarter, SalesAmount, GrossProfit)