How to create dynamic data sources in SSIS

I use a SSIS package to gather information for all my SQL Servers and load it into a single repository on my DBA management server. Information such as database info, security info, HA info, version info, etc. This fully automated repository contains a ton of valuable information that allows me to manage my environment easier and from a central location (I will blog more about this process in the future).

For now, I just want to share a technique that I use to be able to dynamically build data source connections to each of my servers. You can use this to do some really cool things if you are wanting to automate pulling the same data from multiple servers.

Lets start with the example that I want to pull all information stored in master.sys.databases from all my servers and move the data into a single location and I want to do this for 100 servers. You could certainly create 100 different data source connections and have 100 different SSIS tasks, but that is crazy, right? You could also create an individual package for each of the 100 servers, but still craziness! What I am going to show you is how you can dynamically build your dataflow, so you only need 1 task per queried item. In this example, we are pulling just database information so we would just need 1 task created in the Control Flow.

Step 1: You need to have an account created that has the necessary access to all the SQL Instances you are going to be querying. In our example, the account will need read access to the master database or specifically just master.sys.databases. However you grant the access, you will need to make sure the permissions are the same on each server.

NOTE: If you are running this locally from your PC, the package may be running the context of your own account. If you are a DBA and have sysadmin rights on each server it will probably work without any permissions needing assigned. However, when you deploy this to run from a server, it will run under the context of the service account on said server or under a specified proxy account. I recommend creating a proxy account and using a specified account (over using the service account) to access each of your servers and only grant it the least access needed.

Step 2: Create a new blank SSIS package or use an existing one. In our case, I am going to create a new one

Step 3: Create a Connection to the server you want the data to be loaded into. In my case, I am wanting the data to be loaded into the DBServers database on one of my servers. Because I am a sysadmin I do not need to grant any additional access, but I would need to later for the proxy account. You can connect via whatever account you have setup.

In my example, the connection is going to be called DestinationServer.DBServers

Screenshot1_DynamicSSIS

Step 4: Create a new string variable called Servername. Set the Value to your destination server name you just used.

Screenshot2_DynamicSSIS

Step 5: Create a new data source that will serve as your target. The data source will be defined based on what is passed into the variable, but for the sake of creating something we have to put some information in here. For the server name, just use your destination server name and select any database. In my case, I am going to have it point to the master database since that is where the database information we are pulling from is. In my example, I am calling this connection TargetServer.master

NOTE: If you are using this technique, I recommend using 3-part names (Database.schema.object) in your code. The reason being is you may be connecting to different databases on a server to get information and you may not want to create individual data source connections for each of these.

Step 6: Once your Target server connection is created, you will need to go into the properties on it and edit the Expression to connect it with your variable. Type in @[User::Servername] in as the expression. You can hit the evaluate expression button and you should see your destination server name or whatever server you used for the target.

Screenshot3_DynamicSSIS

Step 7: Check to make sure that your Target Server connection now shows an expression next to it.

Screenshot4_DynamicSSIS

Step 8: We are now ready to start creating the data task to pull database information.  Go ahead and create a new Foreach Loop Container. We will call this “Cycle through Servers”.

Step 9: Create a Data Flow Task in your Foreach Loop container that will perform the data pull\load. We will call this “Load Database information”

Screenshot5_DynamicSSIS

Step 10: Go ahead and open your Data flow task and create your data flow. This is typical setup of pulling data from one server and loading it into another. I am just using a query to pull data from master.sys.databases and loading it into a table on my destination server. Nothing special about the data flow, just set it up like you were doing this for 1 server.

Screenshot6_DynamicSSIS

Step 11: Now is the point that you need to create the list of servers you want the Foreach Loop Container to cycle through. There are 2 options here. One allows you to create a list and store the names of your SQL Instances directly in the package, but I do not think this is a very manageable option. The other allows you to pull the Instance names from a table in a database. I am just going to show you how to build your connections based on table values since this is a lot more manageable and will allow you to manage connections without needing to edit the package each time.

Step 12: I have a table already that exists that stores all of the SQL Servers in our environment. If you do not have one, I would suggest creating one in a location that you can manage. My table is stored within my Destination Server database (DBServers), so I can query that to get my list of servers. For the sake of this example, I am going to pull all servers, but you could create different sets to filter out certain things such as prod\non-prod, certain versions of SQL, HA, etc.

Go ahead and create a new variable called AllServers and set the Data type to Object.

Screenshot7_DynamicSSIS

Step 13: On your Control Flow, create a Execute SQL Task command. We will call this “All Servers”. This task will pull the list of servers we want to connect to.

Step 14: Open the properties for the Execute SQL Task you just created and define your connection. In my case, it is the same as my destination server, but if you have it in a different location then it could be different. Fill out the rest of the information, including the query you are using to pull your instance names.

Screenshot8_DynamicSSIS

Step 15: While still in the properties, go the option in the left hand called Result Set. You need to add a line to set the Result Name to the Variable name you created. It should look like this. Result name is always 0.

Screenshot9_DynamicSSIS

Step 16: Go back to your Control Flow screen and go into the properties for you Foreach Loop container (Cycle through Servers). Go to the option in the left hand called Collection and click that to open the options there. In the Foreach Loop Editor at the top change the Enumerator to Foreach ADO Enumerator. Under the ADO object source variable change this to User:AllServers. Leave the other options as the default. It should look like this.

Screenshot10_DynamicSSIS

Step 17: Resolve any errors that the package may be giving you. Once any errors are resolved, you are ready to run the package. If able, go ahead and test it.  As you watch the “Cycle through Servers” task you should see it executing multiple times and completing. It will cycle through all the servers that you defined to the AllServers variable. The task will pull all database info from all servers defined in my table and load it into a table on my destination server.

Screenshot11_DynamicSSIS

Step 18: Finish creating anything else that you want to create and then you can deploy and schedule the task to run. Just remember that if you do you will want to make sure the account the process is running under has the appropriate permissions.

As mentioned earlier I use this functionality on a lot of data we try to compile in a single location. I also created several variables that define server lists based on flags in my database or settings and have lots of data flow tasks that use these. If you are trying to centralize a lot of your management data you can get real creative on how you pull data.

Database Design Considerations

While working for a previous development company, I was asked to provide a list of some high level best practices when it comes to database design. For many of the developers I worked with here (and many places in the past) SQL development is secondary to primary coding language such as .NET, C# or some other language. As a result of this and oftentimes being under pressure to produce results quickly many things slip through the cracks. Or not identified as a problem until its too late and inevitably gets added into the “technical debt” black hole.

These things were not meant to be the “end all be all” but a place to start and to get developers thinking early on.

  1. Plan your design
  • Understand your database’s purpose – Before you begin anything you have to understand what you are building and why you are building it
  • Requirement Gathering – What resources do you need? People, Infrastructure, Tools, etc. What size do you expect this to be? Is this heavily utilized or not? Is this business critical? Etc. While you may not be able to answer every question, you should have an idea.
  • Determine usage patterns – What type of database are you building (OLAP\OLTP, Normalized\De-normalized, etc.)? How do you expect that this database will be used?
  • Draw out your design – Draw out\Document your design somewhere. You may not know everything you need, but you can create a base to build from. Your design should be flexible so that you can easily add to it.
  • Get input from others – Some of the best design comes from working and discussing with others. Utilize the experience and knowledge of others
  • Consider all database options – Are there specific features or changes you want to make before getting started? For example, changing isolation levels, using compression, how to handle deletes, etc. These are things that are easy to implement in the beginning and much harder to do later. Think ahead about things you may need.
  1. Documentation
  • You should have adequate documentation that showcases your design and clearly communicates it to others. You should provide enough information that when you turn the database over to someone else to support it, they can figure out bugs and fix them.
  • By carefully naming your database objects, you can provide a lot of self-documentation on your database. A good naming convention will make it clear to anyone what your database is modeling.
  • Here are some reasons why documentation is important
    • Provides a common language between the business and IT
    • Makes it easier to locate areas of potential issues
    • Changes of making bad decisions is lower because risks are easier to assess
    • Makes maintenance\outages easier to manage and reduces risk
    • Acts as mediator between newcomers and existing staff
    • Reduces the likelihood of misunderstandings
  1. Create a naming standard and commit to it
  • There are several different naming conventions and many have their own style which is fine. However, the most important thing is to be consistent in how you name things and keep things readable. A good naming convention will make a database very readable for anyone who supports the system after you.
  • The following are some guidelines when creating your naming convention
    • Use unique descriptive names that make sense
    • Makes your different objects have distinguishable names. You should be able to tell the difference between a proc, a table, a view, a function, etc. based on the name alone.
    • Avoid using just “ID” as the PK of each table. This makes the database harder to read and requires aliasing when reading multiple ID columns from multiple tables
    • Beware of using SQL Server reserved words in object names (ie. Date, User, Case, etc.). This can create syntax issues if brackets are not used.
    • Avoid using hyphens, spaces, quotes, etc. in object names
  • The following is an example of naming conventions for a few objects
    • Primary Key – PK_
    • Stored Proc – sp_
    • View – vw_
    • Function – fn_
    • Index – ix_
    • Default Constraint – DF_
  1. Testing
  • Test everything and be prepared to make changes.
  • If someone disagrees with you on a design decision, be prepared to generate tests that show the results. What someone thinks or is told is not adequate means to change a design. You should be able to back up any decision you make
  1. Carefully consider what datatypes you use
  • You should always to try to use the smallest datatype you can while taking into account future growth. Choosing the correct datatype can have a large impact on storage and performance. Take time to carefully consider what you use and determine if it is the best use for the scenario.
  • Do not sell yourself too short either, you want to avoid needing to change datatypes later.
  • Understand Unicode vs Non-Unicode and when you should implement one over the other and make sure you are consistent.
  1. Understand normalization and how to implement it
  • I think the level of normalization depends on the project, but it is important to understand the benefits and disadvantages of it during your design.
  • Benefits of normalization
    • Smaller database. By eliminating duplicate data, you will be able to reduce the overall size of the database
    • Better performance – By having more narrow tables, it allows you to have less columns and fit more records per page. Fewer indexes per table means faster writes and maintenance, and you only need to join tables that you need.
  • Disadvantages of normalization
    • More tables to join
    • Tables contains codes instead of real data so you have to go back to the lookup tables for values
    • Difficult to query against. It is optimized for applications, not ad hoc querying.
  1. Use a single column as your Primary Keys
  • When you are designing a new database you should try to design things so everything connects via single integer columns and use foreign key relations from other tables. SQL Server processes number datatypes faster than character datatypes so this increases performance. They also reduce the size of data that is being stored on both the table itself and on any additional indexes. The PK columns are hidden columns in all indexes created.
  • Your primary key will be referenced in other tables so you want it as small and as fast as possible. Composite business keys force you to duplicate data in foreign key relationships and causes more data to be read on joins.
  • Avoid using composite business keys as your primary key. If you need uniqueness, then apply a non-clustered unique index
  • If you do not have a natural primary key to use, then create an identity and let SQL Server assign the number
  • For existing development, then it may not be possible or make sense to do this so instead you may need to work within the confines of the existing database structure
  1. Most everything should have a PK/Clustered Index defined
  • You should have some way to uniquely identify a row in each table
  • The only exceptions to this could be staging tables or ETL tables
  • A primary key is important for SQL server, not only to allow for processes such as replication, Change Data Capture, filetables, audit trail, and data synchronization, but also to simplify the creation of foreign key constraints.
  • Fragmentation that occurs from updates, deletes, inserts
  • Elimination of uniquifiers when performing a table lookup
  1. Use SQL Server built-in functionality to protect data integrity
    1. Nullability – NULL is used explicitly to indicate when a value is unknown, undefined, does not, or cannot exist. If you use NULL values make sure that your code and design can easily support this.
    2. Constraints – (Including NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT, INDEX). These are used to specify rules for the data in a table by limiting the type of data that can go into a table. This ensures the accuracy and reliability of the data stored.
    3. String length – Think about your string length before assigning a value. Plan each column carefully and define size accordingly. Note that anything over 4000 characters will be store as Lob_data and can only contain full-text indexes.
  2. Used stored procedures for SQL code
  • All SQL code should be stored within stored procedures
    • Maintainability – Stored procedures provide a known interface to the data. When code that accesses the database is compiled in a different layer, tweaks cannot be made without a programmer’s involvement. This makes things easier to troubleshoot.
    • Encapsulation – Logic can be stored within a single place and changes without impact to the application
    • Security – More tightly security can be implemented. Specific and granular access can be granted
    • Performance – Plans get re-used more often with stored procs. Additionally, code stored here is much easier to performance tune.
  1. Proper Indexing
  • If you are reading from a table, it needs to have a clustered index
  • Create non-clustered indexes on the columns that are frequently used in predicates and join conditions. These are typically your foreign keys.
  • Understand the characteristics of the database and its most frequently used queries so you can index appropriately so you can create covering indexes for your more critical queries.
  • Determine the optimal storage location for an index. For example, it may benefit from being on a separate filegroup
  • Avoid over-indexing heavily updated tables and keep indexes narrow (as few columns as possible).
  • Consider the order of your columns in your index
  1. Keep it simple
  • While there are certainly times that complicated coding is required, try to keep things as simple as you can. Simple designs are much easier to manage
  1. Consider using flags instead of DELETE
  • Strongly consider whether you should be deleting data from this database. You can implement Delete flags to indicate when a row is no longer active. If it is needed it can always be changed back. It is better to do this from the beginning so that the flag is written into the code to check.
  • The Delete flag should be stored in parent tables