Saturday 2 June 2012

Bulk Insert into SQL Server using SqlBulkCopy


Bulk Insert into SQL Server using SqlBulkCopy

I was recently tasked with a project at a company to update an SQL Server 2008 database with large amounts of data each day. The task at first seemed daunting due to the files exceeding well over 400,000 records and there were several that needed processing daily. I first tried LINQ to SQL, but with the amount of data, the inserts were slow performing to say the least. Then I remembered the SqlBulkCopy class. SqlBulkCopy lets you efficiently bulk load a SQL Server table with data from another source. The SqlBulkCopy class can be used to write data only to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with a  IDataReader instance. For this example the file will contain roughly 1000 records, but this code can handle large amounts of data.
To begin with let’s create a table in SQL Server that will hold the data. Copy the following T-SQL into SQL Server to create your table:

CREATE TABLE [dbo].[Censis](
          [Suburb] [varchar](200) NULL,
          [NotStated] [int] NULL,
          [NotApplicable] [int] NULL,
          [Fishing] [int] NULL,
          [Mining] [int] NULL,
          [Manufacturing] [int] NULL,
          [Electricity] [int] NULL,
          [Construction] [int] NULL
) ON [PRIMARY]
GO

The table above will hold Censis data that is freely available to download in Australia.

The next item to do is create a console application that will bulk load the data. Open Visual Studio 2008 and choose File > New > Windows > Console Application. 

Before moving on, to explain the code I have to work backwards and explain the final method that bulk loads data. SqlBulkCopy has a method called WriteToServer. One of the overloads of this method takes a DataTable as the parameter. Because a DataTable contains rows and columns, this seemed like a logical choice for the task I was facing.

Jumping back to the example we now know we need to create a DataTable that contains the information from the text file. The code below demonstrates how to do this:

C#

DataTable dt = new DataTable();
string line = null;
int i = 0;

using (StreamReader sr = File.OpenText(@"c:\temp\table1.csv"))
{  
      while ((line = sr.ReadLine()) != null)
      {
            string[] data = line.Split(',');
            if (data.Length > 0)
            {
                  if (i == 0)
                  {
                  foreach (var item in data)
                  {
                        dt.Columns.Add(new DataColumn());
                  }
                  i++;
             }
             DataRow row = dt.NewRow();
             row.ItemArray = data;
             dt.Rows.Add(row);
             }
      }
}

VB.NET

Dim dt As New DataTable()
Dim line As String = Nothing
Dim i As Integer = 0

Using sr As StreamReader = File.OpenText("c:\temp\table1.csv")
      line = sr.ReadLine()
      Do While line IsNot Nothing
             Dim data() As String = line.Split(","c)
                  If data.Length > 0 Then
                        If i = 0 Then
                         For Each item In data
                                    dt.Columns.Add(New DataColumn())
                         Next item
                         i += 1
                        End If
                   Dim row As DataRow = dt.NewRow()
                   row.ItemArray = data
                   dt.Rows.Add(row)
                  End If
            line = sr.ReadLine()
      Loop
End Using


In the code above, I created a DataTable that will store all the information from the csv file. The csv file resides in the C:\Temp directory. I am using a StreamReader object to open the file and read each line in the file. Each line is then split up into a string array. That string array will be assigned to each DataRow as the ItemArray value. This sets the values for the row through the array.

When the file has been read, the next thing to do is use the SqlBulkCopy class to insert the data into SQL Server. The following code demonstrates how to do this:

C#

using (SqlConnection cn = new SqlConnection(ConfigurationManager.ConnectionStrings["ConsoleApplication3.Properties.Settings.daasConnectionString"].ConnectionString))
{
      cn.Open();
      using (SqlBulkCopy copy = new SqlBulkCopy(cn))
      {
            copy.ColumnMappings.Add(0, 0);
            copy.ColumnMappings.Add(1, 1);
            copy.ColumnMappings.Add(2, 2);
            copy.ColumnMappings.Add(3, 3);
            copy.ColumnMappings.Add(4, 4);
            copy.DestinationTableName = "Censis";
            copy.WriteToServer(dt);
      }
} 

VB.NET

Using cn As New SqlConnection(ConfigurationManager.ConnectionStrings("ConsoleApplication3.Properties.Settings.daasConnectionString").ConnectionString)
      cn.Open()
       Using copy As New SqlBulkCopy(cn)
             copy.ColumnMappings.Add(0, 0)
                  copy.ColumnMappings.Add(1, 1)
                  copy.ColumnMappings.Add(2, 2)
                  copy.ColumnMappings.Add(3, 3)
                  copy.ColumnMappings.Add(4, 4)
                  copy.DestinationTableName = "Censis"
                  copy.WriteToServer(dt)
       End Using
End Using

SqlBulkCopy uses ADO.NET to connect to a database to bulk load the data. I have created an SqlConnection object, and that object reference is used to create the SqlBulkCopy object. The DestinationTableName property references a table in the database where the data is to be loaded. A handy feature of SqlBulkCopy is the SqlBulkCopyColumnMappingCollection. Column mappings define the relationships between columns in the data source and columns in the destination. This is handy if the data source file has columns that don’t need to be inserted into the database. Column mappings can be set by an index, such as the example above, or they can be set by the name of the column. Using the index is handy when you’re working with files that contain no column names. Finally the data is sent to the database by running the WriteToServer method.

The SqlBulkCopy class has been around since .NET 2.0, but it is easy to forget about this as the languages move forward, but this is one object you should keep in mind when transferring large amounts of data. The entire source code of this article can be downloaded over here

No comments:

Post a Comment