Aug 11 2010

Select DataSet and number of total rows with one stored procedure

When you want to write a paged search using .NET and MSSQL, it’s a pain. The easy approach is to select every matching row and then display only a small subset of them. This works okay for tables that have a few hundred rows, as query caching can make this faster. But what happens when you’re searching a table with half a million rows?

Unless you’re a complete masochist, you’re going to want to split this into a more manageable data set; otherwise you’re gonna eat all the memory on your server. But this means that you can no longer use the DataSet.Tables[0].Rows.Count property to figure out how many rows matched in total, because the DataSet now holds only one page. You could write a second stored procedure that counts the rows. But who wants to clog up their database with tons of stored procedures for no reason? Let’s consolidate it into one.

So what does this look like?

First: the stored procedure.

We’ll use an output parameter to pass the total row count back to our code, alongside the result set that holds the requested page:


create procedure [dbo].[Search]
@searchText varchar(512), @recordsToReturn INT, @pageNumber INT, @numberofrows INT OUTPUT
AS
BEGIN

-- get the page we want to view (pages are numbered from 1)
select * from
(
select *, ROW_NUMBER() OVER (ORDER BY creation_timestamp DESC) AS row from [table] where [table].columnName like '%' + @searchText + '%'
)
AS results WHERE row between (@pageNumber - 1) * @recordsToReturn + 1 and @pageNumber * @recordsToReturn;

-- get the total number of rows, not just the subset we want
set @numberofrows = (select count(*) from [table] where [table].columnName like '%' + @searchText + '%');

END
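
Before wiring up any .NET code, it can help to call the procedure straight from a query window. This is only a sketch: @total is a made-up variable name, and the page size of 10 and the search term 'bob' are sample values.

```sql
-- @total is a hypothetical local variable that receives the output parameter
DECLARE @total INT;

EXEC [dbo].[Search]
    @searchText = 'bob',
    @recordsToReturn = 10,
    @pageNumber = 1,
    @numberofrows = @total OUTPUT;

-- total matches, plus how many pages that works out to at 10 rows per page
SELECT @total AS total_rows, CEILING(@total / 10.0) AS total_pages;
```

The first result set is the requested page; the second shows the total and the page count derived from it.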

Now the C# (this’ll work in VB too, but feel free to convert it yourself)


// requires: using System.Data; using System.Data.SqlClient;
SqlConnection conn = new SqlConnection();
conn.ConnectionString = ".....your connection string here.....";
conn.Open();

DataSet returnData = new DataSet();

// the name must match the stored procedure created above
SqlDataAdapter da = new SqlDataAdapter("Search", conn);
da.SelectCommand.CommandType = CommandType.StoredProcedure;

da.SelectCommand.Parameters.Add("@searchText", SqlDbType.VarChar, 512).Value = "bob";
da.SelectCommand.Parameters.Add("@recordsToReturn", SqlDbType.Int).Value = 10;
da.SelectCommand.Parameters.Add("@pageNumber", SqlDbType.Int).Value = 1;

// number of rows (output parameter, populated once Fill has finished)
SqlParameter outputParameter = new SqlParameter("@numberofrows", SqlDbType.Int);
outputParameter.Direction = ParameterDirection.Output;

da.SelectCommand.Parameters.Add(outputParameter);

da.Fill(returnData, "theData");

// total number of matching rows, not just the rows on this page
int totalNumberOfRows = (int)outputParameter.Value;

da.Dispose();
conn.Close();
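
For comparison, here is a different technique from the output parameter used above: a windowed COUNT(*) OVER () that attaches the total to every row of the page, so the table is only queried once. Treat it as a sketch under assumptions. The names [dbo].[SearchWithTotal] and total_rows are hypothetical, and it needs SQL Server 2005 or later.

```sql
-- Sketch only: [dbo].[SearchWithTotal] and total_rows are hypothetical names.
create procedure [dbo].[SearchWithTotal]
@searchText varchar(512), @recordsToReturn INT, @pageNumber INT
AS
BEGIN

select * from
(
    select *,
        ROW_NUMBER() OVER (ORDER BY creation_timestamp DESC) AS row,
        -- window count over every matching row, computed before the page filter below
        COUNT(*) OVER () AS total_rows
    from [table]
    where [table].columnName like '%' + @searchText + '%'
)
AS results
WHERE row between (@pageNumber - 1) * @recordsToReturn + 1 and @pageNumber * @recordsToReturn;

END
```

With this variant the total would be read from the total_rows column of the first returned row instead of from an output parameter. The catch is that a page past the end returns no rows, so there is no total to read. That is one reason the output parameter version is still handy.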

Best of luck! As always, leave a message in the comments if you have questions.


Jun 17 2010

Batch Delete Performance SQL Server

We needed to delete old records from a table with more than 3,000,000 rows. What’s the best way to do this?

In our tests, the fastest way was simply:

delete from [table] where creation_timestamp < dateadd (mm, -6, getdate())

(deleting anything older than 6 months)

It took about 3 hours (10,916 seconds, to be exact) to delete 1.6 million (1,619,433) records this way, or 148.35 records per second.

We needed to do a second round of deletes the next day, and wanted to split it into smaller batches to see whether that would perform better.

Running:


delete from [table] where pk_id in (
    select pk_id from (
        SELECT ROW_NUMBER() OVER (ORDER BY creation_timestamp desc) AS RowNumber, pk_id
        FROM [table]
        where creation_timestamp < '2009-12-16 13:52:08.673'
    ) _objectsToDelete
    WHERE RowNumber between 1 and 100000
)

takes about 12 minutes (732 seconds) for 100,000 records, or 136.61 records per second.

Strangely, using the TOP clause in a subquery takes the longest:


delete from [table] where pk_id in (select TOP 100000 pk_id from [table] where creation_timestamp < '2009-12-16 13:52:08.673')

This took about 15 minutes (904 seconds) for 100,000 records, or 110.62 records per second.
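
One approach that isn’t timed above is to drop the subquery and put TOP directly on the DELETE, repeating until nothing is left to delete. A rough sketch, assuming SQL Server 2005 or later: the variable names are made up, and the batch size of 10000 is an arbitrary placeholder rather than a benchmarked value, so no speed claim is made here.

```sql
-- Sketch only: @cutoff, @batchSize and @deleted are hypothetical names,
-- and 10000 is an arbitrary batch size, not a measured optimum.
DECLARE @cutoff DATETIME;
DECLARE @batchSize INT;
DECLARE @deleted INT;

SET @cutoff = DATEADD(mm, -6, GETDATE());  -- anything older than 6 months
SET @batchSize = 10000;
SET @deleted = 1;

WHILE @deleted > 0
BEGIN
    -- delete at most @batchSize qualifying rows per statement
    DELETE TOP (@batchSize) FROM [table]
    WHERE creation_timestamp < @cutoff;

    -- @@ROWCOUNT is the number of rows the DELETE just removed; 0 ends the loop
    SET @deleted = @@ROWCOUNT;
END
```

Each pass is its own statement, so locks and transaction log usage are spread over many small deletes instead of one long-running one. Whether that comes out faster than the numbers above would need to be measured on the real table.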

Have a better way? Let me know in the comments!