Database And In-Memory Paging With ADO.NET And LINQ
When performing queries over large datasets there is often the need to page the data, such that it can be displayed in smaller amounts, aka pages, incrementally. When thinking about database paging two immediate distinctions can be made; in-memory paging and database paging.
Depending on the size and usage of the data, each has its advantages and disadvantages.
Database and In-Memory paging considerations
Database paging is efficient as the result set is filtered by the database before the data is returned. This is a good thing when querying over very large datasets where the transferring of large amounts of data over the wire is not desirable. Depending on the number of pages that occur, many queries/and or connections (depending on whether a persistent connection is used throughout the life of paging) may need to be made in contrast to “in-memory” paging where all the data is retrieved from the database first, and pagination occurs on the in-memory model instead of the database. The latter obviously involves transferring more information over the wire but saves on trips made back to the database.
Depending on the amount of data transferred, the type of connection to the database and whether or not the data being paged is allowed to be “stale”; a side affect of working with in memory data, in memory paging may still be desirable.
In this post, I am going to take a look at both database and in memory paging with ADO.NET and LINQ.
Data paging with ADO.NET
In-Memory paging with ADO.NET
In-memory paging with ADO.NET involves running a command object, filling a dataset with a data adapter and using a data reader to linearly read the datasets contents. Paging occurs by keeping an index of what record is currently being used and how many records should be returned in the next page.
See below for example:
public class AdoMemoryPager : IAdoPager { public void Page(DbProviderFactory factory, DbConnection connection, int pageSize, string selection) { // Find all the people that are interested in the // chosen technology. string commandText = "SELECT * FROM person as p" + " INNER JOIN PersonTechnologyAssignment AS pta" + " ON p.PersonID = pta.PersonID" + " INNER JOIN Technology as t" + " ON pta.TechnologyID = t.TechnologyID" + " WHERE t.TechnologyName = '" + selection + "'"; DbCommand dbCommand = ADOUtility.CreateCommandObject(factory, connection, commandText); // Have provider create adapter. DbDataAdapter adapter = factory.CreateDataAdapter(); adapter.SelectCommand = dbCommand; int index = 0; DataSet dataset = new DataSet(); // Have provider fill adapter. index += adapter.Fill(dataset, index, pageSize, commandText); IDataReader dbReader; while (index > 0) { // Get a reader over dataset. dbReader = dataset.CreateDataReader(); while (dbReader.Read()) { Console.WriteLine("{0} {1}", dbReader["FirstName"], dbReader["Surname"]); } // Empty the dataset. dataset.Clear(); Console.ReadKey(); int paged = adapter.Fill(dataset, index, pageSize, commandText); if (paged > 0) index += paged; else index = -1; // We are finished. } } }
Database paging with ADO.NET
Coming soon…
Data paging with LINQ
The key thing to remember with LINQ is that it uses a deferred execution model. This means that a query is not run at the point of declaration, but at the point of usage, typically when traversing or enumerating over an enumerable query. You will see later why this is important with respect to data paging.
In-Memory paging with LINQ
In memory paging with linq is the same idea, except using a linq query over an in memory collection. The LINQ query is first defined and the result of query execution is stored in a list so that in memory paging can be performed rather than shifting the job of paging over to the database.
Notice the actual paging is performed by the Skip(…) and Take(…) extension methods when we query in the memory list.
class LinqMemoryPager : ILinqPager { #region ILinqPager Members public void Page(TechnologyDataContext context, int index, int pageSize, string selection) { IEnumerable people; people = (from p in context.PersonTechnologyAssignments where p.Technology.TechnologyName.Equals(selection) select p.Person); // Query is run here. We pull it into memory, and page off of // the memory model. This is stale data, but doesn't require // going back to database. IList pList = people.ToList(); while (true) { IEnumerable pagedPeople; pagedPeople = (from Person p in pList select p).Skip(index).Take(pageSize); IList pList = pagedPeople.ToList(); foreach (var person in pList ) { Console.WriteLine("\t\t {0} {1}", person.FirstName, person.Surname); } // ... Omitted for brevity as this section is much the same. } } #endregion }
Database paging with LINQ
Again, this version is essentially the same, except the skip and take is performed on the enumerable instead of the in memory collection. If you have logging enabled and are connecting to SQL Server 2005 or later…
dataContext.Logging = Console.Out;
You will notice that because paging is being performed in the database, the LINQ query has generated optimised SQL for this. If you are using SQLServer 2005 or later, you should notice that LINQ has generated SQL using the newer SQL code that was added in 2005 for data paging. This demonstrates that one great advantage of data paging in LINQ over ADO.NET is that you as a programmer do not have to worry about optimising your SQL queries as you switch databases. LINQ will do it for you automatically.
class LinqDatabasePager : ILinqPager { #region ILinqPager Members public void Page(TechnologyDataContext context, int index, int pageSize, string selection) { while (true) { var people = Queryable.Skip( (from p in context.PersonTechnologyAssignments where p.Technology.TechnologyName.Equals(selection) select p.Person), index ).Take(pageSize); // Query is run when entering. foreach (var person in people) { Console.WriteLine("\t\t {0} {1}", person.FirstName, person.Surname); } // Query is re-run here. int length = people.Count(); // ... Omitted for brevity as this section is much the same. } } #endregion }
One more thing to notice with the database paging example is that every time you evaluate the query expression, the query is re-generated and executed on the database. Even calling people.Count() in this case causes a new query to be run on the database. This makes sense when you consider the default behaviour of the deferred execution model, as the query and or data source could have changed.
Being aware of this behaviour however is important. Even in this trivial example the database is hit twice for every page.
In-Memory paging with Compiled LINQ Queries
Similar to the in-memory paging we looked at earlier, except this time the paging is performed by a compiled linq query. Compiled LINQ queries are cached for the life of their scope (in this case because the LINQ function is static, it is cached for the life of the program). This means the LINQ query is only compiled once.
Again, if you have logging enabled you will notice that only the paramaters provided to the LINQ function change. Specifically, the index and page size in this example.
class CompiledLinqMemoryPager : ILinqPager { #region ILinqPager Members public void Page(TechnologyDataContext context, int index, int pageSize, string selection) { while (true) { IList people = CompiledPageQuery(context, selection, index, pageSize).ToList(); // Query is run when entering. foreach (var person in people) { Console.WriteLine("\t\t {0} {1}", person.FirstName, person.Surname); } // ...Omitted for brevity as this section is much teh same. } } // Compiled linq query private static Func> CompiledPageQuery = CompiledQuery.Compile((TechnologyDataContext context, string selection, int toSkip, int toTake) => (from p in context.PersonTechnologyAssignments where p.Technology.TechnologyName.Equals(selection) select p.Person).Skip(toSkip).Take(toTake).AsEnumerable()); #endregion }
Conclusions
Data paging can be performed by the database or in-memory depending on the need and preferred behaviour. Implementations range SQL paging, stored procedures in ADO.NET to LINQ queries and even compiled LINQ queries.
Database paging in LINQ especially has its advantages over ADO.NET in that LINQ takes care of generating the underlying SQL to be executed. This means when switching to another database vendor or upgrading to a newer version of the same database provider, LINQ will automatically generate the most optimised code it can without you having to re-write and optimise the SQL code directly (which you would have to do in ADO.NET). Looking at database paging in LINQ on SQL Server 2005 logging shows us that LINQ is making use of the SQL Server 2005’s data paging additions.
LINQ further makes data paging easier than ADO.NET by providing two simple extension methods on enumerable objects; the Skip(…) and Take(…) functions.
All in all, LINQ’s ease of use and automatic optimisation of underlying SQL code makes it a good choice. However, deferred execution means you will be hitting the database twice for every page in this example (because we obtain the count afterward). If you switch to in-memory paging you lose the advantage of fast optimised data paging when the database supports it.

Leave a Reply