Introduction
ORM frameworks such as Entity Framework (Core) and NHibernate work with a few database providers; mostly relational, but NoSQL solutions (CosmosDb, MongoDb, Cassandra...) can technically also be used.
In the typical scenario of a relational database many developers (should) have been taught to follow the rules of data normalization. As an introduction to this article, let's have a look at the third normal form (3NF):
No non-prime attribute of R is transitively dependent on the primary key.
In layman's terms this means that all non-key attributes must be dependent on only the primary key. E.g. an author's date of birth does not belong in a "Books" table because it's not an attribute of a book. This rule leads to the whole idea of a relational database.
These rules are usually intuitively followed "as far as you should, and no further"; we will not split the attributes belonging to an entity over multiple tables, just because we can.
From a maintainability and understandability point of view data normalization makes perfect sense. Saving the information about an entity in a single table also does. But there's a difference between what feels right and what IS right. There can be good reasons to split an entity over multiple tables:
- A table with a lot of columns can have a negative impact on performance (even if you only operate on a small subset of those columns). This is food for another discussion.
- Database management systems (DBMS) have constraints. For example, SQL server has a limit of 8060 bytes per row (there's some more flexibility for variable length data types with large row support)
Working with multiple tables to represent the same entity in code can be difficult. There are a few words to be said about clean code, and especially trying to rewrite a large code base with legacy code is not particularly fun. What if I told you that no (or minimal) changes need to be made to treat multiple tables as the same entity? In these scenario's SQL views can come to the rescue.
Case study
In this article I will show a proof of concept in a console application in .NET Core. I am using Entity Framework Core as the ORM framework. The application connects to the database of a library. We should be able to perform basic CRUD operations in the app.
Let's create a new class "Book".
internal class Book
{
public Guid BookId { get; set; }
[Required]
public string Title { get; set; }
public string Description { get; set; }
public string ISBN13 { get; set; }
public DateTime ReleaseDate { get; set; } //Data type might give problems for very old books, but we'll ignore that for now ;)
public override string ToString() => Title;
}
We can represent a book by a table "Books". If you use a code-first approach you can let Entity Framework generate the necessary migrations for you and hence execute the necessary SQL script for you.
create table dbo.Books (
BookId uniqueidentifier not null primary key,
Title nvarchar(255) not null,
Description nvarchar(max) null,
ISBN13 nvarchar(255) null,
ReleaseDate datetime null
);
Now let's set up the LibraryContext to access and modify books.
internal class LibraryContext : DbContext
{
public DbSet<Book> Books { get; set; }
public LibraryContext()
{
}
protected override void OnConfiguring(DbContextOptionsBuilder options)
=> options.UseSqlServer("<my connection string>");
}
Lastly, we have to implement the core CRUD operations in the console application.
var libraryContext = new LibraryContext();
var allBooks = libraryContext.Books.ToList();
Console.WriteLine("Current books in the library: " + string.Join(", ", allBooks));
Console.ReadKey();
var newBook = new Book
{
Title = "The Pragmatic Programmer",
Description = "your journey to mastery",
ISBN13 = "978-0-13-595705-9",
ReleaseDate = new DateTime(2020, 1, 1)
};
allBooks.Add(newBook);
libraryContext.Books.Add(newBook);
libraryContext.SaveChanges();
Console.WriteLine($"Added book {newBook}.");
Console.ReadKey();
newBook.Description = "an even better journey to mastery";
libraryContext.SaveChanges();
Console.WriteLine($"Updated book {newBook}.");
Console.ReadKey();
libraryContext.Remove(allBooks[0]);
libraryContext.SaveChanges();
Console.WriteLine($"Deleted book {newBook}");
Splitting the table
Now imagine that - for the reasons mentioned in the introduction - we wanted to split the columns of the Books table over multiple tables.
So let's move a few columns to a new table "BooksExtended". A book will have the same BookId in both tables so these tables can be joined by BookId.
create table dbo.BooksExtended (
BookId uniqueidentifier not null primary key,
ISBN13 nvarchar(255) null,
ReleaseDate datetime null
);
Rather than treating these two tables as separate entities, we can use a single SQL view to represent books. This view will merge the columns from table Books and BooksExtended.
CREATE OR ALTER VIEW dbo.BooksMerged AS
SELECT b1.BookId, b1.Title, b1.Description,
b2.ISBN13, b2.ReleaseDate
FROM dbo.Books b1
LEFT OUTER JOIN dbo.BooksExtended b2 ON b1.BookId = b2.BookId;
We can inform Entity Framework about the table to use for an entity with a TableAttribute. In this case it's a view, but in the eyes of EF that's the same thing.
To understand this case study it's important to understand what an ORM does. It does not operate in the DBMS. There is no "magic" going on. All an ORM does is manage entities in memory, and when changes need to be saved, translate those changes to SQL statements and send them to the DBMS. In SQL (DQL and DML) there is no syntactical difference difference between a table and a view.
It's always "SELECT Y FROM X".
It's always "UPDATE X SET Y".
It's always "INSERT INTO X".
It's always "DELETE FROM X".
[Table("BooksMerged")]
internal class Book
{
public Guid BookId { get; set; }
[Required]
public string Title { get; set; }
public string Description { get; set; }
public string ISBN13 { get; set; }
public DateTime ReleaseDate { get; set; } //Data type might give problems for very old books, but we'll ignore that for now ;)
public override string ToString() => Title;
}
If you use NHibernate you can use the "table" attribute in the mapping document to set the view name. Alongside this POC I have create another demo in NHibernate to confirm this.
https://nhibernate.info/doc/nhibernate-reference/mapping.html
When running the app, reading from the view is not a problem. But as soon as we need to modify an entity we run into problems:
View or function 'BooksMerged' is not updatable because the modification affects multiple base tables
If we were to execute an insert/update/delete query on SQL Server (the chosen DBMS for this demo) directly, we would have the same problem.
The problem is clear: we have a join in the view definition, so creating, updating, and deleting will not be possible in the current situation. "Complex" view can't be modified by design because SQL Server doesn't know how to do that. Luckily there is a workaround: triggers.
We can define SQL triggers that tell SQL Server how to manage both tables INSTEAD OF the default behavior (an exception). DML triggers use the conceptual tables "Deleted" (for a deletion) and "Inserted" (for an update or insertion).
CREATE OR ALTER TRIGGER dbo.trDelete_BooksMerged ON dbo.BooksMerged
INSTEAD OF DELETE
AS
BEGIN
DELETE b FROM
dbo.Books b inner join Deleted d on b.BookId = d.BookId;
DELETE b FROM
dbo.BooksExtended b inner join Deleted d on b.BookId = d.BookId;
END
CREATE OR ALTER TRIGGER dbo.trUpdate_BooksMerged ON dbo.BooksMerged
INSTEAD OF UPDATE
AS
BEGIN
UPDATE b
SET
b.Title = i.Title,
b.Description = i.Description
FROM
dbo.Books b inner join Inserted i on b.BookId = i.BookId;
UPDATE b
SET
b.ISBN13 = i.ISBN13,
b.ReleaseDate = i.ReleaseDate
FROM
dbo.BooksExtended b inner join Inserted i on b.BookId = i.BookId;
END
CREATE OR ALTER TRIGGER dbo.trInsert_BooksMerged ON dbo.BooksMerged
INSTEAD OF INSERT
AS
BEGIN
INSERT INTO dbo.Books
SELECT BookId, Title, Description
FROM Inserted;
INSERT INTO dbo.BooksExtended
SELECT BookId, ISBN13, ReleaseDate
FROM Inserted;
END
Side note: if your primary key is auto-generated by the server in one of the tables then you can use an OUTPUT ... INTO ... clause, like so:
CREATE OR ALTER TRIGGER dbo.trInsert_BooksMerged ON dbo.BooksMerged
INSTEAD OF INSERT
AS
BEGIN
INSERT INTO dbo.Books
OUTPUT Inserted.BookId, Inserted.ISBN13, Inserted.ReleaseDate INTO dbo.BooksExtended
SELECT Title, Description
FROM Inserted;
END
When we run the app again it will most likely work fine (see below why not).
A few caveats
Enable triggers in the ORM framework
Depending on the ORM framework and version that you are using, you might need to make a few additional configuration changes.
In modern versions of Entity Framework triggers have to be enabled explicitly (the "less efficient" approach), like so:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Book>()
.ToTable(tb => tb.HasTrigger("trUpdate_BooksMerged"));
}
Disable (auto-generated) migrations for your view
Make sure that you don't accidentally create migrations for your view. In case of EF Core, it will think that it's a table and generate a CREATE TABLE statement. Off the top of your head, I can think of two ways to achieve this.
1. Define the entity as a view in OnModelCreating. This is not really a viable solution because that way we can't enable triggers and updates to the entity will fail.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Book>()
.ToView("BooksMerged");
}
2. Explictly exclude the entity from migrations in OnModelCreating.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Book>()
.ToTable(tb =>
{
tb.HasTrigger("trUpdate_BooksMerged");
tb.ExcludeFromMigrations();
});
}
You can still create migrations for the individual tables by creating them as separate entity types and adding them to the DbContext. If you want to push your view as a custom migration you can also do that by creation a custom class that inherits from the Migration class.
Enable triggers for bulk insert
BULK INSERTs, by default, ignore insert triggers. You will have to enable them in order to insert into complex views.
In SQL server, use FIRE_TRIGGERS.
When using the SqlBulkCopy class, set copyOptions to FireTriggers in the constructor.
Final thoughts
This article was more a proof of concept than trying to convince you to use views for your entities. If you have to split a table into multiple tables because it has too many columns then there's possibly a bigger problem, and you have to ask yourself questions:
- Why do I have so many columns? Do I really need all of them?
- Is there maybe a problem with my data model?
- If I really need so much information about each entity, can I maybe benefit from a (semi-) NoSQL approach?
Especially the argument about performance becomes questionable. Yes, having too many columns in a single table has a negative impact on performance (again, not the subject of this article). If you split them over multiple tables and only operate on the table that you need then it makes sense. But how much does it really matter? I don't have the benchmarks to make a convincing argument.
Triggers themselves don't impact performance, but the implementation of the query will, and I can almost guarantee you that inserting into/updating/selecting from two tables is slower than one. On top of that, ORM frameworks use a "slow approach" with verbose SQL when triggers are enabled (example, related to the EF Core 7 update).
So with all of this in mind, I think we can conclude that using a view to represent an entity is an easy and convenient, but maybe not best approach to circumvent a DBMS's limitations.
Finally, I would like to say that this idea works in SQL server. If you use a different system then you will have to figure out how (and if) triggers and views work.