Tj Kellie
Tj Kellie

Reputation: 6476

What are good problems to solve using CLR stored procs?

I have used CLR stores procedures in SQL server for awhile now, but I'm still wondering what the best situations to use them are.

MSDN provides some guidelines for use like heavy string manipulation (regex), or replacing T-SQL that declares lots of table variables and cursors. I'm curious to know what problems SO users are solving with CLR stored procs, and examples / benchmarks as well.

For example, I have found CLR stored procs + SSRS to be a great way to get data manipulation logic out of SSRS and out of T-SQL, and into managed code that is easier to read and manipulate.

Upvotes: 16

Views: 1453

Answers (6)

Chris Smith
Chris Smith

Reputation: 5454

It's very useful for pulling data out of a system that doesn't provide a traditional SQL interface or the vendor's implementation of that interface is sub-par.

We have a core application built on the old MUMPS platform, running on the Intersystems Cache database. The data is hierarchical, not relational in nature. The main global array (i.e. table) has multiple data levels and elements all grouped together by account number. Scanning even one column requires the entire global be loaded from disk and it takes 8+ hours. The vendor does supply an ODBC driver and mappings to the globals, but it often results in scanning and extremely slow queries.

I built a table valued function that takes an ObjectScript (Intersystem's dialect of MUMPS) program, executes it on the Cache server and returns the output lines as data rows. I can micromanage the data access path on the MUMPS side (that's really what it takes to get efficient data access) by providing a specific program to execute on that side and then easily import the data in MSSQL as an ad-hoc inline data source.

I can use the TVF to drive the data selection or use CROSS APPLY to make lookups on the other end and it's reasonably efficient. I can even run multiple queries on the remote end in parallel if I compel MSSQL to use a parallel execution plan.

Upvotes: 1

Tj Kellie
Tj Kellie

Reputation: 6476

Here is an example of something I used CLR procs for that I thought was neat:

Timed data updates from external webservices using CLR stored procs and SQL jobs.

We have an application that syncs some of the data it tracks with outside industry data feeds. The sync runs weekly for everything and on-demand for single updates too so I had an existing webservice API to access it. Things are already scheduled by a windows service but I thought why not be able schedule them like our other SQL jobs??

I created a CLR stored procedure that refrences the application's webservice API. Then I added a few parms for @RecordID to support single sync, and scheduled it up in Enterprise manager SQL jobs.

Now I can use the Job to run dB syncs or use the proc within other SQL procs or Triggers to update data from the external feed.

It may be cleaner to take the application webservice API out in the future and just use the external webservice directly. For now though, this was very fast to implement and a cool way to extend functionality to the SQL group.

Upvotes: 3

Aaronaught
Aaronaught

Reputation: 122624

Many problems requiring denormalization and/or sequential operations can be handled exceptionally well by the CLR and can be used to dramatically improve performance without sacrificing usability on the SQL end (much). Instead of relying entirely on either set-based or iterative operations, you can take a hybrid approach, use a set-based solution for the big hauls and switch to an iterative model for the tight loops.

The built-in hierarchyid and geospatial (i.e. geography) types in SQL Server 2008 are good examples of the denormalization problem. Both contain an (almost) arbitrarily large amount of data that are difficult to normalize without hurting performance - you would need to use recursion or cursors to do any meaningful work with them otherwise, or use a rat's nest of triggers and/or scheduled tasks to maintain a denormalization table.

Another problem I've solved with CLR types is inline compression. This might sound like a pointless or academic exercise, but when your fully-normalized data is pushing into the terabytes, an 80-90% reduction in size means a lot. SQL has its own built-in compression now and SQL 2005 had vardecimal, and those are good tools as well, but a domain-aware "minimization" algorithm can be several times more efficient in terms of both CPU load and compression rate. Obviously this doesn't apply to every problem, but it applies to some.

Yet another very common problem often found on this site is generating a sequence on the fly - for example a sequence of consecutive dates. Common solutions are recursive CTEs, static sequence tables, and the little-known spt_values tables, but a simple CLR UDF performs better than any of them and offers a lot more flexibility.

Last on my list: User-defined streaming aggregates are also very useful, especially for anything statistics-related. There are some things you simply cannot compose out of the built-in SQL aggregates, such as medians, weighted moving averages, etc. UDAs can also take multiple arguments so you can parameterize them; technically an aggregate isn't guaranteed to receive data in any particular order in the current version of SQL Server, but you can get around that limitation by feeding it a ROW_NUMBER as an additional argument and use this to implement just about any windowing function (have the aggregate spit out a UDT which can then be transformed to a table).

It's actually very frustrating how few examples there are of truly useful SQL-CLR applications; search on Google and you'll get 10 million results, every single one of them for some silly string-concatenation or regex. These are useful, but take a few minutes to learn about SQL UDTs and UDAs in particular and you'll start seeing plenty of uses for them in your own applications. Don't go nuts, of course - think carefully about whether or not there's a better solution in pure SQL - but don't discount them either.

Upvotes: 23

gbn
gbn

Reputation: 432230

  • Custom aggregates
  • String manipulation
  • Custom data types

To be honest, I only see string handling which includes splitting CSVs into rows.

I'd consider anything that needs more then the default trust level to be out of bounds, unless I was a DBA doing DBA type stuff.

From MSDN with RegEx and RSS feed examples: Using CLR Integration in SQL Server 2005

Upvotes: 2

marc_s
marc_s

Reputation: 754438

String manipulation (regexes) was already mentioned, but also DateTime arithmetic, and of course another biggie - calling external web services.

Upvotes: 5

Oded
Oded

Reputation: 498992

String manipulation - regular expression search is a classic. Very easy to expose in CLR, very difficult to do in straight T-SQL.

See this link for details of implementation and a micro-benchmark (SQLCLR is only 47 milliseconds compared to 6.187 seconds for the T-SQL UDF).

Upvotes: 5

Related Questions