Tuesday, April 17, 2012

SQL Server 2012 FILESTREAM, FileTable and the added value with SharePoint 2010

I wrote before about the new SQL Server 2012 FileTable feature, which is based on FILESTREAM, a feature already introduced with the previous version, SQL Server 2008 (see here: LINK ). Now let’s see FileTable together with SharePoint in action.
First we have to activate FileTable, which is not active by default after installing SQL Server 2012. How to activate it can be found here: LINK .
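As a minimal sketch of the instance-level part of that activation (the Windows-level FILESTREAM share is enabled separately in SQL Server Configuration Manager), the T-SQL side could look like this:

```sql
-- Enable FILESTREAM for both T-SQL and Windows streaming access (level 2).
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
```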
My settings are:
- There is a database named “FileStreamDemo”.
- In the “Options” menu you can see that there is a FILESTREAM directory named “FileStreamData”.
- There is a table called “dbo.FileTableData” which points to the FileStreamData directory:
CREATE TABLE [dbo].[FileTableData] AS FILETABLE
ON [PRIMARY] FILESTREAM_ON [FileStreamDemoGroup1]
WITH
(
    FILETABLE_DIRECTORY = N'FileTableData',
    FILETABLE_COLLATE_FILENAME = SQL_Latin1_General_CP1_CI_AS
)
- NON_TRANSACTED_ACCESS = FULL allows us to interact with the data in the FileTable directory directly through Windows Explorer, bypassing the SQL Server API.
- The script in the background shows the CREATE script for the database. We can see that the database files (*.mdf and *.ldf) are placed in a different directory than the FILESTREAM data.
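To make these settings concrete, here is a sketch of how such a database could be created; the file paths are placeholders for my demo values, and the FILESTREAM directory must not exist yet when the database is created:

```sql
-- Demo database: a FILESTREAM filegroup separate from the .mdf/.ldf files,
-- plus non-transacted access so Windows Explorer can reach the FileTable share.
CREATE DATABASE FileStreamDemo
ON PRIMARY
    (NAME = N'FileStreamDemo',     FILENAME = N'D:\Data\FileStreamDemo.mdf'),
FILEGROUP FileStreamDemoGroup1 CONTAINS FILESTREAM
    (NAME = N'FileStreamDemoFS',   FILENAME = N'E:\FileStream\FileStreamData')
LOG ON
    (NAME = N'FileStreamDemo_log', FILENAME = N'D:\Data\FileStreamDemo_log.ldf');

ALTER DATABASE FileStreamDemo
SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'FileStreamData');
```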
I placed several demo files in the FILESTREAM directory:

If we now query the table “dbo.FileTableData” the result is:
The columns are fixed and based on the properties coming from the NTFS file system:
Column Name         Data Type
stream_id           uniqueidentifier ROWGUIDCOL
file_stream         varbinary(max) FILESTREAM
name                nvarchar(255)
path_locator        hierarchyid
creation_time       datetimeoffset(7)
last_write_time     datetimeoffset(7)
last_access_time    datetimeoffset(7)
is_directory        bit
is_offline          bit
is_hidden           bit
is_readonly         bit
is_archive          bit
is_system           bit
is_temporary        bit

Each FileTable also includes these computed (read-only) columns:
Column Name            Data Type
parent_path_locator    hierarchyid
file_type              nvarchar(255)
cached_file_size       bigint

So we see that the content of the FILESTREAM directory is returned as BLOB data placed in the column “file_stream” as varbinary(max). By the way: the data placed here is automatically part of the backup and restore process.
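Because the FILESTREAM container belongs to a filegroup of the database, a plain full backup picks it up together with the row data; a sketch (the target path is a hypothetical placeholder):

```sql
-- A full database backup includes the FILESTREAM data; no extra option needed.
BACKUP DATABASE FileStreamDemo
TO DISK = N'D:\Backup\FileStreamDemo.bak'
WITH INIT, STATS = 10;
```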
If we now use BCS to connect SharePoint to this database table, we have two issues:
1.       We are not able to bring the content directly to the user without writing code that extracts / transforms the BLOB data.
2.       Using SharePoint to unwrap the BLOB data generates load on our SharePoint farm. The data must be loaded from SQL Server / the file system to SharePoint, and from the SharePoint server to the client.
These two issues have the same answer:
Let’s point to the data in the file system using the given information. That means only the meta-information about where the file is placed is transferred, and the client picks up the data directly from where it is stored.

- In scenario 1 the BLOB data is transferred via SharePoint.
- Scenario 2 shows the way where only the meta-information (blue arrows) is transferred via SharePoint and the user accesses the content directly.
Scenario 2 is what I will show now. The meta-information about where the data is located is not part of the data in the “dbo.FileTableData” table. Also, there are some columns we do not need to transfer via BCS to SharePoint. So let’s write a SQL query that returns only the needed, filtered information:
1.  SELECT
2.  stream_id,
3.  name,
4.  file_type,
5.  cached_file_size,
6.  creation_time,
7.  last_write_time,
8.  last_access_time,
9.  is_directory,
10. 'file:\\DEMO2010A\FileStream\FileStreamData\FileTableData\' + name AS Link
11. FROM dbo.FileTableData

Line 10 provides the “magic” here. This T-SQL expression generates, for every entry in the “dbo.FileTableData” table, a link to the original data in the file system. This link is attached as a new column called “Link”:
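Instead of hard-coding the UNC path, SQL Server 2012 can also derive it at query time; a variant using the built-in FileTableRootPath() function and the GetFileNamespacePath() method of the file_stream column might look like this (a sketch, table name as used above):

```sql
-- Let SQL Server compute the FileTable share path instead of hard-coding it.
SELECT stream_id,
       name,
       FileTableRootPath(N'dbo.FileTableData')
           + file_stream.GetFileNamespacePath() AS Link
FROM dbo.FileTableData;
```

This has the advantage that the link stays correct if the server, instance share, or directory names change.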
This query is now used to create a view called “FileTableView”.
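A sketch of that view definition (server and share names are the demo values from the query above; adjust them to your environment):

```sql
-- Wrap the filtered FileTable query in a view for consumption via BCS.
CREATE VIEW dbo.FileTableView
AS
SELECT stream_id,
       name,
       file_type,
       cached_file_size,
       creation_time,
       last_write_time,
       last_access_time,
       is_directory,
       'file:\\DEMO2010A\FileStream\FileStreamData\FileTableData\' + name AS Link
FROM dbo.FileTableData;
```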
The next step is connecting this view via BCS to SharePoint. I do this using SharePoint Designer to create a new External Content Type. How to create External Content Types and how to handle BCS is not part of this post. If you need help with this, just contact me or have a look here: LINK .
So here is my solution:
As you can see, I created an External Content Type called “FileTableData” which is based on the view “FileTableView”. For this use case I only need a ReadList and a ReadItem operation.
Publishing this External Content Type to a SharePoint External List (and of course doing some XSL customizing) generates a list like this, which shows exactly the content of the FILESTREAM directory:
Clicking the “Link” given for each entry opens the document / folder. The access happens, as shown in PIC 4, directly against where the content is stored.
Summary:
This is an interesting way of bringing data into a SharePoint environment without blowing up SharePoint’s content databases. The data can also be protected with SQL Server backup and recovery tools. Of course, there are several third-party tools doing this or similar things with more features around them. Anyway, for me it’s a great feature and an additional option in data management.

See this post as a webcast here:

Friday, February 24, 2012

Wednesday, February 15, 2012

SharePoint Search – a look behind the scenes Part III

Part III: SharePoint Search and BCS
(...which data is stored in the SharePoint Search databases in the context of crawled and managed properties?)
In the second post of this series we saw that the metadata shown in the result set is stored in SQL Server, even though the data comes from an external BCS source. Metadata like “name”, “description” etc. is stored in the PropertyBlob field explained in the first post. But this blob only contains data which is part of the managed property “HitHighlightedSummary”. What happens now if we add a crawled property coming from a BCS source to a managed property?
Situation:
We have an External Content Source called “LOB2”. This content source is connected to a SQL Server database called “MiCLAS_TEST”.
The External Content Source contains the table called “cis.Vorgang_z
If we now search, for example, for the term “330114”, which is a value from the column “VorNummer”, we get this result in SharePoint Search:
Catching the call with SQL Server Profiler (as described in Part II) and querying it in SQL Server Management Studio, it looks like this:
The PropertyBlob in this case contains the following data as varbinary:
0x020200EEFFFE7F99010000BD0010000080024078DAC590314BC36018841F639A96125408A50E0E19BA65B198A14310A4140A828242A0434B054104ADAE6EDF3F4F9FA420F41734C7BD977BBF5BDE6B9AA689801B4918CADE819C1387849833DAD97ED75D2671240106EEFB5CC8DAFFB44B24A2161BAE744FE24EDC3215A55B8EFC921FBE78E7931D1FE4CCF5DFFCF2A6FFD39F1A35CFBCF0C803AF62C1B6BB7FED88BD7AE08D11993D8D75B93A0999BC3CA89922A414E64ADF66763AB3DF7F7557D96565BFF766E78C58A97BBEC92717006452D3CBBACA8100805882AFCDCB81
If we now add crawled properties to managed properties, the PropertyBlob changes¹. So where is the information stored that is used to generate the updated PropertyBlob? (The update happens while full-crawling the content source.) The answer is the table “dbo.MSSDocProps“ in the „Search_Service_Application_PropertyStoreDB“.

Let’s have a look using the DocId 2056 from the call above:
(for a better understanding I also add the “FriendlyName” column from the table dbo.MSSManagedProperties to the query)

USE Search_Service_Application_PropertyStoreDB_b506dce49c514f8899ae51e503889885
SELECT v2.FriendlyName,
         v1.*
FROM dbo.MSSDocProps v1, Search_Service_Application_DB_dd13ba19a7bb4ffaafcc3e626e73c949.dbo.MSSManagedProperties v2
WHERE DocId = 2056
and v1.PID = v2.PID
Now I map some crawled properties to managed properties:
After a full crawl I execute the query again:
In the case of the “LOB2Date” and “LOB2Cur” values you can see that the data is in the llVal column. The “LOB2Bez” value is clear text in the strVal2 column. LOB2Date is a cryptic datetime value based on the DateTime structure. LOB2Cur is a decimal value.
¹The updated PropertyBlob now contains the following:
0x020400EEFFFE7F9A0100009B0100009D010000BD0010000F000800008002400100014078DAC590314BC36018841F639A96125408A50E0E19BA65B198A14310A4140A828242A0434B054104ADAE6EDF3F4F9FA420F41734C7BD977BBF5BDE6B9AA689801B4918CADE819C1387849833DAD97ED75D2671240106EEFB5CC8DAFFB44B24A2161BAE744FE24EDC3215A55B8EFC921FBE78E7931D1FE4CCF5DFFCF2A6FFD39F1A35CFBCF0C803AF62C1B6BB7FED88BD7AE08D11993D8D75B93A0999BC3CA89922A414E64ADF66763AB3DF7F7557D96565BFF766E78C58A97BBEC92717006452D3CBBACA8100805882AFCDCB8148006F006C00640069006E006700200043006F006D00700061006E007900E07AB02600000000
So we see that all the content contained in managed metadata is stored in the “Search_Service_Application_PropertyStoreDB” database, even though the data comes from an external BCS source. This is interesting for some security issues (there will be a separate post about this soon) and also for SQL Server maintenance and index defragmentation. There is a very good article about this available on blogs.msdn: LINK
See the complete post incl. the hands-on lab as a webcast here:


Tuesday, February 7, 2012

SharePoint Search – a look behind the scenes Part II

Part II: SharePoint Search and BCS
(...which data is stored in the SharePoint Search databases?)

Based on the first post, “SharePoint Search – a look behind the scenes Part I”, we will now have a closer look at what happens in SQL Server when a search request is generated (in this case against a BCS source).

In this example I used an already existing BCS source in my demo environment, described here: LINK

In the SharePoint Search Center I used the search text “DVD”

With the SQL Server Profiler I can filter out the query SharePoint fires against the SQL Server:

exec sp_executesql
N'/* {00C1C222-BB31-408A-815F-C3A76E85C290} */
exec dbo.proc_MSS_GetMultipleResults @P1,@P2,@P3,@P4,@P5',
N'@P1 int,
@P2 nvarchar(4000),
@P3 nvarchar(4000),
@P4 nvarchar(4000),
@P5 varbinary(8000)',-2147483647,N'dvd',N'',N'SET NOCOUNT ON ;
DECLARE @joinRows INT ;
1.  SET @joinRows = DATALENGTH(@joinData) / 8 ;;
2.  WITH DocIds(DocId, Value) AS (
3.  SELECT TOP(@joinRows) CAST(SUBSTRING(@joinData,
((ord.n-1)*8) + 1, 4) AS INT), CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT)
FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows )

    4. SELECT P.DocId, P.SummaryBlobSize, P.Size, P.LastModified, P.IsDocument, P.IsPictureUrl, P.Author, P.Title, P.Url,
P.PictureThumbnailUrl, P.ContentClass, P.FileExtension, P.Tags, P.PropertyBlob,
CASE WHEN P.PropertyBlob IS NULL THEN 0 ELSE DATALENGTH(P.PropertyBlob) END,
P.PopularSocialTags, P.SiteName, P.Description, P.ParentLink, P.NumberOfMembers, P.PictureHeightAndWidth, P.DisplayDate
FROM dbo.MSSDocResults AS P WITH(NOLOCK), DocIds AS T WHERE P.DocId = T.DocId OPTION (MAXDOP 1) ',0x00001592000000000000...700000031


So let’s see what exactly happens by disassembling that query call.
We can see that the stored procedure dbo.proc_MSS_GetMultipleResults is called. But we want to go another step deeper and find out what’s behind this call. (The stored procedure dbo.proc_MSS_GetMultipleResults will be part of one of the next posts.)
First of all, the DATALENGTH of the generated BLOB which is stored in @joinData (details for the BLOB data can be found here: LINK) is used to set the variable @joinRows:
1.       SET @joinRows = DATALENGTH(@joinData) / 8
Let’s see what the result is using this query:
                SELECT DATALENGTH(0x000015920000…700000031) / 8

The result is “50”.

In the next step a temporary result set called “DocIds” is created using the “WITH” SQL statement (a common table expression).
2.       WITH DocIds(DocId, Value) AS (…
The call generates a result set looking like this:
In the first column we see the DocIds we will need in the next step. But let’s have a look at how this result is generated. The key point in the query is this one:
3.       SELECT TOP(@joinRows)
CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT),
CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT)
FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows
Using CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT) on the BLOB data stored in @joinData filters out a list of item identifiers and their rank, contained as Id/value pairs, as also described here: LINK. This is the only magic creating the result set shown in step 2.
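To make the extraction concrete, here is the same SUBSTRING/CAST trick applied to a hypothetical 16-byte @joinData containing two (DocId, Value) pairs; note that SQL Server interprets a 4-byte binary cast to INT as big-endian, and a small inline VALUES list stands in for dbo.MSSOrdinal:

```sql
-- Hypothetical @joinData: pairs (2056, 1) and (2057, 2), 8 bytes per pair.
DECLARE @joinData varbinary(8000) = 0x00000808000000010000080900000002;

SELECT CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT) AS DocId,
       CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT) AS Value
FROM (VALUES (1), (2)) AS ord(n);   -- stands in for dbo.MSSOrdinal
-- Returns the rows (2056, 1) and (2057, 2).
```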
Next is an easy join using the temporary result set “DocIds” and the content of the table “dbo.MSSDocResults”. The join is done with the part "WHERE P.DocId = T.DocId" shown in the query below:
4.       SELECT P.DocId, P.SummaryBlobSize, P.Size, P.LastModified, P.IsDocument, P.IsPictureUrl, P.Author, P.Title, P.Url, P.PictureThumbnailUrl, P.ContentClass, P.FileExtension, P.Tags, P.PropertyBlob,
CASE WHEN P.PropertyBlob IS NULL
THEN 0
ELSE DATALENGTH(P.PropertyBlob)
END,
P.PopularSocialTags, P.SiteName, P.Description, P.ParentLink, P.NumberOfMembers, P.PictureHeightAndWidth, P.DisplayDate
FROM dbo.MSSDocResults AS P WITH(NOLOCK),
DocIds AS T WHERE P.DocId = T.DocId OPTION (MAXDOP 1)
The result is:
So we see that the result set returned from SQL Server contains all the data needed, even though the data comes from an external BCS source. The metadata like “name”, “description” etc. is stored in the PropertyBlob field explained in the first post.
If we now call the profile page of the BCS source, all configured data fields are needed. This results in a call against the data source defined in the BCS model. In my case this is also a SQL Server call, because my External Data Source is a SQL Server database:
Captured with SQL Profiler it looks like this:
exec sp_executesql N'SELECT [ProductKey] , [ProductName] , [ProductDescription] , [BrandName] , [ClassName] , [ColorName] , [UnitPrice] , [ProductSubcategoryName] , [ProductCategoryDescription] , [Expr1] FROM [dbo].[V_ContosoRetailDW_ProductSales] WHERE [ProductKey] = @ProductKey',N'@ProductKey int',@ProductKey=1642

and the result is similar to what we can see on the profile page:

See the complete post incl. the hands-on lab as a webcast here: