
Monday, May 7, 2012

SharePoint Search – a look behind the scene Part IV

Part IV: Last part with a summary and some further issues

In the previous parts of the series “SharePoint Search – a look behind the scene” (Part I, Part II, Part III) we saw how SharePoint interacts with SQL Server and which data is stored in which of the SQL Server databases belonging to the Search Service.

In Part III I showed where custom managed properties belonging to an object are stored and how SharePoint handles this relationship. As one of the missing pieces, let's now look at social tags. Social tags are stored in the user profile and not with the content. So it was interesting for me to see that the Search database stores the social tag information together with the content information in one table: the MSSDocResults table in the Search_Service_Application_PropertyStoreDB. The following link describes the table: http://msdn.microsoft.com/en-us/library/dd773971(v=office.12).aspx  The column PopularSocialTags contains the tags assigned by users.

The following example shows a T-SQL search for the tag “I like it”:

SELECT * FROM [Search_Service_Application_PropertyStoreDB].[dbo].[MSSDocResults]
WHERE [PopularSocialTags] LIKE '%I like it%'

What we also see here is that this table is denormalized: the tag text itself, for example “SQL Server”, is stored in the table MSSDocResults, not just its ID or key.

This is done for performance reasons. Search is read-optimized, and it is faster to deliver content together with its social tags when no joins are needed first.
The keyword “performance” brings me to the next point. Throughout this “look behind” series we have seen, and the SQL Server reports confirm it, that some tables are used more frequently than others and have a bigger impact on search performance.
There is a really good article on MSDN about which indexes are used most frequently. The tables most frequently used by search queries are:
- dbo.MSSDocProps
- dbo.MSSDocResults
- dbo.MSSOrdinal
in the Search_Service_Application_PropertyStoreDB.
So if search performance is a bottleneck, it can help to place these tables and their indexes on a separate filegroup on a fast disk subsystem.
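As a rough sketch of what that could look like: the first query below is read-only and lists the indexes of the three tables together with the filegroup each one currently lives on; the statements after it add a new filegroup. The database name without GUID suffix, the filegroup name FG_SearchFast, the logical file name, the path, and the sizes are all assumptions for illustration. Whether changing the physical layout of a SharePoint database is supported should be checked for your SharePoint version before trying this anywhere but a lab.

```sql
USE [Search_Service_Application_PropertyStoreDB];  -- assumed name; real databases usually carry a GUID suffix

-- 1) Read-only: indexes of the three frequently used tables and their current filegroup.
SELECT t.name      AS TableName,
       i.name      AS IndexName,
       i.type_desc AS IndexType,
       ds.name     AS FilegroupName
FROM sys.indexes AS i
JOIN sys.tables AS t ON t.object_id = i.object_id
JOIN sys.data_spaces AS ds ON ds.data_space_id = i.data_space_id
WHERE t.name IN (N'MSSDocProps', N'MSSDocResults', N'MSSOrdinal')
ORDER BY t.name, i.index_id;

-- 2) Hypothetical names and paths: add a filegroup backed by a file on the fast disk.
ALTER DATABASE [Search_Service_Application_PropertyStoreDB]
    ADD FILEGROUP [FG_SearchFast];

ALTER DATABASE [Search_Service_Application_PropertyStoreDB]
    ADD FILE (NAME = N'SearchFast01',
              FILENAME = N'F:\FastDisk\SearchFast01.ndf',
              SIZE = 1GB,
              FILEGROWTH = 256MB)
    TO FILEGROUP [FG_SearchFast];

-- 3) An index is moved by re-creating it with its existing definition on the new
--    filegroup: CREATE INDEX ... WITH (DROP_EXISTING = ON) ON [FG_SearchFast];
--    The definition has to be taken from the output of step 1, so it is not spelled out here.
```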
Another interesting aspect is security. In a highly sensitive environment we need a clear picture of which data is placed in which database and how it is protected. Communication between the SharePoint application server and the SharePoint Search databases on SQL Server is not really critical: the search requests are compiled into the BLOB data, as shown in Part I. A potentially critical point, however, is that data coming from external systems via BCS is stored in clear text in the Search_Service_Application_PropertyStoreDB, in the table dbo.MSSDocProps. This can be an issue; see Part III for details.

See the complete post, including the hands-on lab, as a webcast here:

Wednesday, February 15, 2012

SharePoint Search – a look behind the scene Part III

Part III: SharePoint Search and BCS
(...which data is stored in the context of crawled and managed properties in the SharePoint Search databases?)
In the second post of this series we saw that the metadata shown in the result set is stored in SQL Server, even though the data comes from an external BCS source. Metadata such as “name” and “description” is stored in the PropertyBlob field explained in the first post. But this blob only contains data that is part of the managed property “HitHighlightedSummary”. What happens if we now map a crawled property coming from a BCS source to a managed property?
Situation:
We have an External Content Source called “LOB2”. This content source is connected to an SQL Server database called “MiCLAS_TEST”.
The External Content Source contains the table called “cis.Vorgang_z
If we now search, for example, for the term “330114”, which is a value from the column “VorNummer”, we get this result in SharePoint Search:
Capturing the call with SQL Server Profiler (as described in Part II) and running it in SQL Server Management Studio, it looks like this:
The PropertyBlob in this case contains the following data as varbinary:
0x020200EEFFFE7F99010000BD0010000080024078DAC590314BC36018841F639A96125408A50E0E19BA65B198A14310A4140A828242A0434B054104ADAE6EDF3F4F9FA420F41734C7BD977BBF5BDE6B9AA689801B4918CADE819C1387849833DAD97ED75D2671240106EEFB5CC8DAFFB44B24A2161BAE744FE24EDC3215A55B8EFC921FBE78E7931D1FE4CCF5DFFCF2A6FFD39F1A35CFBCF0C803AF62C1B6BB7FED88BD7AE08D11993D8D75B93A0999BC3CA89922A414E64ADF66763AB3DF7F7557D96565BFF766E78C58A97BBEC92717006452D3CBBACA8100805882AFCDCB81
If we now map crawled properties to managed properties, the PropertyBlob changes (see note 1 below). So where is the information stored that is used to generate the updated PropertyBlob? (The update happens during a full crawl of the content source.) The answer is the table “dbo.MSSDocProps” in the “Search_Service_Application_PropertyStoreDB”.

Let's have a look using the DocId 2056 from the call above:
(for better understanding I also add the “FriendlyName” column from the table dbo.MSSManagedProperties to the query)

USE Search_Service_Application_PropertyStoreDB_b506dce49c514f8899ae51e503889885
-- Property values for DocId 2056, with the friendly name of each managed property
SELECT v2.FriendlyName,
       v1.*
FROM dbo.MSSDocProps AS v1
JOIN Search_Service_Application_DB_dd13ba19a7bb4ffaafcc3e626e73c949.dbo.MSSManagedProperties AS v2
  ON v1.PID = v2.PID
WHERE v1.DocId = 2056
Now I map some crawled properties to managed properties:
After a full crawl I execute the query again:
For the “LOB2Date” and “LOB2Cur” values you can see that the data is in the llVal column. The “LOB2Bez” value is stored as clear text in the strVal2 column. LOB2Date is an encoded date/time value based on the DateTime structure data type; LOB2Cur is a decimal value.
Note 1: The updated PropertyBlob now contains the following:
0x020400EEFFFE7F9A0100009B0100009D010000BD0010000F000800008002400100014078DAC590314BC36018841F639A96125408A50E0E19BA65B198A14310A4140A828242A0434B054104ADAE6EDF3F4F9FA420F41734C7BD977BBF5BDE6B9AA689801B4918CADE819C1387849833DAD97ED75D2671240106EEFB5CC8DAFFB44B24A2161BAE744FE24EDC3215A55B8EFC921FBE78E7931D1FE4CCF5DFFCF2A6FFD39F1A35CFBCF0C803AF62C1B6BB7FED88BD7AE08D11993D8D75B93A0999BC3CA89922A414E64ADF66763AB3DF7F7557D96565BFF766E78C58A97BBEC92717006452D3CBBACA8100805882AFCDCB8148006F006C00640069006E006700200043006F006D00700061006E007900E07AB02600000000
So we see that all the content contained in managed properties is stored in the “Search_Service_Application_PropertyStoreDB” database, even though the data comes from an external BCS source. This is interesting from a security point of view (there will be a separate post about this soon) and also for SQL Server maintenance and index defragmentation. There is a very good article about this available on blogs.msdn: LINK
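As a small illustration of the clear-text point: near the end of the updated PropertyBlob shown above there is a byte sequence that reads as plain UTF-16 text once it is cast to NVARCHAR. The sketch below copies only that fragment out of the blob; the variable name is made up, and that the text is one of the newly mapped BCS values is an assumption based on the fact that it does not appear in the first blob.

```sql
-- Byte sequence copied from near the end of the updated PropertyBlob shown above.
DECLARE @fragment VARBINARY(100) =
    0x48006F006C00640069006E006700200043006F006D00700061006E007900;

-- SQL Server stores NVARCHAR data as little-endian UTF-16, so a CAST reveals the text.
SELECT CAST(@fragment AS NVARCHAR(50)) AS DecodedText;  -- Holding Company
```

Anyone with read access to the database can recover such values the same way, without going through SharePoint security trimming.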
See the complete post, including the hands-on lab, as a webcast here:


Tuesday, February 7, 2012

SharePoint Search – a look behind the scene Part II

Part II: SharePoint Search and BCS
(...which data is stored in the SharePoint Search databases?)

Based on the first post “SharePoint Search – a look behind the scene Part I”, we will now take a closer look at what happens in SQL Server when a search request is generated (in this case against a BCS source).

In this example I used an already existing BCS source in my DEMO environment described here: LINK

In the SharePoint Search Center I used the search text “DVD”.

With SQL Server Profiler I can capture the query that SharePoint fires against SQL Server:

exec sp_executesql
N'/* {00C1C222-BB31-408A-815F-C3A76E85C290} */
exec dbo.proc_MSS_GetMultipleResults @P1,@P2,@P3,@P4,@P5',
N'@P1 int,
@P2 nvarchar(4000),
@P3 nvarchar(4000),
@P4 nvarchar(4000),
@P5 varbinary(8000)',-2147483647,N'dvd',N'',N'SET NOCOUNT ON ;
DECLARE @joinRows INT ;
/* 1 */ SET @joinRows = DATALENGTH(@joinData) / 8 ;;
/* 2 */ WITH DocIds(DocId, Value) AS (
/* 3 */ SELECT TOP(@joinRows) CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT), CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT)
FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows )
/* 4 */ SELECT P.DocId, P.SummaryBlobSize, P.Size, P.LastModified, P.IsDocument, P.IsPictureUrl, P.Author, P.Title, P.Url,
P.PictureThumbnailUrl, P.ContentClass, P.FileExtension, P.Tags, P.PropertyBlob,
CASE WHEN P.PropertyBlob IS NULL THEN 0 ELSE DATALENGTH(P.PropertyBlob) END,
P.PopularSocialTags, P.SiteName, P.Description, P.ParentLink, P.NumberOfMembers, P.PictureHeightAndWidth, P.DisplayDate
FROM dbo.MSSDocResults AS P WITH(NOLOCK), DocIds AS T WHERE P.DocId = T.DocId OPTION (MAXDOP 1) ',0x00001592000000000000...700000031


So let’s see what exactly happens by taking that query call apart.
We can see that the stored procedure dbo.proc_MSS_GetMultipleResults is called. But we want to go one step deeper and find out what is behind this call. (The stored procedure dbo.proc_MSS_GetMultipleResults will be covered in one of the next posts.)
First, the DATALENGTH of the generated BLOB, which is stored in @joinData (details on the BLOB data can be found here: LINK), is used to set the variable @joinRows:
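To read the captured call it helps to recall the shape of sp_executesql: first the statement text, then a string declaring the parameters, then the parameter values. Here is a minimal sketch that is unrelated to SharePoint; the parameter names and values are made up.

```sql
EXEC sp_executesql
    N'SELECT @a + @b AS Total',   -- statement text
    N'@a INT, @b INT',            -- parameter declarations
    @a = 40, @b = 2;              -- parameter values; returns Total = 42
```

In the captured call the values are passed by position, so @P1 is -2147483647, @P2 is N'dvd', @P3 is the empty string, @P4 is the long SQL text starting with SET NOCOUNT ON, and @P5 is the varbinary BLOB at the very end. That the procedure hands @P5 on as @joinData is an assumption drawn from how the SQL text uses that variable.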
1.       SET @joinRows = DATALENGTH(@joinData) / 8
Let’s see what the result is using this query:
                SELECT DATALENGTH(0x000015920000…700000031) / 8

The result is “50”.
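The same arithmetic can be tried on a small synthetic blob. The 16 bytes below are made-up values that follow the layout the query implies, two 8-byte pairs; they are not taken from a real trace. DECLARE with an initializer assumes SQL Server 2008 or later.

```sql
-- Synthetic blob: two 8-byte pairs (illustrative values only).
DECLARE @joinData VARBINARY(8000) = 0x00000808000000010000080900000002;
DECLARE @joinRows INT;

-- Same expression as in the captured query: one result row per 8 bytes.
SET @joinRows = DATALENGTH(@joinData) / 8;

SELECT DATALENGTH(@joinData) AS BlobBytes,  -- 16
       @joinRows             AS JoinRows;   -- 2
```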

In the next step a temporary result set called “DocIds” is created using a common table expression (the “WITH” clause):
2.       WITH DocIds(DocId, Value) AS (…
The call generates a result-set looking like this:
In the first column we see the DocIds we will need in the next step. But let’s look at how this result is generated. The key part of the query is this one:
3.       SELECT TOP(@joinRows)
CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT),
CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT)
FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows
Applying CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT) to the BLOB data stored in @joinData extracts a list of item identifiers and their ranks, each contained in an 8-byte ID/value pair, also described here: LINK. This is the only magic behind the result set shown in step 2.
Next comes a simple join between the temporary result set “DocIds” and the content of the table “dbo.MSSDocResults”. The join is done by the part "WHERE P.DocId = T.DocId" shown in the query below:
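The decoding can be reproduced without a SharePoint database. In the sketch below the blob is synthetic (two made-up pairs), and a small inline list of integers stands in for dbo.MSSOrdinal, which is assumed here to be a table of consecutive integers starting at 1. An ORDER BY is added so the output order is deterministic; the captured query does not have one.

```sql
-- Synthetic blob: two 8-byte pairs (illustrative values only).
DECLARE @joinData VARBINARY(8000) = 0x00000808000000010000080900000002;
DECLARE @joinRows INT = DATALENGTH(@joinData) / 8;

WITH Ordinal(n) AS (
    -- Stand-in for dbo.MSSOrdinal (assumption: consecutive integers starting at 1).
    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
),
DocIds(DocId, Value) AS (
    SELECT TOP (@joinRows)
        -- Bytes 1-4 of each 8-byte pair: the document ID.
        CAST(SUBSTRING(@joinData, ((ord.n - 1) * 8) + 1, 4) AS INT),
        -- Bytes 5-8 of each 8-byte pair: the accompanying value.
        CAST(SUBSTRING(@joinData, ((ord.n - 1) * 8) + 5, 4) AS INT)
    FROM Ordinal AS ord
    WHERE ord.n <= @joinRows
    ORDER BY ord.n
)
SELECT DocId, Value
FROM DocIds;
-- Returns two rows: (2056, 1) and (2057, 2)
```

The ordinal table only supplies the row numbers 1..@joinRows; each row number selects one 8-byte slice of the blob, which is why no loop or cursor is needed.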
4.       SELECT P.DocId, P.SummaryBlobSize, P.Size, P.LastModified, P.IsDocument, P.IsPictureUrl, P.Author, P.Title, P.Url, P.PictureThumbnailUrl, P.ContentClass, P.FileExtension, P.Tags, P.PropertyBlob,
CASE WHEN P.PropertyBlob IS NULL
THEN 0
ELSE DATALENGTH(P.PropertyBlob)
END,
P.PopularSocialTags, P.SiteName, P.Description, P.ParentLink, P.NumberOfMembers, P.PictureHeightAndWidth, P.DisplayDate
FROM dbo.MSSDocResults AS P WITH(NOLOCK),
DocIds AS T WHERE P.DocId = T.DocId OPTION (MAXDOP 1)
The result is:
So we see that the results returned by SQL Server contain all the data needed, even though the data comes from an external BCS source. Metadata such as “name” and “description” is stored in the PropertyBlob field explained in the first post.
If we now open the profile page of the BCS source, all configured data fields are needed. This results in a call against the data source defined in the BCS model. In my case this is also a SQL Server call, because my external data source is a SQL Server database:
Captured with SQL Profiler it looks like this:
exec sp_executesql N'SELECT [ProductKey] , [ProductName] , [ProductDescription] , [BrandName] , [ClassName] , [ColorName] , [UnitPrice] , [ProductSubcategoryName] , [ProductCategoryDescription] , [Expr1] FROM [dbo].[V_ContosoRetailDW_ProductSales] WHERE [ProductKey] = @ProductKey',N'@ProductKey int',@ProductKey=1642






The result is similar to what we see on the profile page:

See the complete post, including the hands-on lab, as a webcast here: