Friday, September 15, 2017

First MVC web application installation issue

1. Parser Error Message: Unrecognized attribute 'targetFramework'. Note that attribute names are case-sensitive.
Solution:  run cmd.exe in administrator mode
C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319>aspnet_regiis -i
2. Access is denied.
Description: An error occurred while accessing the resources required to serve this request. The server may not be configured for access to the requested URL
Error message 401.2.: Unauthorized: Logon failed due to server configuration
Solution: in IIS, enable "Anonymous Authentication" and "Forms Authentication" (or whichever authentication methods your site uses)

3. Login failed for user 'TDBFG\TDL08120509$'.
Solution: The Application Pool Identity determines the security context your website runs under in IIS; each application pool can run its site under its own unique account. NetworkService is the IIS default identity: it needs no password, but it has low privileges. Reconfigure the IIS Application Pool Identity to match your specific requirements.
http://www.technologycrowds.com/2013/03/login-failed-for-user-iis.html


4. The server cannot handle directory requests.  return pc.ValidateCredentials(userName, password);
The following thread describes a similar issue, where it turned out that IIS on the server was set up to run as LocalMachine; the fix was to change it to NetworkService (the default).
http://stackoverflow.com/questions/3694919/nets-directory-services-throws-a-strange-exception


5. Also give the NetworkService account permission in SSMS (for step #4 above)

Also remember:
  1. use the ASP.NET v4.0 application pool

Thursday, September 14, 2017

The prime number challenge – great waste of time!

No sane person would even consider using SQL Server to construct a list of prime numbers. So just to prove that I’m not sane (as if there could be any doubt!), this post will be about finding prime numbers.

First a bit of history. Ward Pond wrote about efficient ways to populate a table with one million GUIDs. I posted a comment with a slightly more efficient algorithm. And that was quickly followed by a new post from Ward, tweaking my syntax even further. And that’s when Denis the SQL Menace lived true to his name by posting this comment:

“How about the next challenge is to return all 78498 prime numbers between 1 and 1000000?”

Now of course, this is a silly challenge. Not because prime numbers are silly, mind you. They are very useful to mathematicians, and many encryption algorithms wouldn’t even be possible without (large) prime numbers. The silly part is using SQL Server, a data manipulation tool, to calculate prime numbers. If you really need them, code a quick algorithm in a C++ program. Or buy a ready-made list with the first million or so prime numbers. So I attempted to resist the challenge.

Alas – the flesh is weak. So when I saw Ward’s reply to Denis’ challenge, I was no longer able to resist temptation. After all, Ward’s attempt is not only interesting – it is also very long, and apparently (with an estimated completion time of 1 to 2 days!!) not very efficient. I decided that I should be able to outperform that.

My assumptions are that a table of numbers is already available and that it holds at least all numbers from 1 to 1,000,000 (mine holds numbers from 1 to 5,764,801). The challenge is to create and populate a table with the prime numbers from 1 to 1,000,000 in as little time as possible; displaying the prime numbers is not part of the challenge. For testing purposes, I replaced the upper limit of 1,000,000 with a variable @Limit, and I set it to a mere 10,000. That saves me a lot of idle waiting time!

As a first attempt, I decided to play dumb: just use a single set-based query that encodes the definition of a prime number. Here’s the SQL:

DROP TABLE dbo.Primes
go
CREATE TABLE dbo.Primes
           (Prime int NOT NULL PRIMARY KEY)
go
DECLARE @Start datetime, @End datetime
SET     @Start = CURRENT_TIMESTAMP
DECLARE @Limit int
SET     @Limit = 10000

INSERT INTO dbo.Primes (Prime)
SELECT      n1.Number
FROM        dbo.Numbers AS n1
WHERE       n1.Number > 1
AND         n1.Number < @Limit
AND NOT EXISTS
 (SELECT    *
  FROM      dbo.Numbers AS n2
  WHERE     n2.Number > 1
  AND       n2.Number < n1.Number
  AND       n1.Number % n2.Number = 0)

SET     @End = CURRENT_TIMESTAMP
SELECT  @Start AS Start_time, @End AS End_time,
        DATEDIFF(ms, @Start, @End) AS Duration,
        COUNT(*) AS Primes_found, @Limit AS Limit
FROM    dbo.Primes
go
--select * from dbo.Primes
go

This ran in 1,530 ms on my test system. (And in case you ask – I also tested the equivalent query with LEFT JOIN; that took 11,466 ms, so I quickly discarded it.) With @Limit set to 20,000 and 40,000, execution times were 5,263 and 18,703 ms – so each time we double @Limit, execution time grows by a factor of roughly 3.5. Using this factor, I can estimate an execution time of one to two hours for the full million.
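The extrapolation is straightforward compound growth. As a back-of-the-envelope check (a sketch of my own, not part of the original post), the same arithmetic in Python reproduces the one-to-two-hour estimate:

```python
import math

# Measured runtimes (ms) of the naive set-based query at three limits;
# each doubling of @Limit multiplied the runtime by roughly 3.5.
measured = {10_000: 1_530, 20_000: 5_263, 40_000: 18_703}
growth = 3.5

# Reaching 1,000,000 from 10,000 takes log2(100) ~ 6.64 doublings.
doublings = math.log2(1_000_000 / 10_000)
estimate_ms = measured[10_000] * growth ** doublings

print(f"estimated runtime at @Limit = 1,000,000: "
      f"{estimate_ms / 3_600_000:.1f} hours")   # about 1.7 hours
```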

That’s a lot better than the one to two days Ward estimates for his version – but not quite fast enough for me. So I decided to try to convert the Sieve of Eratosthenes to T-SQL. This algorithm is known to be both simple and fast for getting a list of prime numbers. Here’s my first attempt:
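For reference before the T-SQL translation, here is the classic procedural sieve in a few lines of Python – an illustrative sketch of the algorithm itself, not code from the original post:

```python
def sieve(limit):
    """Classic Sieve of Eratosthenes: all primes up to and including limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:               # sieving up to sqrt(limit) suffices
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p: smaller multiples
            # were already crossed off by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

print(len(sieve(10_000)))   # 1229 – the same count the post finds
```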

DROP TABLE dbo.Primes
go
CREATE TABLE dbo.Primes
           (Prime int NOT NULL PRIMARY KEY)
go
DECLARE @Start datetime, @End datetime
SET     @Start = CURRENT_TIMESTAMP
DECLARE @Limit int
SET     @Limit = 10000

-- Initial fill of sieve;
-- filter out the even numbers right from the start.
INSERT  INTO dbo.Primes (Prime)
SELECT  Number
FROM    dbo.Numbers
WHERE  (Number % 2 <> 0 OR Number = 2)
AND     Number <> 1
AND     Number <= @Limit

-- Set @Current to 2, since multiples of 2 have already been processed
DECLARE @Current int
SET     @Current = 2
WHILE   @Current < SQRT(@Limit)
BEGIN
  -- Find next prime to process
  SET @Current =
             (SELECT TOP (1) Prime
              FROM     dbo.Primes
              WHERE    Prime > @Current
              ORDER BY Prime)
  DELETE FROM dbo.Primes
  WHERE       Prime IN (SELECT n.Number * @Current
                        FROM   dbo.Numbers AS n
                        WHERE  n.Number >= @Current
                        AND    n.Number <= @Limit / @Current)
END

SET     @End = CURRENT_TIMESTAMP
SELECT  @Start AS Start_time, @End AS End_time,
        DATEDIFF(ms, @Start, @End) AS Duration,
        COUNT(*) AS Primes_found, @Limit AS Limit
FROM    dbo.Primes
go
--select * from dbo.Primes

The time that Eratosthenes takes to find the prime numbers up to 10,000 is 7,750 ms – much longer than the previous version. My only hope was that the execution time would not increase by a factor of 3.5 each time @Limit doubles – and indeed, it didn’t. With @Limit set to 20,000 and 40,000, execution times were 31,126 and 124,923 ms, so the factor has actually gone up to 4. With @Limit set to 1,000,000, I expect an execution time of 15 to 20 hours.

Time to ditch the sieve? No, not at all. Time to make use of the fact that SQL Server prefers to process whole sets at a time. Let’s look at the algorithm in more detail – after the initial INSERT that fills the sieve and removes the multiples of 2, it finds 3 as the next prime number and removes multiples of 3. It starts removing at 9 (3 squared) – so we can be pretty sure that numbers below 9 that have not yet been removed will never be removed. Why process them one at a time? Why not process them all at once? That’s what the algorithm below does – on the first pass of the WHILE loop, it takes the last processed number (2), finds the first prime after that (3), then uses the square of that number (9) to define the range of numbers in the sieve that are now guaranteed to be primes. It then removes multiples of all primes in that range. After that, it repeats the operation, this time removing multiples in the range between 11 (the first prime after 9) and 121 (11 squared). Let’s see how this affects performance.
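For comparison outside T-SQL, the batched idea can be sketched in Python (an illustration of my own; the function and variable names are mine, not the post’s): each pass finds the first survivor above the last processed bound, declares everything below its square prime, and strikes out multiples of all primes in that range in one go.

```python
def batched_sieve(limit):
    """Sieve variant mirroring the set-based T-SQL: each pass removes the
    multiples of a whole range of primes, not just of one prime."""
    # Initial fill: 2 plus the odd numbers, as in the initial INSERT.
    candidates = {2} | set(range(3, limit + 1, 2))
    last = 2  # multiples of 2 are already excluded
    while last < limit ** 0.5:
        # First survivor above `last` is the next prime (@First)...
        first = min(c for c in candidates if c > last)
        # ...and every survivor below first**2 is now guaranteed prime (@Last).
        last = first * first
        # Remove multiples of all primes in [first, last) in one batch;
        # multiples start at p*p, matching n.Number >= p.Prime in the T-SQL.
        for p in [c for c in candidates if first <= c < last]:
            candidates -= set(range(p * p, limit + 1, p))
    return sorted(candidates)
```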

DROP TABLE dbo.Primes
go
CREATE TABLE dbo.Primes
           (Prime int NOT NULL PRIMARY KEY)
go
DECLARE @Start datetime, @End datetime
SET     @Start = CURRENT_TIMESTAMP
DECLARE @Limit int
SET     @Limit = 10000

-- Initial fill of sieve;
-- filter out the even numbers right from the start.
INSERT  INTO dbo.Primes (Prime)
SELECT  Number
FROM    dbo.Numbers
WHERE  (Number % 2 <> 0 OR Number = 2)
AND     Number <> 1
AND     Number <= @Limit

-- Set @Last to 2, since multiples of 2 have already been processed
DECLARE @First int, @Last int
SET     @Last = 2
WHILE   @Last < SQRT(@Limit)
BEGIN
  -- Find next prime as start of next range
  SET @First =
             (SELECT TOP (1) Prime
              FROM     dbo.Primes
              WHERE    Prime > @Last
              ORDER BY Prime)
  -- Range to process ends at square of starting point
  SET @Last = @First * @First
  DELETE FROM dbo.Primes
  WHERE       Prime IN (SELECT     n.Number * p.Prime
                        FROM       dbo.Primes  AS p
                        INNER JOIN dbo.Numbers AS n
                              ON   n.Number >= p.Prime
                              AND  n.Number <= @Limit / p.Prime
                        WHERE      p.Prime  >= @First
                        AND        p.Prime  <  @Last)
END

SET     @End = CURRENT_TIMESTAMP
SELECT  @Start AS Start_time, @End AS End_time,
        DATEDIFF(ms, @Start, @End) AS Duration,
        COUNT(*) AS Primes_found, @Limit AS Limit
FROM    dbo.Primes
go
--select * from dbo.Primes

The time taken for the 1,229 primes between 1 and 10,000? A mere 266 ms!! With execution times like that, I saw no need to rely on extrapolation – I set @Limit to 1,000,000, hit the Execute button, sat back – and got the following result after 19 seconds:

Start_time              End_time                Duration    Primes_found Limit
----------------------- ----------------------- ----------- ------------ -----------
2006-09-24 00:42:22.780 2006-09-24 00:42:41.750 18970       78498        1000000

From 1-2 days to just under 20 seconds – and this time, I didn’t even have to add an index!

Finally, to top things off, I tried one more thing. I have often read that SQL Server won’t optimize an IN clause as well as an EXISTS clause, especially if the subquery after IN returns a lot of rows – which is definitely the case here. So I rewrote the DELETE statement in the heart of the WHILE loop to read like this:

  DELETE FROM dbo.Primes
  WHERE EXISTS
   (SELECT    *
    FROM       dbo.Primes  AS p
    INNER JOIN dbo.Numbers AS n
          ON   n.Number >= p.Prime
          AND  n.Number <= @Limit / p.Prime
    WHERE      p.Prime  >= @First
    AND        p.Prime  <  @Last
    AND        Primes.Prime = n.Number * p.Prime)

And here are the results:

Start_time              End_time                Duration    Primes_found Limit
----------------------- ----------------------- ----------- ------------ -----------
2006-09-24 00:47:42.797 2006-09-24 00:48:01.903 19106       78498        1000000

Which just goes to prove that you shouldn’t believe everything you read, I guess. 

Database Programming: Set-Based Update — Hugo’s Syntax Outperforms Shaun’s; Ward Tweaks Hugo’s Syntax Even Further

This is turning into a good old-fashioned coding party.  I love it..

Hugo Kornelis left a wonderful comment on last night's post (thanks Hugo!) answering my challenge to outperform Shaun's syntax.  With some adjustments I've made to control for parallelism, I have some results to share.  (We can argue later whether controlling for it is a good thing; when I leave parallelism in play on my sandbox – which turns out to be hyperthreaded, contrary to what I said last night – the results of the following tests are far too random.)

I expanded on Hugo's approach and built 5 possible approaches to this problem (if you've got a different one, bring it on..).  Let's name them as follows, along with the average of five execution times on my laptop (HP nc8430; 2 GB RAM):


  • 1MRowTable (2156 ms)
  • ShaunsCTE (1546 ms)
  • HugosFourWayJoin (1123 ms)
  • HugosTwoWayJoin (1110 ms)
  • WardsTwoWayJoinNoTOP (983 ms)

Here's a rundown on each alternative..

1MRowTable

Looking at Hugo's syntax, I wondered if we might get more efficiency from a SELECT off of a 1M row table than off of a four-way Cartesian product.  This requires a dedicated table, which we'll populate with an adaptation of Shaun's CTE:

drop table SetBuilder
go

create table SetBuilder (Id bigint PRIMARY KEY)
go

with digits(i) as (
    select 1 as i union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9 union all
    select 0
),
-- generate 1M rows each with a unique row number, i
sequence(i) as (
  select d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)
      from digits as d1,
           digits as d2,
           digits as d3,
           digits as d4,
           digits as d5,
           digits as d6
)

insert SetBuilder (Id)
select i
from sequence 
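As a side note (a check of my own, not from the original post), the positional arithmetic in the sequence CTE is easy to verify in Python: the six-way cross join of the ten digits yields every value from 0 through 999,999 exactly once.

```python
from itertools import product

digits = range(10)

# Each 6-tuple of digits maps to a unique number via positional weights,
# exactly as the sequence CTE computes d1.i + 10*d2.i + ... + 100000*d6.i.
sequence = [d1 + 10*d2 + 100*d3 + 1000*d4 + 10000*d5 + 100000*d6
            for d1, d2, d3, d4, d5, d6 in product(digits, repeat=6)]

assert len(sequence) == 1_000_000
assert sorted(sequence) == list(range(1_000_000))   # 0..999999, each once
```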

Here's the test for generating 1M GUIDs using the table we just populated:

drop table SomeTable;
go
checkpoint;
dbcc freeproccache;
dbcc dropcleanbuffers;
go


declare @start datetime,
          @datediff int;
set @start = getdate();

select newid() as uuid into SomeTable
from SetBuilder
option (maxdop 1);


select @datediff = datediff(ms, @start, getdate());


select @datediff AS [1MRowTable], count(*) AS [Rows]
from SomeTable;

ShaunsCTE

Here's Shaun's CTE blended with Hugo's monitoring syntax:


drop table SomeTable;
go
checkpoint;
dbcc freeproccache;
dbcc dropcleanbuffers;
go


declare @start datetime,
          @datediff int;


set @start = getdate();


with digits(i) as (
    select 1 as i union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9 union all
    select 0
),


-- generate 1M rows each with a unique row number, i


sequence(i) as (
 select d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)
 from digits as d1, digits as d2, digits as d3, digits as d4, digits as d5, digits as d6
)


select newid() as uuid into SomeTable
from sequence option (maxdop 1);


select @datediff = datediff(ms, @start, getdate());


select @datediff AS ShaunsCTE, count(*) AS [Rows]
from SomeTable;

HugosFourWayJoin

Here's Hugo's four-way JOIN proposal, appropriate for use in a newly created database:


drop table SomeTable;
go
checkpoint;
dbcc freeproccache;
dbcc dropcleanbuffers;
go


declare @start datetime,
          @datediff int;


set @start = getdate();


select top (1000000) newid() as uuid
into SomeTable
from sysobjects a, sysobjects b, sysobjects c, sysobjects d
option (maxdop 1);


select @datediff = datediff(ms, @start, getdate());


select @datediff AS HugosFourWayJoin, count(*) AS [Rows]
from SomeTable;

HugosTwoWayJoin

Here's Hugo's two-way JOIN, appropriate for use in databases with several user objects defined:


drop table SomeTable;
go

checkpoint;
dbcc freeproccache;
dbcc dropcleanbuffers;
go


declare @start datetime,
          @datediff int;


set @start = getdate();


select top (1000000) newid() as uuid into SomeTable
from sysobjects a, sysobjects b
option (maxdop 1);


select @datediff = datediff(ms, @start, getdate());


select @datediff AS HugosTwoWayJoin, count(*) AS [Rows]
from SomeTable;

WardsTwoWayJoinNoTOP

Given that the two-way JOIN outperformed the four-way JOIN, I thought we might get better performance if we built a contrived table containing 1,000 rows: its self-join generates an exactly 1M-row Cartesian product, which removes the need for the TOP clause in the SELECT:

create table SetBuilder1K (Id bigint PRIMARY KEY)
go

;with digits(i) as (
    select 1 as i union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9 union all
    select 0
),
-- generate 1M rows each with a unique row number, i
sequence(i) as (
  select d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)
      from digits as d1,
           digits as d2,
           digits as d3,
           digits as d4,
           digits as d5,
           digits as d6
)

insert SetBuilder1K (Id)
select TOP 1000 i
from sequence
order by i

drop table SomeTable;
go

checkpoint;
dbcc freeproccache;
dbcc dropcleanbuffers;
go
declare @start datetime,
          @datediff int;

set @start = getdate();

select newid() as uuid into SomeTable
from SetBuilder1K a, SetBuilder1K b
option (maxdop 1);

select @datediff = datediff(ms, @start, getdate());

select @datediff AS WardsTwoWayJoinNoTOP, count(*) AS [Rows]
from SomeTable;

Let's Keep This Party Going!

It looks as though the precisely contrived Cartesian product is the most performant of these alternatives.  This is further demonstration, I think, that it's best to let SQL Server be SQL Server; working with tables appears to be more performant than working with CTEs.

This is definitely a work in progress, and I hope those of you who are so inclined will join the party and throw some code on the bonfire.  I'd be especially interested in knowing if anybody can get consistent results with parallelism enabled.

If you've made it this far, thanks..

     -wp

PS.  If you have made it this far, your reward is the name of Hugo's blog, which I think is so brilliant, I'm going to pass it through the SQL query parser:


SELECT  Hints,
        Tips,
        Tricks
FROM    Hugo Kornelis
WHERE   RDBMS = 'SQL Server'