Hello murmurhash-net

I just completed my implementation of the MurmurHash 3 algorithm for .NET, and you can find it on GitHub. MurmurHash is a non-cryptographic hash with good speed and distribution characteristics.

When to use MurmurHash

Now, I need to stress again that MurmurHash is not a cryptographic hash algorithm. You should not use MurmurHash anywhere collisions could cause a denial-of-service scenario or where you are trying to keep the hashed value a secret. However, there are likely many places you could leverage a hash algorithm where neither of those concerns applies.

So, why use MurmurHash instead of SHA1 or MD5 or something similar? Primarily because it's fast: realistically, you should see MurmurHash run between 4x and 10x faster than SHA1 or MD5. You may not be hashing millions of items in a batch, but that was exactly the situation we found ourselves in.

MurmurHash comes in two variants: 32-bit and 128-bit. Depending on what you are doing, another reason to use MurmurHash is output size: the 32-bit variant is smaller than most comparable algorithms, and the 128-bit variant produces a value the same size as MD5, so you could consider using it anywhere you might consider using MD5.

Other thoughts

I haven't decided if it's a clever idea or a stupid idea, but since the larger MurmurHash variant returns a 128-bit value, you can easily create a Guid from it. A Guid is a bit more convenient to work with than an array of bytes, so I actually think it could be a good idea. On the other hand, people expect Guids to be unique, so it could be very surprising when two different items hash to the same value.
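
To illustrate, here's a minimal C# sketch of the idea. It assumes the library exposes the 128-bit variant as a standard HashAlgorithm via a MurmurHash.Create128() factory; check the actual API on GitHub, as the names here are from memory.

using System;
using System.Security.Cryptography;
using System.Text;
using Murmur; // murmurhash-net

class HashToGuid
{
    static void Main()
    {
        // Assumption: the 128-bit variant is a standard HashAlgorithm
        // created via MurmurHash.Create128(); the factory name may differ.
        using (HashAlgorithm murmur128 = MurmurHash.Create128())
        {
            byte[] hash = murmur128.ComputeHash(Encoding.UTF8.GetBytes("hello"));

            // A 128-bit hash is exactly 16 bytes, which is what
            // the Guid byte-array constructor expects.
            Guid id = new Guid(hash);
            Console.WriteLine(id);
        }
    }
}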

So if you have need for a fast hash algorithm that doesn't need to be secure, give MurmurHash a try. You can also find the code on NuGet in unsigned and signed packages.

When I played soccer in high school, there was one thing the coaches would constantly remind us of during any drill: "Just finish. It doesn't matter how long it takes you or how badly you do it; just finish it."

After a recent interview, I found myself repeating this over and over after I had run out of time on the question. In my heart, I knew that the interview question doesn't matter, only the process you go through, but it bothered me that I had left the problem in an unsolved and broken state.

I decided to finish it and send off the solution to the interviewer. I did not think it would help me out any, nor was that my intention, but it felt so much better once I knew in my heart that not only could I solve the problem, but that I had actually done it. When slogging through interviews, it's the little victories that matter the most.

While I never thought about it outside of soccer, the idea applies to any activity you set out to complete. Whether you are running marathons, practicing code katas, reading a book, or learning a new language: it doesn't matter how long it takes you or how badly you do it; just finish it.

I was just informed in a very polite way that I would not be extended a job by the company I had interviewed with yesterday. I already knew it was going to happen, but I really felt it when the actual email arrived. Below is my post-mortem analysis of what happened. I hope that you can learn from my mistakes as I have.

The problem

Ultimately, I choked. I was presented with a CS 101 problem, and I floundered. When I reflect on what happened, it actually didn't start out too badly. I started thinking about how to solve the problem, started typing a bit, and then realized what I had typed wouldn't work. I don't know why, but the fact that I had made a mistake and would have to correct it caused me to panic.

It's hard to describe the feeling, but it was the same one I had when I clipped the bike tire of a kid who rode out of an alley in front of my vehicle a couple of years ago. Fortunately, the kid ended up with only a couple of bruises, a few scratches, and a bent tire on his bike. Our mental states, on the other hand, were in far worse condition. I had so many thoughts racing through my head about what I was supposed to do in that situation: ask where he hurt, make sure he didn't try to move too much in case of back injuries, and so on. I knew what to do, but all I could actually get my body to do was ask him over and over if he was OK.

It wasn't until I took my hands off the keyboard, grabbed a piece of paper, and started trying to solve the problem there that I caught my first break. I am not sure why this helped, because what I wrote down was just as nonsensical as the things I was typing, but perhaps it gave my brain enough time to reassert some control and finally push through the message I had been looking for: what I needed to do next. This got me back on track and I immediately got the problem 90% of the way done, but the damage was already done.

The same thing happened on the next problem, but this time it was because I had already shut down. All I could focus on was that I had completely failed the first problem.

Keep Calm and Carry On

As I look back on my botched interview, I realize there were several things that could have changed its outcome. Some are things I could have done; some are things I should have known.

Trust Yourself

I distinctly remember one thought that kept occurring to me during the interview that could have turned everything around: stop trying to solve the problem, collect myself and slow my breathing, then start from the beginning. If I had done this and told the interviewer how I was feeling, I think I could have avoided all the repeat problems, because I would have stopped focusing on what had already happened and focused on what was next.

Talk, damnit!

I think I was destined for failure before I even started the interview, because I was already slightly anxious and totally missed the key instruction I was given: talk about what I was doing, because the problem didn't matter; what they were interested in was how I approached it. If the problem didn't matter, then beating myself up over minor mistakes became the biggest mistake I made in the entire interview. Once I started to panic, I don't know if I was even capable of talking anymore, which is all the more reason I should have collected myself and calmed down so that I could do what I was there to do.

The interviewer is your friend

By far, this is the most important thing I learned from looking back on the experience. I feel a bit foolish that I wasn't thinking this way before, but the person interviewing you is on your team and is rooting for you. The best thing that can happen for them is that you do great in the interview and become a great member of their team. Sure, they are looking out for themselves and don't want to bring on someone who isn't going to meet their needs, but until they decide that you won't work out, you should believe that they are on your side. Unfortunately for me, I walked into the interview with the wrong mindset: I felt like I had to be the best they had ever seen.

Keep It Simple, Stupid

On one problem, I had it mostly solved in my head, but when I began to actually implement my solution, I stopped and went for another approach because I felt the first was too simple. I knew I could do it in one statement without allocating an array, and I got hung up on that notion. Yes, it is possible to do it in one statement, because I did it after the interview, but I was not able to do it without looking up the names and syntax of the methods that made it possible. By feeling like I had to produce an impressive solution, I committed myself to no solution.

Moving Forward

Even before being told that I would not be hired, I was already beating myself up a bit because I was fairly confident of the outcome. I actually felt as though I was not good enough for the position, and that I had let myself down. Only one of those was true, and I'm solving the other right now. I've decided that even though I won't be hired, that doesn't say anything more than that. Just like a bad piece of code, I need to fix the bug, make sure it doesn't happen again, and move on. For me, that means putting myself "out there" more. Looking back at this interview, I clearly have some issues when coding in front of others, so I'm going to try some coding competitions to practice for the next time I have to do this.

Ultimately, if you find yourself in a similar situation, just remember to focus on what's important, trust in yourself, and if things don't work out, learn from it and move on.

Caution: what you are about to read is risky. If you do decide to go this route, make sure to do a lot of testing!

I recently traced some intermittent pausing on our website to the ASPState database, where requests were stuck on WRITELOG waits. This was puzzling because we don't really use session state very heavily, and we definitely weren't hitting I/O bottlenecks. I opened up SQL Server Profiler and started looking into what was going on. Immediately I found some areas that could use improvement:

  1. TempGetStateItemExclusive3 had an eager-spool operator in its execution plan
  2. Excessive updating of indexed columns
  3. Updating even when nothing was being changed

Sometimes clever hurts

Without testing, I too would have thought that reading and updating in one operation would provide the best performance, but the query optimizer thinks differently. Sirens started sounding in my head when I looked at the execution plans for the TempGetStateItem3 and TempGetStateItemExclusive3 procedures and saw an eager spool happening for what should be a simple index-seek operation. According to the documentation, an eager spool writes all of its input data to a temporary table. While this should only be one row here, writing the data to a temporary table is unnecessary and also causes additional I/O against tempdb.

The documentation is light on why the query optimizer would choose an eager spool, but here it picks one because the update reads column values into variables, updates the columns, and uses those variables to compute other values, all within the same statement. For everything to work correctly, order matters, which is why SQL Server chooses to spool all the input data into a table before applying the updates: it's protecting against the Halloween problem.
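
To make that concrete, here is a simplified sketch of the pattern (not the verbatim ASPState code): a single UPDATE that assigns columns to variables and new values to columns at the same time, which is what pushes the optimizer toward the spool.

DECLARE @id tSessionId,   -- session to lock
        @locked bit,
        @lockCookie int;

-- Reading columns into variables while also assigning new column values
-- in one statement; these interdependencies are what trigger the spool.
UPDATE dbo.ASPStateTempSessions
SET @locked = Locked,                           -- capture the pre-update value...
    Locked = 1,                                 -- ...while writing a new one
    @lockCookie = LockCookie = LockCookie + 1   -- variable = column = expression
WHERE SessionId = @id;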

The next thing I noticed is that in TempGetStateItemExclusive3 you see a side effect of using an UPDATE statement as a SELECT statement: copious CASE expressions that modify the data only when we are actually locking the session. Almost every column follows the logic "if locking, set to this value; otherwise keep the original value." While clever, and it works, this has some unfortunate side effects:

  1. The Expires column, which is indexed, is always updated
  2. Even when we aren't changing any values, we are still writing them (I think; feel free to correct me if I'm wrong)

Let's break things

I've tried to make the procedures work as close to the originals as possible, though in places I deviate for performance reasons when I find the trade-off acceptable.

Let's start with the TempResetTimeout procedure, since it's the simplest and safest to modify (it also happens to be executed the most often). All we need to do is add a WHERE clause to the update statement that keeps the expiration from being updated as frequently.

ALTER PROCEDURE [dbo].[TempResetTimeout]
    @id tSessionId
AS
    -- capture the time once so the SET and WHERE clauses agree
    DECLARE @now datetime = GETUTCDATE();

    UPDATE [ASPState].dbo.ASPStateTempSessions
    SET Expires = DATEADD(minute, [Timeout], @now)
    WHERE SessionId = @id AND DATEDIFF(minute, @now, Expires) < ([Timeout] - 1);

Notice the database name in the update statement. If you are using a custom state database, it will be different.

Now we'll only update the index and write the new values at most once per minute. The risk is that someone's session may time out up to one minute earlier than expected, and I am completely OK with that trade-off. You can tweak the interval; one minute is just a safe default. The major benefits are both less index updating and less writing (I believe this procedure fires at the end of every request that uses session state).

Next we have the procedures that update the session state data: TempUpdateStateItemLong, TempUpdateStateItemLongNullShort, TempUpdateStateItemShort, and TempUpdateStateItemShortNullLong. The most we can do here is avoid unnecessary index updates on the Expires column, so we just update it conditionally.

SET Expires = IIF(
    -- extend only when at least a minute has passed since the last extension
    DATEDIFF(minute, GETUTCDATE(), Expires) < (@timeout - 1),
    DATEADD(minute, @timeout, GETUTCDATE()),
    Expires -- otherwise write the current value back unchanged
)

Finally, we get to the expensive procedures: TempGetStateItem3 and TempGetStateItemExclusive3. These procedures are almost identical; the primary difference is that one locks the session while the other just reads the state (i.e., when the handler uses read-only session state). There's a lot going on here, so I'll post the code and then we'll go through it section by section.

ALTER PROCEDURE [dbo].[TempGetStateItemExclusive3]
    @id         tSessionId,
    @itemShort  tSessionItemShort OUTPUT,
    @locked     bit OUTPUT,
    @lockAge    int OUTPUT,
    @lockCookie int OUTPUT,
    @actionFlags int OUTPUT
AS

DECLARE @ptr AS tTextPtr
    , @length AS int
    , @now AS datetime = GETUTCDATE()
    , @nowLocal AS datetime = GETDATE();

SELECT @locked = Locked
    , @lockAge = IIF(Locked = 0, 0, DATEDIFF(second,LockDate,@now))
    , @lockCookie = IIF(Locked = 0, LockCookie + 1, LockCookie)
    , @actionFlags = Flags
    , @itemShort = IIF(Locked = 0, SessionItemShort, NULL)
    , @length = IIF(Locked = 0, DATALENGTH(SessionItemLong), NULL)
    , @ptr = IIF(Locked = 0, TEXTPTR(SessionItemLong), NULL)
FROM [ASPState].dbo.ASPStateTempSessions WITH(ROWLOCK,UPDLOCK)
WHERE SessionId = @id;

-- initialization
IF ((@actionFlags & 1) <> 0)
BEGIN
    UPDATE [ASPState].dbo.ASPStateTempSessions
    SET Flags = Flags & ~1
    WHERE SessionId = @id;

    -- signal to initialize
    SET @actionFlags = 1;
END

-- lock and read
IF (@locked = 0)
BEGIN
    UPDATE [ASPState].dbo.ASPStateTempSessions
        SET Expires = IIF(
            DATEDIFF(minute,@now,Expires) < ([Timeout] - 1), 
            DATEADD(minute,[Timeout],@now), 
            Expires
        ),
        Locked = 1,
        LockDate = @now,
        LockDateLocal = @nowLocal,
        LockCookie = @lockCookie
    WHERE SessionId = @id;

    IF (@length IS NOT NULL)
    BEGIN
        READTEXT [ASPState].dbo.ASPStateTempSessions.SessionItemLong @ptr 0 @length;
    END
END

One note: in the original code, your session's expiration was extended on every call, even while the session was locked, because ASP.NET calls this procedure repeatedly until it actually acquires the lock; in this version the expiration is only extended when the lock is taken. This is a breaking change from the original, but if you are using session state on pages that hold it for a long time, I would suggest looking into getting rid of session usage on those pages.

To start, we grab our session values in the SELECT statement. Before, all of this was done in the UPDATE statement, which caused the eager-spool problem. The IIF expressions ensure we only grab the values if the session is unlocked, hence the "exclusive" aspect of the procedure. Using a SELECT instead of an UPDATE has the drawback that we could see the session as unlocked and have someone else lock it before we can, which means we need to hold a lock on the record in case we end up locking the session ourselves; that's what the WITH(ROWLOCK,UPDLOCK) hint accomplishes.

Next, we get to the initialization logic. Basically, if the Flags value has its low bit set, clear the bit and return 1 in the @actionFlags output parameter. It still does the bitwise operations like before, though I don't think the value is ever anything other than 0 or 1.

The last part is straightforward: if the session wasn't locked, lock it and read out the SessionItemLong value if there is one. A couple of things worth pointing out: we conditionally update the Expires column as before, and we update the LockCookie column with the @lockCookie parameter value, which was incremented in the earlier SELECT statement.

Wrapping up

So how much does all this help? With a simple benchmark using 8 active sessions and 4 concurrent requests per session, I saw roughly a 15% increase in total throughput. The session was being updated every time, so throughput should increase even more under normal usage scenarios.

Feel free to tweak the changes where you see fit; you should have a decent understanding of the overall flow and requirements now. You can find all the scripts necessary in this gist on GitHub. The only other suggestion I have is to make sure your session state database's recovery model is set to simple.
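
If you're using the default ASPState database name, that's a one-liner (adjust the name if you use a custom state database):

ALTER DATABASE [ASPState] SET RECOVERY SIMPLE;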

Some people will never learn anything, for this reason, because they understand everything too soon.

Alexander Pope

Recently I was working on adding support for sending notifications to mobile devices and came across a cool feature in SQL Server that I didn't know about. I won't go into the details, but what I was doing was basically a producer-consumer problem distributed across multiple machines. When I started, I knew exactly what to do; when I finished, I had learned something new.

First Attempt

Starting out, I already knew exactly what I needed to do and how to do it. The SQL was very straightforward.

UPDATE dbo.NotificationQueue
SET IsActive = 0,
    LockedUntilUtc = DATEADD(SECOND, 10, SYSUTCDATETIME()),
    LockedBy = @MachineName
WHERE Id IN (
    SELECT Id
    FROM dbo.NotificationQueue q
    WHERE q.IsActive = 1
    ORDER BY q.CreatedOnUtc 
        OFFSET 0 ROWS 
        FETCH FIRST @BatchSize ROWS ONLY
);

SELECT Id, TargetApplication, SystemType, IsDevelopment, AlertMessage, Payload, Token
FROM dbo.NotificationQueue
WHERE LockedBy = @MachineName;

There are problems with this approach, though.

  • It requires multiple operations (one to find the items to update, one to update them, and one to select the items that are now locked)
  • You need multiple indexes (one to find the active items, one to find the notifications a given machine has locked)

Refactoring

I don't remember exactly how I ended up on the documentation for the UPDATE statement, but I noticed something I hadn't ever paid attention to before. Towards the bottom of the page was the section "Capturing the Results of the UPDATE Statement".

How this works is very similar to how triggers work: you get fake tables called inserted and deleted, which hold the values modified by the query. The inserted table holds the new values for the records. The deleted table holds the previous values for the records.

So this allows us to remove the second query above that just retrieved the information about our notifications. I also used another trick I had run across previously: you can update through a Common Table Expression. This doesn't change the efficiency of the query; I just find it easier to read.

WITH pending AS (
    SELECT *
    FROM dbo.NotificationQueue q
    WHERE q.IsActive = 1
    ORDER BY q.CreatedOnUtc OFFSET 0 ROWS FETCH FIRST @BatchSize ROWS ONLY
)
UPDATE pending
SET IsActive = 0,
    LockedUntilUtc = DATEADD(SECOND, 10, SYSUTCDATETIME()),
    LockedBy = @MachineName
OUTPUT inserted.Id
    , inserted.TargetApplication, inserted.SystemType, inserted.IsDevelopment
    , inserted.AlertMessage, inserted.Payload, inserted.Token;

Now that I know about this, I can think of a lot of other situations where it will be extremely helpful. The OUTPUT clause isn't limited to UPDATE statements, either; it's supported on INSERT and DELETE as well.
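
For example, a hypothetical cleanup job could purge old, inactive notifications and report exactly what it removed in a single statement (the seven-day cutoff here is just an illustration):

-- Purge inactive notifications whose locks expired over a week ago,
-- returning the deleted rows so the job can log what it removed.
DELETE FROM dbo.NotificationQueue
OUTPUT deleted.Id, deleted.TargetApplication
WHERE IsActive = 0
  AND LockedUntilUtc < DATEADD(DAY, -7, SYSUTCDATETIME());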

If you aren't using SQL Server, your database may support an equivalent of this technique. The RETURNING clause is the most common variant; PostgreSQL supports it directly, and Oracle has RETURNING ... INTO.
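
As a rough PostgreSQL sketch of the same dequeue pattern (names adapted from the example above and snake_cased, as is conventional there):

-- Claim up to 10 pending notifications and return them in one statement.
UPDATE notification_queue
SET is_active = false,
    locked_by = 'worker-01'
WHERE id IN (
    SELECT id
    FROM notification_queue
    WHERE is_active
    ORDER BY created_on_utc
    LIMIT 10
)
RETURNING id, alert_message, payload;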