?

Log in

No account? Create an account
Evil, but cute & gay about it
...ramblings of the imperfectly innocent
Ups and downs 
14th-Nov-2010 02:10 pm
southpark
I don't think I've ever had a morning as full of highs and lows as today. In the ever popular YAY/BOO format:
  • YAY: Saw seleniumdream and his wife this morning at brunch. He is recovering from surgery, and they are both doing very well.
  • BOO: The Drunken French Toast at Costal Kitchen was not nearly as good as it needed to be.
  • YAY: The eggs and bacon definitely were, though.
  • BOO: The coffee cake was raisin apple. Uh, no.
  • BOO: The coffee itself was subpar today.
  • YAY: Since it wasn't raining, I decided to talk a long walk home instead of the bus. I found a very nice local bakery I'd never noticed. They have quite decent whoopie pies. Yum.
  • YAY: On the way home, I stopped at Vivace for some good coffee (possibly the best in Seattle, in fact). It was lovely.
  • YAY: While waiting in line, mrdorbin, his husband, and their son walked in! Whee!
  • BOO: While drinking my coffee, I learned that I've been scooped! ARGH! I had some awesome results showing two serious weaknesses in Murmurhash 2, and I've been (excruciatingly slowly) writing them up, and it turns out that the author has posted about one of them on Nov 4! CURSES! Ah, well, Murmurhash 3 is in the works, and I plan on emailing my comments to the author, so maybe things can be fixed there. But I was soooo hoping for a tiny slice of e-fame!
  • YAY: Remembered to stop by LUSH on the way home for more shaving cream.
  • BOO: LUSH has discontinued the shaving cream that I use! NOOOOOO!
  • YAY: They appear to have reformulated their Prince shaving cream into something I might like. Bought it instead.
  • BOO: Nordstrom Rack (one of the few places I can find shows that fit) had no shoes that I liked.

And the funnest YAY is for last. While I was on my way out of Nordstrom, I noticed a woman holding a bag from Penzeys Spices. I was confused, because we don't have one around here. A quick search on my phone showed that I was wrong: WE TOTALLY HAVE A PENZEY'S AROUND HERE NOW OMG! It opened two days ago, apparently, just across the street from Nordstrom Rack! I know I shouldn't be such a label whore for them, especially since we have the awesome World Spice on the other side of the Market, but I am. WOOOOOO!
Comments 
15th-Nov-2010 02:50 am (UTC)
total YAY for stealth penzey's acquisition!
16th-Nov-2010 04:53 am (UTC)
I know, right?!? I walked by the locations several times over the past weeks, and there was no hint! It's like nothing... nothing... *WHOOMPH* awesomeness!
15th-Nov-2010 09:13 am (UTC)
BOO: Nordstrom Rack (one of the few places I can find shows that fit) had no shoes that I liked.

If the show fits....

especially since we have the awesome World Spice on the other side of the Market

Competition is good. That being said, I suspect I'll still get most of my spices from World Spice. :)
16th-Nov-2010 04:53 am (UTC)
Har! Yes, I will continue to shop there, but now I will be torn.
16th-Nov-2010 07:51 pm (UTC)
Ok, there were so many ups and downs in that one weekend that even I feel dizzy now. :)

I did not know that Nordstrom Rack had shows. Really? :p

I so loved that banana nut and chocolate chip coffee cake last time--no to the apple and raisin.

What was the bakery you found? Near me?
17th-Nov-2010 06:26 am (UTC)
Again, har.

The bakery was sort-of near you; it's North Hill Bakery on 15th. It's a little small, and not everything looked awsome, but as I said, the whoopie pie was good, and the scone was even better than expected. I'll definitely try them again.
26th-Nov-2010 06:47 am (UTC)
"BOO: While drinking my coffee, I learned that I've been scooped! ARGH! I had some awesome results showing two serious weaknesses in Murmurhash 2, and I've been (excruciatingly slowly) writing them up, and it turns out that the author has posted about one of them on Nov 4! CURSES! Ah, well, Murmurhash 3 is in the works, and I plan on emailing my comments to the author, so maybe things can be fixed there. But I was soooo hoping for a tiny slice of e-fame!"

What was the other weakness?

-tanjent, author of MurmurHash
30th-Nov-2010 06:46 am (UTC)
Wow, hi! I'll send you email with all the details (and some other ideas I've had bubbling around, if you're interested) in the next few days (Thursday at the latest), but basically:
  • The weakness you described with files full of repeated values is actually worse (or at least deeper) than you described. There could be a bug in my code, but it looks like a file of certain sizes that is just a repeated value has the rightmost (least significant) bits of the seed as the hash value (note that this *excludes* final processing; I'm concerned about the core hash transformation itself). For example, Murmurhash2 against a file of 64K words of 0x2d2d2d2d and a seed of 0x12345678 has a hash of 0x63e65678. Against 128K, it's 0x96985678 (again, this is without the final processing). With smaller files, the number of matching bits is smaller, but it's still there. Further, it doesn't have to be all the same word; if I set a random single word to a different value, sometimes it happens more quickly, and sometimes less.
  • The other problem that I found is a little hard to describe. It's about ways of causing hash collisions, but my concern isn't that I can cause a collision (outside of a fully cryptographic function, that's nearly expected), but rather that I can do so (a) by only changing two aligned words in the file, and (b) without knowing or caring what the other data in the file is.
  • This same issue extends to things like if two aligned words in a file are both inverted (a = ~a), there's something like a 2**-19 chance of a collision, instead of the expected 2**-32. That number is likely wrong, since it's from memory, but basically there's some chance (guesstimated here at 2**-12) that those 2 deltas each only affect the top 7 (IIRC) bits of the chaining value, and since those deltas never affect lower bits, there's a 2**-7 chance that they'll cancel each other out, leaving the hash unchanged.


So, while I'm definitely impressed with MH2, and your changes so far in MH3, I think that having your chaining function simply be "h *= M;" is the real problem. IMHO, deltas in the high bits absolutely need to somehow propagate to lower bits, whether via another "h ^= h >> K;", or some other mechanism. I know that this sort of thing can kill the pipeline and thus performance, but I really think it's necessary (it would, AFAICT, fix all of the above weaknesses). Maybe something like "a = f(in[idx++]); h1 *= M; h1 ^= a; h2 ^= h2 >> K; a = f(in[idx++]); h2 *= M; h2 ^= a; h1 ^= h1 >> K;", where you actually have two chaining values to take advantage of pipelining, and then combine them at the end? IDK. You've done some amazing stuff so far; I'm sure you'll come up with something.

Thanks for taking an interest!
30th-Nov-2010 07:46 am (UTC)
Hmm, I might put together a quick test to check points #1 and #3 above.

I've gotten the "high bits need to propagate to low bits" feedback from multiple people, and the best response I have is that it's not as necessary as you think (and I've tested a bunch of stuff) as long as each key block is sufficiently mixed before getting xored into the hash state and (and this is the part that Murmur2 failed at) that each block is mixed _differently_.

The whys of that are rather long to be put in a LJ comment, but I'll see about doing a write-up of that later.

Very short version - if F1/F2/F3 are different random oracles and B1/B2/B3 are key blocks, then F1(B1)^F2(B2)^F3(B3), F3(F2(F1(B1)^B2)^B3), and F1(F1(F1(B1)^B2)^B3) are all equivalently good hash functions. When F1/F2/F3/FN are not random oracles things are trickier, but the situation is more complex than "high bits of H must affect low bits".

-tanjent
30th-Nov-2010 06:09 pm (UTC)
Here's snippets of my quick hacky code I wrote when I found point 1:

unsigned int src[1024*1024];

h = 0x12345678;
memset(&src[0], 0x2d, 1024*1024*sizeof(unsigned int));
for (i=0; i<1024; i++) {
for (j=0; j<1024*64; j++) {
a = src[j];

a *= 0x5bd1e995;
a ^= a >> 24;
a *= 0x5bd1e995;


h *= 0x5bd1e995;
h ^= a;
}
printf("h:%08x\n", h);
}


Output:

h:63e65678
h:96985678
h:6cca5678
h:f47c5678
h:b9ae5678
h:ab605678
h:69525678
h:c1045678
h:f6f65678


With larger "files" (outer loop goes to 32768), the hash matches the seed exactly. This is even true if I fill the array with different values (e.g. 0x4f). Some values will match the seed earlier (0xa5 matches at 16384 outer loops).

I will ponder your other points, and I agree that the situation is more complex than that. However, I don't think sufficient mixing of the input block and different mixing are sufficient, unless that mixing is somehow dependent on previous *data*. For example, in your current MH3 version, if I'm reading the code right then I can precompute each block's multiplicative values ahead of time (since they only depend on block position), so I can induce a difference of (e.g.) 0x80000000 in the chained hash value, which will get carried on with (in this case) p=1, regardless of how many other blocks I hash, so a second single block change can be used to negate that, causing a collision which does not depend on the values of any other blocks. Other high-bit differences get carried on with different probabilities. This property makes me queasy. Again, my concern isn't that I can deliberately and trivially compute collisions, it's that I can do so while not caring about 99% of the file's contents.
This page was loaded Mar 21st 2019, 2:19 pm GMT.