Code comparing various approaches to inserting documents into MongoDB. The sources, along with a more detailed description, are on GitHub. Here is a little chart showing the throughput of each approach.
More details in README.md
Friday, 22 January 2016
Monday, 28 December 2015
Take that, Turing and Church!!!
The halting problem is undecidable, à la Dr. Seuss:
No program can say what another will do.
Now, I won’t just assert that, I’ll prove it to you:
I will prove that although you might work til you drop,
you can’t predict whether a program will stop.
Imagine we have a procedure called P
that will snoop in the source code of programs to see
there aren’t infinite loops that go round and around;
and P prints the word “Fine!” if no looping is found.
You feed in your code, and the input it needs,
and then P takes them both and it studies and reads
and computes whether things will all end as they should
(as opposed to going loopy the way that they could).
Well, the truth is that P cannot possibly be,
because if you wrote it and gave it to me,
I could use it to set up a logical bind
that would shatter your reason and scramble your mind.
Here’s the trick I would use – and it’s simple to do.
I’d define a procedure – we’ll name the thing Q –
that would take any program and call P (of course!)
to tell if it looped, by reading the source;
And if so, Q would simply print “Loop!” and then stop;
but if no, Q would go right back to the top,
and start off again, looping endlessly back,
til the universe dies and is frozen and black.
And this program called Q wouldn’t stay on the shelf;
I would run it, and (fiendishly) feed it itself.
What behaviour results when I do this with Q?
When it reads its own source, just what will it do?
If P warns of loops, Q will print “Loop!” and quit;
yet P is supposed to speak truly of it.
So if Q’s going to quit, then P should say, “Fine!” –
which will make Q go back to its very first line!
No matter what P would have done, Q will scoop it:
Q uses P’s output to make P look stupid.
If P gets things right then it lies in its tooth;
and if it speaks falsely, it’s telling the truth!
I’ve created a paradox, neat as can be –
and simply by using your putative P.
When you assumed P you stepped into a snare;
Your assumptions have led you right into my lair.
So, how to escape from this logical mess?
I don’t have to tell you; I’m sure you can guess.
By reductio, there cannot possibly be
a procedure that acts like the mythical P.
You can never discover mechanical means
for predicting the acts of computing machines.
It’s something that cannot be done. So we users
must find our own bugs; our computers are losers!
- Geoffrey K. Pullum
Monday, 23 November 2015
Currying in Kotlin
As an example, consider summing over a range of integers, applying a provided function to each. Something like:
fun sigma(range: IntRange, f: (Int) -> Int): Int {
    var result = 0
    for (i in range) result += f(i)
    return result
}

and we have
println(sigma(1..5) { x -> x })     // prints: 15
println(sigma(1..5) { x -> x * x }) // prints: 55

We would like to fix f() so that we could define:
val sigmaOfSum = sigma { x -> x }
val sigmaOfProduct = sigma { x -> x * x }

without committing to a range, such that later on, we can invoke
println(sigmaOfSum(1..5))     // prints: 15
println(sigmaOfProduct(1..5)) // prints: 55

This can be done in straight Kotlin:
fun sigma(f: (Int) -> Int): (IntRange) -> Int {
    fun applyF(range: IntRange): Int {
        var result = 0
        for (i in range) result += f(i)
        return result
    }
    return ::applyF
}

Here, the (higher order) function sigma takes f() as a parameter and returns a function that applies f() on the provided range. A more down-to-earth example: suppose we want to perform a DB lookup. In principle, we need to specify a key/query and the DB instance, but we want to break these two things apart in the spirit of the first example:
fun lookUp(db: Map<String, String>): (String) -> String? {
    fun dbLookup(key: String): String? {
        return db.get(key)
    }
    return ::dbLookup
}

Now the following works:
val customerLookup = lookUp(mapOf("1" to "Customer1", "2" to "Customer2"))
val productLookup = lookUp(mapOf("1" to "Product1", "2" to "Product2"))
println(customerLookup("1")) // prints: Customer1
println(productLookup("1"))  // prints: Product1

Our lookUp() function is quite simple and can be collapsed to:
fun lookUp(map: Map<String, String>): (String) -> String? = { map.get(it) }
Friday, 20 November 2015
Class delegating in Kotlin
The Kotlin language documentation is somewhat patchy, and the class delegation functionality is under-documented. To see what it does, it's best to look at an example:
class Customer()
class Product()

interface CustomerFinder {
    fun findCustomers(query: String): Set<Customer>
}

interface ProductFinder {
    fun findProducts(query: String): Set<Product>
}

Now, suppose we want to have a MegaFinder class that is capable of querying both Customers and Products. The standard way to do this is:
- pass instances of the finders to the MegaFinder
- have the MegaFinder stash those instances in private properties, and
- call on them as needed
All that is in the realm of boilerplate, and Kotlin has a concise alternative. This is what it looks like:
class MegaFinder(customerFinder: CustomerFinder, productFinder: ProductFinder) :
    CustomerFinder by customerFinder,
    ProductFinder by productFinder {

    fun loadAll() {
        val customers = findCustomers("*:*")
        val products = findProducts("*:*")
        //...
    }
}

What this says is that MegaFinder implements the finder interfaces by delegating to the designated instances. It is terser than the traditional approach, with two caveats that I see:
- The delegating approach requires the MegaFinder to implement the finder interfaces. This is not necessarily bad, but you lose the freedom not to do so.
- When reading the code, it is not immediately obvious where findCustomers(...) and findProducts(...) are implemented - it is almost as if you need some sort of IDE to figure this out :-)
Wednesday, 24 June 2015
Traversing LinkedList considered (somewhat) harmful
I compare traversal of LinkedList, ArrayList and an array.
First things first: given the mechanics of linked lists, iterating with a for loop, e.g.:
for (int i = 0; i < N; i++) {
    linkedList.get(i);
}

is expected to be comparatively slow, since the list has to be scanned every time to get() the i-th element. While sub-optimal, it may still happen in the wild, especially when a call returns a List and the client programmer is not aware of the actual implementation.
The benchmark measures the performance of "for-style" traversal of LinkedList, ArrayList and array, as well as "iterator-style" traversal of LinkedList and ArrayList, e.g.:
Iterator iter = list.iterator();
while (iter.hasNext()) {
    iter.next();
}
The code is available here. The output from JMH is
Benchmark                           Mode  Cnt       Score      Error  Units
ListsBenchmark.forArray            thrpt   10  423016.591 ± 2745.121  ops/s
ListsBenchmark.forArrayList        thrpt   10  345255.508 ± 1367.974  ops/s
ListsBenchmark.forLinkedList       thrpt   10    2559.053 ±   14.089  ops/s
ListsBenchmark.iteratorArrayList   thrpt   10  299156.440 ± 8655.560  ops/s
ListsBenchmark.iteratorLinkedList  thrpt   10  212684.595 ± 1036.017  ops/s

The score is the number of operations per second (so higher is better), where an operation is a single traversal of a list/array of size 1000.
Comments
- As expected, traversal of a LinkedList with a for loop performs badly - about n * n/2 node visits on average.
- Although ArrayList is backed by an array, it performs significantly worse than a plain array.
- Iterating over a LinkedList is significantly worse than iterating over an ArrayList. This is most likely due to the fact that to move from one node to the next in a LinkedList, the next node is fetched from the heap and then the contained value is fetched from the heap again. In the case of the ArrayList, the references to the values are loaded into CPU caches, so only one visit to the heap is needed.
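One safe alternative worth noting: the enhanced for loop compiles down to iterator-style traversal, so it avoids the quadratic behaviour even when the List turns out to be a LinkedList. A minimal sketch (the class name is mine):

```java
import java.util.LinkedList;
import java.util.List;

public class EnhancedForDemo {
    public static void main(String[] args) {
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 1000; i++) {
            linkedList.add(i);
        }

        // The enhanced for loop is compiled to an Iterator-based loop,
        // so each step is O(1) even on a LinkedList, unlike indexed get(i).
        long sum = 0;
        for (int value : linkedList) {
            sum += value;
        }
        System.out.println(sum); // prints: 499500
    }
}
```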
Tuesday, 16 June 2015
Lazy Map initialization
Here is one Java idiom I am happy to see go away with the tools made available in Java 8. In this scenario we have a Map with mutable values for which not all keys are known at creation time.
Before Java 8, the way to add a mapping would look like this:
private final Map<String, List<String>> map = new HashMap<>();

public void addValue(String key, String value) {
    List<String> valueList = map.get(key);
    if (valueList == null) {
        valueList = new ArrayList<>();
        map.put(key, valueList);
    }
    valueList.add(value);
}

The equivalent logic with Java 8 is:
public void addValue(String key, String value) {
    map.computeIfAbsent(key, v -> new ArrayList<>()).add(value);
}

If the Map values are immutable, the following also works:
private final Map<String, String> map = new HashMap<>();

public void addValue(String key, String value) {
    map.merge(key, value, (v1, v2) -> value);
}
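To make the merge() behaviour concrete, here is a small sketch (the class name is mine): when the key is absent the value is inserted as-is, and when it is present the remapping function decides, in this case keeping the newer value:

```java
import java.util.HashMap;
import java.util.Map;

public class MergeDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();

        // Key absent: "v1" is inserted; the remapping function is not called.
        map.merge("key", "v1", (oldValue, newValue) -> newValue);

        // Key present: the remapping function picks the newer value.
        map.merge("key", "v2", (oldValue, newValue) -> newValue);

        System.out.println(map.get("key")); // prints: v2
    }
}
```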
Thursday, 11 June 2015
Generating collisions in String.hashCode
In a previous post I mentioned collisions in String.hashCode(). Turns out that generating such collisions is quite easy.
String.hashCode() is defined as:
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

For a string A of length N (a0, a1, ..., aN-2, aN-1),
this can be written in a closed form as: 31^(N-1)a0 + 31^(N-2)a1 + ... + 31aN-2 + aN-1.
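As a quick sanity check, the closed form 31^(N-1)a0 + 31^(N-2)a1 + ... + aN-1 can be evaluated with Horner's rule and compared against String.hashCode() (the class and method names are mine):

```java
public class ClosedFormCheck {

    // Evaluates 31^(N-1)*a0 + 31^(N-2)*a1 + ... + aN-1 via Horner's rule,
    // which is exactly the loop inside String.hashCode().
    static int closedForm(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(closedForm("hello") == "hello".hashCode()); // prints: true
    }
}
```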
Pad by zeros
The first observation is that 0 is a valid character, and prepending an arbitrary number of 0 characters doesn't change the value of the hash. The following code takes advantage of this:

public static String padByZeroes(String original, int numberOfExtraZeroes) {
    int length = original.length();
    char[] chars = new char[length + numberOfExtraZeroes];
    for (int i = 0; i < length; i++) {
        chars[i + numberOfExtraZeroes] = original.charAt(i);
    }
    return new String(chars);
}

padByZeroes() produces strings longer than the original with an identical hash code.
Pair Perturbation
As a starting point for this approach, we take a string B which is identical to A: (b0, b1, ..., bN-2, bN-1). We then modify two consecutive characters of B without affecting its hash code. If the modified characters are at indices K and K+1, we have: hashCode(A) - hashCode(B) = ... = 31(aK - bK) + (aK+1 - bK+1).
We want the last expression to be zero, that is: 31(aK - bK) = bK+1 - aK+1. As we are free to choose bK and bK+1, to keep things simple (and reduce the chance of overflowing/underflowing the two-byte value of char) we choose bK = aK - 1, which gives us bK+1 = aK+1 + 31.
public static String pairPerturbation(String original, int pairIndex) {
    int length = original.length();
    if (pairIndex > length - 2) {
        throw new IllegalArgumentException("pairIndex can't be greater than " + (length - 2));
    }
    char[] chars = original.toCharArray();
    chars[pairIndex] = (char) (original.charAt(pairIndex) - 1);
    chars[pairIndex + 1] = (char) (original.charAt(pairIndex + 1) + 31);
    return new String(chars);
}

Here is a link to the Gist with the code and tests: https://gist.github.com/sorokod/a6848e6cd4c9fe32a523#file-stringhashcollisions
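As a sanity check, the well-known colliding pair "Aa"/"BB" is exactly this perturbation applied to "BB" at index 0, since 'A' = 'B' - 1 and 'a' = 'B' + 31 (the class name is mine):

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" is "BB" with the pair perturbation applied at index 0:
        // 'A' = 'B' - 1 = 65 and 'a' = 'B' + 31 = 97.
        System.out.println("BB".hashCode()); // prints: 2112
        System.out.println("Aa".hashCode()); // prints: 2112
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints: true
    }
}
```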