The pure and simple truth about BINARY SEARCH
Tuesday, October 10, 2006
The pure and simple truth is rarely pure and never simple.
Oscar Wilde
Last week I got an email from a worried user that some information was missing on a sales report. After few hours of exhaustive debugging with a few time-outs in between, I realized that an obscure READ TABLE... command comes back with SY-SUBRC = 4. It was looking for a combination of material number and customer number in an internal table. Both numbers were right, leading zeroes and all. I had the whole table in front of me in the debugging window and the record, which READ was supposed to find, was indeed present in it. “What do you mean sy-subrc is 4?! Here is that record, right there, you dumbass!”, - almost yelled I at the poor innocent Dell monitor.
Here I should probably mention that READ TABLE command had BINARY SEARCH addition. As I’ve learned from my very long programming (not ABAP) experience, sometimes if you just make things simpler it might actually solve the problem. So I’ve just commented out the BINARY SEARCH part and ran the program again. Now it worked like a charm. OK, now I had to get to the bottom of this.
I set up a very simple test program:
DATA: BEGIN OF i_test OCCURS 0,(I’ve omitted populate_table routine because it just populates i_test with some test data). Here is the output from the program:
key1,
key2,
non_key,
END OF i_test.
PERFORM populate_table.
SORT i_test BY key1.
LOOP AT i_test.
WRITE: / i_test-key1, i_test-key2, i_test-non_key.
ENDLOOP.
READ TABLE i_test WITH KEY key1 = 'B' BINARY SEARCH.
IF sy-subrc = 0.
WRITE: / 'Found:'.
WRITE: / i_test-key1, i_test-key2, i_test-non_key.
ELSE.
WRITE: / 'Not found'.
ENDIF.
OK, so it found the very first record with B. So far so good. I changed the READ line to add the second key:
READ TABLE i_test WITH KEY key1 = 'B' key2 = 'Z' BINARY SEARCH.It did find B Z record. Alright, let’s make the things a bit more interesting:
READ TABLE i_test WITH KEY key2 = 'Z' key1 = 'B' BINARY SEARCH.Here comes the ‘Not found’ message! However, after I changed SORT from key1 to key2 it was able to find the B Z record again. Now let’s kick it up a notch. I changed SORT and READ commands as follows:
SORT i_test BY key1 ASCENDING key2 DESCENDING.Here is the content of the i_test table after the sorting so that you guys could follow:
...
READ TABLE i_test WITH KEY key1 = 'B' key2 = 'A' BINARY SEARCH
A Z CThe results were as follows: the test above (B A) came back with ‘Not found’ (this time switching key1 and key2 in READ did not help). The things got even curioser when the A A record was found but Z A was not.
A Z D
A A A
A A B
B Z E
B B C
B A A
B A B
C C A
Z B B
Z A A
So what’s the deal with this damn binary search? I really like the simple explanation that one guy gave in an SDN post: ”Let’s say you have numbers 1..to ..100 in a table and you are searching for 34. It would read the 50th record and if it is say 50 next it would read the 25th record and if it is say 25 it would carry to read the 38th record and so on.”
Back to my example. There were 11 entries in my test table. The binary search started by splitting the table in half and it got the middle record (B B C). “OK,” thought the computer. “Since I’m looking for Z and A, let me look at the second part of the list (because Z > B). Oh, now I see C C A, we are getting closer! Let’s look at what’s left after that.” Naturally, at this point the only records to search were only C C A, Z B B and Z A A. So it split the list in half again and got Z B B. “OMG, I went too far! Let me get back real quick. Hmm... I see C C A. C is less than Z, which means that there is no record with Z and A. Oh well... SY-SUBRC = 4. Buhbye!”.
As I finally found out, the problem with the sales report was that the internal table was first sorted by one field, which would have worked fine with the READ, but then re-sorted by another field somewhere in the middle. It looks like a good idea to sort the table right before the binary search, which I will do in the future.
Obviously, with BINARY SEARCH what you see is not always what you get. To get the right result, the table must be sorted by the right field and in ascending order. If this is not done properly, sometimes binary search might still work correctly, depending on what data is inside the table. But sometimes you might wish it didn’t work at all because it could make finding an error a major pain in the back.
While I was on it, I also ran the runtime analysis a few times. With the small amount of data in my test program ordinary READ actually worked even faster than READ ... BINARY SEARCH. However, with thousands of records and about 10 fields (as in my sales report), BINARY SEARCH performs much better. I’m pretty sure that hashed table would be even more efficient (unfortunately, it can not be used in that specific report).
posted by Your Friendly ABAPer @ 21:27, Direct link to this post
12 Comments:
- At 27/10/06 04:10, said...
-
Двоичный поиск.
Двоичный поиск - алгоритм поиска объекта по заданному признаку во множестве объектов, упорядоченных по тому же самому признаку.
Двоичный поиск заключается в том, что на каждом шаге множество объектов делится на две равные части и в работе остается та часть множества, где находится искомый объект. Процесс прекращается, когда в работе остается множество, состоящее из одного объекта.
уже в самом опрделении заложено, что при таком поиске таблица, должна быть отсортирована, действительно pure and simple) - At 27/10/06 20:52, said...
-
Thanks for an insightful comment!
For those of us who don't read Russian, that was a definition of 'binary search'. I've also found a definition in English in Wikipedia: http://en.wikipedia.org/wiki/Binary_search - At 28/10/06 13:40, said...
-
I've forgot to mention that u've got a really nice blog =)
- At 2/11/06 21:40, said...
-
Thanks, it's very kind of you.
- At 11/6/08 15:50, said...
-
“What do you mean sy-subrc is 4?! Here is that record, right there, you dumbass!”, - almost yelled I at the poor innocent Dell monitor.
--> haha! this is so funny because i actually felt like doing the same thing when this happened to me. interestingly enough, i'm using a Dell monitor too.
I love your blog! Educational and so fun to read!
Keep posting please. :) - At 11/6/08 21:12, said...
-
Thanks, chae! Things like that happen to all of us sometime. :)
- At 28/3/11 04:48, Achilles said...
-
Guys,
I want some clarification on why SORT DESCENDING should not be used with READ TABLE?
Thanks in advance,
Raghu. - At 28/3/11 22:54, said...
-
It's my understanding that binary search assumes that sort direction was default (= ascending). Therefore it simply won't function properly if the descending order was used.
- At 27/5/11 04:33, jup31 said...
-
This comment has been removed by the author.
- At 22/8/13 23:50, said...
-
OMG!!! I finally understood how the BINARY SEARCH works :D... Many, many, thanks ^_^!
- At 15/4/14 08:00, said...
-
This is really funny, but I am glad before I go deeper, I read this article first.. :)
- At 29/8/16 03:29, Kiran said...
-
Very nice article.
Thanks a lot my friend.