## Caching in python with a descriptor and a decorator

The Mercurial SCM has several strategies to cache expensive computations; you can find an overview of all of them at this page. This post is about the propertycache decorator: a mechanism to replace a function with its result the very first time it is called. The trick is made possible by the combination of two features of the python programming language: descriptors and decorators.

# Python descriptors

The python descriptor protocol works as follows: if you override the methods __get__(), __set__() or __delete__() in a (new style) python class, you get the chance to execute your own code when an object of that class is evaluated (__get__()) or is bound to another object (__set__()). Which is, you can decide what happens when that evaluation (or binding) occurs.

One more thing to remember is that an object that overrides __get__() and __set__() behaves like a descriptor (that is: these methods are actually called instead of the default evaluation/binding behaviour) only if it is a class attribute of another class.
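This point is worth a tiny experiment. In the sketch below (all names are mine, nothing from mercurial), the very same descriptor type behaves differently depending on where an instance of it is stored:

```python
# The same descriptor stored two ways; only the class-attribute
# placement triggers __get__.
class Ten(object):
    def __get__(self, obj, objtype=None):
        return 10
    def __set__(self, obj, value):
        pass

class AsClassAttr(object):
    x = Ten()  # class attribute: the descriptor protocol applies

class AsInstanceAttr(object):
    def __init__(self):
        self.x = Ten()  # instance attribute: the protocol is ignored

print(AsClassAttr().x)                      # 10
print(isinstance(AsInstanceAttr().x, Ten))  # True: just a Ten object
```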

Here are a few references on descriptors:

## A dummy example

Here we use the “descriptor protocol” to prevent user code from assigning to an attribute of an object. That attribute is kept at a constant value instead:

class joke(object):
    def __get__(self, obj, objtype):
        return 'Bazinga!'
    def __set__(self, obj, value):
        pass

class foo(object):
    x = joke()


We had to implement __set__() too, despite the fact that it’s empty: otherwise python would have regarded joke as a non-data descriptor, which is a slightly different beast. Here is what happens when writing and reading the attribute x of an object of type foo:

>>> f = foo()
>>> f.x
'Bazinga!'
>>> f.x = 5
>>> f.x
'Bazinga!'

# Python decorators

Are cool. In python, any callable that wraps a function is called a “decorator”. Example: you have a function that does stuff, like

def deco(arg):
    # stuff

and another function, that does some other stuff:

def fun(arg):
    # other stuff

Sometimes it is interesting to rebind fun like

fun = deco(fun)

and deco is said to be a “decorator” for fun. Some basic references:

Python has a convenient syntactic sugar for the rebinding above: it can be written as

@deco
def fun(arg):
    # other stuff

I often use decorators when I need to enhance the behaviour of a function by having something done before the function call, or after. So I write the following closure and use it as a decorator:

def deco(fun):
    def helper(arg):
        # do something before
        result = fun(arg)
        # do something after
        return result
    return helper


and then, for whatever fun this makes sense, I rebind its name to

fun = deco(fun)
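To make the pattern concrete, here is a hypothetical traced decorator built exactly this way; the name and the trace messages are made up for illustration:

```python
# Hypothetical `traced` decorator: log before and after the call.
import functools

def traced(fun):
    @functools.wraps(fun)  # preserve fun's name and docstring
    def helper(*args, **kwargs):
        print('entering ' + fun.__name__)  # do something before
        result = fun(*args, **kwargs)
        print('leaving ' + fun.__name__)   # do something after
        return result
    return helper

@traced
def double(x):
    return 2 * x

print(double(21))  # prints the two trace lines, then 42
```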

## The propertycache decorator

Without further ado, here is the code, taken verbatim from the mercurial codebase:

class propertycache(object):
    def __init__(self, func):
        self.func = func
        self.name = func.__name__
    def __get__(self, obj, type=None):
        result = self.func(obj)
        self.cachevalue(obj, result)
        return result

    def cachevalue(self, obj, value):
        setattr(obj, self.name, value)


and the example usage, taken straight from the mercurial wiki:

from util import propertycache

class foo(object):
    @propertycache
    def expensive(self):
        print 'created'
        return [1, 2, 3]

f = foo()
f.expensive # prints created
f.expensive # simply returns [1, 2, 3]

# to forget the saved value and calculate it again, use delattr
delattr(f, 'expensive')
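The snippet above imports propertycache from mercurial’s util module; for experimenting outside the mercurial tree, here is a self-contained variant with a counter added to prove the method body runs only once (the demo class name is mine):

```python
# propertycache as above, inlined, plus a counter on the consumer class.
class propertycache(object):
    def __init__(self, func):
        self.func = func
        self.name = func.__name__
    def __get__(self, obj, type=None):
        result = self.func(obj)
        setattr(obj, self.name, result)  # rebind: shadows the descriptor
        return result

class demo(object):
    calls = 0
    @propertycache
    def expensive(self):
        demo.calls += 1  # count actual executions of the method body
        return [1, 2, 3]

d = demo()
print(d.expensive)  # [1, 2, 3] -- computed
print(d.expensive)  # [1, 2, 3] -- cached in the instance dict
print(demo.calls)   # 1
```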


## How does it work

Before anything happens, at class creation time python defines

foo.expensive = propertycache(expensive)

where expensive is the plain function defined in the class body; the result is stored on the class, not on the instance.

Note that this is a descriptor object, since the class propertycache overrides the __get__() method. From this point on, you can’t call the method f.expensive() anymore:

>>> f.expensive()
<__main__.foo object at 0xb742d28c>
created
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

What you do instead is refer to f.expensive as a data member (that is, no () parens). The first time f.expensive is evaluated, the original method f.expensive() is called (remember: it was saved in the func member of the propertycache object). Its result is then rebound to f.expensive by the cachevalue() method.
Now f.expensive is not a descriptor object anymore, but only the returned value of the original method: further evaluations won’t go thru the original method anymore; they just return the cached value.

As a final remark, note that this doesn’t generalize to methods that have more arguments than the self reference, since the __get__() method has a well defined signature and you can’t add more items in there: only a reference to the “owner” object is allowed (the instance of foo in our example).
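If you do need caching for methods that take arguments, a different technique applies: keyed memoization with a plain decorator, no descriptor involved. A minimal sketch (the memoized helper is hypothetical, not part of mercurial):

```python
# Keyed memoization: cache results per argument tuple.
import functools

def memoized(fun):
    cache = {}
    @functools.wraps(fun)
    def helper(*args):
        if args not in cache:
            cache[args] = fun(*args)  # compute once per distinct args
        return cache[args]
    return helper

@memoized
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, each subproblem computed only once
```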

## How was it done before?

Descriptors were introduced in the Mercurial codebase with revision 595ba2537d4f by Dirkjan Ochtman, the same person who wrote the excellent chapter on the AOSA book about Mercurial.

How was caching done before? It looked like this:

class foo:
    def __getattr__(self, name):
        if name == 'expensive1':
            print 'Computing expensive1...'
            self.expensive1 = [1,2,3]
            return self.expensive1
        elif name == 'expensive2':
            print 'Computing expensive2...'
            self.expensive2 = [4,5,6]
            return self.expensive2
        else:
            raise AttributeError(name)


Let’s see that in action:

>>> f = foo()
>>> f.expensive1
Computing expensive1...
[1, 2, 3]
>>> f.expensive2
Computing expensive2...
[4, 5, 6]
>>> f.expensive1
[1, 2, 3]

Here the concertmaster is the __getattr__() method which, if implemented, intercepts all accesses to members not found otherwise; in our example, those are expensive1 and expensive2. On first access, their value is computed and bound to a data member; subsequent accesses will find them without passing thru __getattr__.

It goes without saying that the decorator described above is much more terse than this boilerplate code; it factors away all the common behaviour.
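For what it’s worth, the __getattr__ approach can also be made table-driven, which removes the if/elif chain while keeping the same mechanism; a sketch with invented names:

```python
# Table-driven variant of the __getattr__ trick.
class lazyattrs(object):
    _makers = {
        'expensive1': lambda self: [1, 2, 3],
        'expensive2': lambda self: [4, 5, 6],
    }
    def __getattr__(self, name):
        try:
            maker = self._makers[name]
        except KeyError:
            raise AttributeError(name)
        value = maker(self)
        setattr(self, name, value)  # cache: next access skips __getattr__
        return value

f = lazyattrs()
print(f.expensive1)  # [1, 2, 3], computed via __getattr__
print(f.expensive1)  # [1, 2, 3], now found in the instance dict
```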


## Mercurial histedit behind the scenes

What’s the magic behind the history rewriting extension in Mercurial? It’s called histedit, it’s written by Augie Fackler, and it’s a fundamental tool in my daily coding activity. I fired up pdb and watched closely what it does. Lean back and enjoy (all good ideas are Augie’s, all mistakes are mine!).

First off, I am obliged to raise a giant “don’t try this at home” sign upfront. I will use internal Mercurial APIs, which are deliberately undocumented; the reason is very simple: they might change next week without further notice.

Having said that, let me ask you a question: how would you create a changeset that is equivalent to a sequence of N other changesets? Which is, in mercurial lingo, how would you implement folding N changesets? You’d probably smash the first two, then add the third one to that result, then the fourth, the fifth and so on. That’s what happens in histedit. Let’s see it in action.

Before we tackle the adventurous task of editing the history of our repository, we have to learn how to do two operations:

1. pick a changeset somewhere in the history and apply it to the working directory
2. collapse a chain of commits into a single one
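Stripped of all the mercurial machinery, operation 2 is just a left fold: repeatedly combine a running state with the next changeset. A toy sketch, with changesets modeled as plain dicts from file name to content (nothing here is mercurial API):

```python
# Folding N toy changesets is a left fold of an "apply" step.
from functools import reduce

def apply_cset(state, cset):
    merged = dict(state)
    merged.update(cset)  # later changes win, as in a real fold
    return merged

csets = [{'foo': '1'}, {'bar': '2'}, {'foo': '3'}]
folded = reduce(apply_cset, csets, {})
print(folded == {'foo': '3', 'bar': '2'})  # True
```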

# Pick there, apply here

The histedit extension special-cases two situations.

## In place

Consider the repo created with

$ hg init
$ touch foo ; hg add foo ; hg commit -m foo
$ touch bar ; hg add bar ; hg commit -m bar
$ hg up 0

which then looks like

what we want to do here is to extract changeset 1 and apply it to the working directory. Since the parents of cset 1 and of the working directory are the same (they’re both children of changeset 0), the situation can be handled by a simple shot of cmdutil.revert(). Let’s switch to the python shell:

from mercurial import ui, cmdutil, node, localrepo

repo = localrepo.localrepository(ui.ui(), '.')
cmdutil.revert(ui.ui(), repo, repo[1],
               (repo[0].node(), node.nullid), all=True)
repo.commit(text="Applied cset 1")


this, as promised, leads us to

The revert(ui, repo, ctx, parents, *pats, **opts) function restores the files to the state they had at revision ctx, and sets the parents of the working directory. You might think there is something odd going on here, and I did too: this is a quite unusual application of revert, which is more often used to restore an ancestor revision. Here we’re picking stuff from another branch (let’s say, from “the future”).

## Far away nodes: merge needed

The other situation is when the parents of the working directory and of our target revision differ but… those two (different) parents have the same content. It might seem an artificial situation, but it is what happens in a generic step of the histedit algorithm. In that case we resort to a merge. Let’s first build the sample repo:

$ touch foo ; hg add foo ; hg commit -m foo
$ touch bar ; hg add bar ; hg commit -m bar
$ touch baz ; hg add baz ; hg commit -m baz
$ hg up 0

then again the revert() trick to apply changeset 1 on the top of 0, but on a parallel line; the code is exactly the python snippet above, so we have

What’s the histedit recipe to apply 2 on the top of 3? We merge the working directory (updated to 3) with 2, setting 1 as the common ancestor; this isn’t the topological truth (1 is not an ancestor of 3) but it is morally ok, since 1 and 3 contain the same state.

from mercurial import ui, node, merge, localrepo

repo = localrepo.localrepository(ui.ui(), '.')
merge.update(repo, repo[2].node(), True, True, False, repo[1].node())
repo.setparents(repo[3].node(), node.nullid)
repo.dirstate.write()
repo.commit(text="Applied cset 2")


Here is the result:

Note how I carefully clean up the parents of the changeset, removing traces of the merge that just happened.

All of this is nicely packaged in the histedit.applychanges() function in the histedit extension.

# Collapse many into one

$ hg init
$ touch foo ; hg add foo ; hg commit -m foo
$ touch bar ; hg add bar ; hg commit -m bar
$ touch baz ; hg add baz ; hg commit -m baz
$ touch woo ; hg add woo ; hg commit -m woo

We want to take changesets from 1 to 3 and smash them into one that sums them all. Here is the python code:

from mercurial import ui, cmdutil, node, localrepo, context

repo = localrepo.localrepository(ui.ui(), '.')
files = set(repo[1].files()) | set(repo[2].files()) | set(repo[3].files())
mf = repo[3].manifest()

def filectxfn(repo, ctx, path):
    if path in mf:
        fctx = repo[3][path]
        mctx = context.memfilectx(fctx.path(), fctx.data())
        return mctx
    raise IOError()

parents = (repo[0].node(), node.nullid)
new = context.memctx(repo, parents=parents,
                     text='collapsed 1 to 3',
                     files=files,
                     filectxfn=filectxfn,
                     user='gghh')
repo.commitctx(new)

Again, there is a function for this in the histedit extension, histedit.collapse(). I was surprised looking at that code: I have a mental model of commits as, say, an ordered sequence of diffs; that’s how I read this picture anyway. If I am collapsing changesets, I expect some sort of iteration over that chain. But that’s not what collapse() does. Look at the closure filectxfn: basically it considers only filelogs from the last changeset, skipping those filelogs that didn’t change since the first. As if in the last changeset you already have everything necessary to know what happened in between. Which is exactly how mercurial works: filelogs keep the diff sequence internally, but externally only the full contents are exposed. So basically all you can do is ask a filelog for the full contents of something, plus its parents, at a given version.

# When does this show start? It’s getting late

With all this machinery in place, we’re now ready to juggle with those little balls so fast that tortoiseHg will have a hard time catching up. First off, the toy repo. For the occasion we’re giving foo, bar and baz a little break and use some piece of real literature.

$ hg init
$ echo 'Dorothy lived in the midst of the great\
> Kansas prairies, with Uncle' > chap1.txt
$ hg add chap1.txt ; hg commit -m "line 1"

$ echo "Henry, who was a farmer, and Aunt Em,\
> who was the farmer's wife. Their" >> chap1.txt
$ hg commit -m "line 2"

$ echo 'house was small, for the lumber to build it\
> had to be carried by wagon' >> chap1.txt
$ hg commit -m "line 3"

$ echo 'many miles. There were four walls, a floor\
> and a roof, which made one' >> chap1.txt
$ hg commit -m "line 4"

$ echo 'room; and this room contained a rusty looking\
> cookstove, a cupboard for' >> chap1.txt
$ hg commit -m "line 5"

$ echo 'the dishes, a table, three or four chairs,\
> and the beds.' >> chap1.txt
$ hg commit -m "line 6"

Our plan is to fold all changesets between 1 and 5 inclusive into a single one. Which is, we would run

hg histedit 1

The histedit commands will be:

(pick, 1)
(fold, 2)
(fold, 3)
(fold, 4)
(fold, 5)

The first thing we do is to update the working directory to 0 and play the cmdutil.revert() trick to apply 2 on the top of 1.

from mercurial import ui, hg, commands, localrepo, node, merge
from mercurial import cmdutil, scmutil, context, repair
from hgext import histedit

repo = localrepo.localrepository(ui.ui(), '.')

hg.update(repo, repo[1].node())

cmdutil.revert(ui.ui(), repo, repo[2],
               (repo[1].node(), node.nullid), all=True)
repo.commit(text="2 bis")


Now we again update to 0, and collapse the line 1 -- 6 into changeset 7, which we apply on the top of 0.

hg.update(repo, repo[0].node())
histedit.collapse(repo, repo[1], repo[6], {})
hg.update(repo, repo[7].node())


Now we want to apply 3 on the top of 7. That’s the merge case we saw earlier; our reference ancestor will be 2, since 0 -- 7 is the same as 0 -- 1 -- 2.

merge.update(repo, repo[3].node(), True, True, False, repo[2].node())
repo.setparents(repo[7].node(), node.nullid)
repo.dirstate.write()
repo.commit(text="7 and 3")


Now guess what? collapse the line 7 -- 8, call it 9.

hg.update(repo, repo[0].node())
histedit.collapse(repo, repo[7], repo[8], {})
hg.update(repo, repo[9].node())


Come on, a couple more and we’re done: 4 on top of 9, then 5 on top of 11.

merge.update(repo, repo[4].node(), True, True, False, repo[3].node())
repo.setparents(repo[9].node(), node.nullid)
repo.dirstate.write()
repo.commit(text="4 bis")

hg.update(repo, repo[0].node())
histedit.collapse(repo, repo[9], repo[10], {})
hg.update(repo, repo[11].node())

merge.update(repo, repo[5].node(), True, True, False, repo[4].node())
repo.setparents(repo[11].node(), node.nullid)
repo.dirstate.write()
repo.commit(text="5 bis")


now the final collapse: changeset 13 is the result of 1 -- 2 -- 3 -- 4 -- 5 all folded together.

hg.update(repo, repo[0].node())
histedit.collapse(repo, repo[11], repo[12], {})
hg.update(repo, repo[13].node())


Now it’s time to throw away the leftovers. repair.strip() comes to the rescue; we run it four times, once per leftover chain, because after each strip the remaining revisions are renumbered and revision 1 points at the next chain to remove:

repair.strip(ui.ui(), repo, repo[1].node())
repair.strip(ui.ui(), repo, repo[1].node())
repair.strip(ui.ui(), repo, repo[1].node())
repair.strip(ui.ui(), repo, repo[1].node())

Hooray!

## Is TCP turing complete?

Is the TCP protocol capable of… computing things? In other words: if you consider the topology of your network (nodes and connections between them) and the data that each node plans to send across the network as “the input” of TCP as a “computing machine”, and the data exchange (or maybe, just the rate of the data exchange at steady state) that actually happens to be “the output”, can one imagine some weird complicated “encoding scheme” so that, at the end of the day, one can make TCP compute prime numbers?

In much more cryptic terms, for people used to theoretical computer science lingo, is the TCP protocol somehow Turing Complete?

I have asked myself this question ever since I learned about the AIMD congestion control scheme, where basically the rate at which data is transmitted increases linearly in time until a packet loss is detected. At that point, the transmission rate is divided by a constant factor. Well, I feel that this scheme is complex enough that there might be some computation power lurking in there.
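For reference, the AIMD behaviour I’m alluding to can be simulated in a few lines; this is a toy model with made-up constants, not a real congestion controller:

```python
# Toy AIMD: additive increase per round trip, multiplicative decrease
# on loss. Constants are illustrative, not from any real TCP stack.
def aimd(rounds, loss_at, rate=1.0, increase=1.0, factor=2.0):
    rates = []
    for t in range(rounds):
        if t in loss_at:
            rate /= factor    # multiplicative decrease on packet loss
        else:
            rate += increase  # additive increase otherwise
        rates.append(rate)
    return rates

print(aimd(6, loss_at={3}))  # [2.0, 3.0, 4.0, 2.0, 3.0, 4.0]
```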

Personally I find it very amusing when turing completeness sneaks into systems “by accident”, and this happens quite often in real world situations. Some examples:

Usually when you have turing completeness by accident it’s quite an undesirable thing, as famous programmer Jamie Zawinski points out in his blog here; also interesting is the comment on the DNS resource record NAPTR.

Any comment is *very* welcome! I feel the formulation of my question is quite fuzzy, so even if somebody can just make me better understand *what* I am asking… well, that would make my day


## A feature of (q)bzr I really love

Bazaar lets you compare branches. Not just branches from the same repository, i.e. all stored in the same .bzr/ folder, but arbitrary branches that may (or may not) have some common history. All those years of typing code for fun and profit, and staring at directed acyclic graphs wondering where the heck that stupid C macro that breaks the build was introduced, might have changed my sense of “awesome”.
But ladies and gentlemen, this is pretty much awesome.

Let me show you an example.

## Playing the TIAS game with bazaar

TIAS is Try It And See, no more, no less. I did some TIAS in another post about Mercurial, and I am doing it again. Let’s get this party started.

First we create a repository and fork it twice. Actually, in bzr the correspondence between repositories and branches can be one to one, with bzr init, or one to many, with bzr init-repo. The latter creates what is called a “shared repository”, i.e. a repository that can host many branches. This is indeed the way git and mercurial work; the former option is a bzr-only thing. Let’s stick to the one-repo-one-branch setup; it doesn’t make a big difference for the purpose of this post, and it requires less typing.

$ bzr init b
Created a standalone tree (format: 2a)
$ bzr branch b c1
Branched 0 revisions.
$ bzr branch b c2
Branched 0 revisions.

We now switch to c1, enter a couple of commits, and push them back to the parent branch, which is b/.

$ cd c1

/c1$ echo 'hello world!' > src1.txt
/c1$ bzr add src1.txt
/c1$ bzr commit -m "initial import"
Committing to: c1/
added src1.txt
Committed revision 1.

/c1$ echo 'hello world!\nIt is gonna be a nice day.' > src1.txt
/c1$ bzr commit -m "added optimism."
Committing to: c1/
modified src1.txt
Committed revision 2.

/c1$ bzr push :parent
All changes applied successfully.
Pushed up to revision 2.

Some time later, developer c2 enters the scene, retrieves the current content of b/ and makes a commit. He doesn’t push back to b/, because he’s not quite done yet.

$ cd ../c2/
/c2$ bzr pull
Using saved parent location: b/
+N  src1.txt
All changes applied successfully.
Now on revision 2.

/c2$ echo "That's a nice Tnettenba" > src2.txt
/c2$ bzr add src2.txt
/c2$ bzr commit -m "added joke"
Committing to: c2/
added src2.txt
Committed revision 3.

c1 gets back to work: he continues to develop his thing.

/c2$ cd ../c1

/c1$ echo 'voodoo atchoo to you too' > src3.txt
/c1$ bzr add src3.txt
/c1$ bzr commit -m 'thank you!'
Committing to: c1/
added src3.txt
Committed revision 3.

The release engineer, who has direct access to the server where b/ is stored, has been thinking on how to fix that shell script that handles the deployment. He finally figures it out; he’s in a hurry, and doesn’t want to branch and merge — he commits directly on b/. It happens.

$ cd ../b/

/b$ echo 'the grumpy cat says it had fun, once' > src4.txt
/b$ bzr add src4.txt
/b$ bzr commit -m 'that never happened.'
Committing to: b/
added src4.txt
Committed revision 3.

It’s c1‘s turn to play now. He had a coffee with c2; they talked about the code and c1 realizes he needs the new feature that c2 just committed. So he merges with c2, and then goes a little further.

/c1$ bzr merge ../c2
+N  src2.txt
All changes applied successfully.
/c1$ bzr commit -m "thanks c2, that was neat"
Committing to: c1/
added src2.txt
Committed revision 4.

/c1$ echo 'if (0 == a);' > src6.txt
/c1$ bzr add src6.txt
adding src6.txt
/c1$ bzr commit -m 'Zero the value is not. Yoda style.'
Committing to: c1/
Committed revision 5.

It’s almost time to leave, and c2 has just the time to fix that little annoying bug before heading home.

/c1$ cd ../c2
/c2$ echo 'NaNNaNaNNaNaN' > src5.txt
/c2$ bzr add src5.txt
adding src5.txt
/c2$ bzr commit -m 'Batman. I need Batman.'
Committing to: c2/
Committed revision 4.

## The sweet pretty pictures

The day after, c1 and c2 get back to work, chit-chat a little, and realize that they don’t know exactly how their two branches relate. Ok, somebody pushed, somebody merged… but they’d like to see it crystal clear, to anticipate all the problems that will arise when, next wednesday, all lines are going to be merged into the production code. Moreover, they have no idea that the release engineer jumped in and added stuff directly to the reference repo b/. What do they do?
Easy peasy: bzr qlog b c1 c2.
The amazing command

bzr qlog <branch_1> <branch_2> ... <branch_n>

shows what all branches have in common, who diverged, who pushed to whom, everything. And recall that we’re talking of branches that reside in different repositories. Too good! This is what the command

bzr qlog b c1 c2

would show:

Isn’t that pretty?
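A rough intuition of what relating branches like this involves: walk each branch’s revision DAG from its head through parent links, then intersect the reachable sets. A toy sketch with invented revision ids (the real implementation is surely more sophisticated):

```python
# Toy model: a branch is a head plus a parent map; shared history is
# the intersection of the two ancestor sets.
def ancestors(parents, head):
    seen, stack = set(), [head]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

# two branches that diverged after r2
parents = {'r2': ['r1'], 'r3': ['r2'], 'r4': ['r2'], 'r5': ['r3']}
shared = ancestors(parents, 'r5') & ancestors(parents, 'r4')
print(sorted(shared))  # ['r1', 'r2']
```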
Just for the sake of taking some screenshots and messing around with gimp, I’ll show you that the branches you compare can be totally, and I say totally, unrelated. Here is what you get if you compare the trunk of bzr itself with the latest mysql-server codebase via

bzr qlog lp:bzr lp:mysql-server

I had to shrink it a little bit, as you can imagine; of course the two don’t share a thing, but that’s just for the fun of the abuse:

## All your rebase are belong to us

When I hit “commit” on my version control system, I behave very much like when I click “save” in my text editor. I just do it whenever I feel like it. I can’t see in advance how the thing is gonna look at the end, in all details. Result: at the end of the day, I have a bazillion small commits without any clear, organic structure. I just cannot hand this blob to the release engineer, or code reviewer, or whoever — the guy is gonna yell at me all over the building. So, guess what? You have to rewrite your history, my son. And here is how it works (with Mercurial).

## Beautify the commit history, even between regular merges

Alice clones Production; commits some stuff; then merges from Production just to keep up to date; commits again a few times, re-merges, etc. Then she would like to beautify her commit history a little, and would like to “squash” all the commits she made between one merge and the next into a single commit. Here is a picture that explains the scenario:

I apologize for the ugly picture (you might want to read again the title of this blog). At the expense of killing the narrative, I am gonna say upfront how the story is gonna end (Spoiler Alert!): initially Alice would be happy just by folding commits between merge M1 and merge M2. But then she discovers that actually she’s free to move her commits all the way up and down in her development line, and that the two merges with upstream that she did are … not a “boundary” anymore. So, in the final picture, all merges are still there (even if M1 is technically just a clone), and commits A, B, C, D, E can be in arbitrary positions with respect to M1 and M2 (they just need to respect their relative order); look at the picture at the end of the post for some abstract art representing the final result.
The hg terminology for squashing commits is folding. We are going to see how to handle this situation using the rebase and histedit mercurial extensions. Things we’re gonna need, and reference documents:

Ok, last step is optional but appreciated.

## First off, Mercurial TIAS (Try It And See)

TIAS is an effective technique to grasp new concepts, and the first step on the way to mastery. For mercurial, it goes like this: let’s create a repo serving as a (local) reference one

$ hg init b

This will create an empty repository in folder b/. Other clones will communicate via this one as if it were a remote repo, like in a real world (distributed) development workflow. You will then clone repo b/ one or more times, simulating actions from several developers:

$ hg clone b c1
$ hg clone b c2
...

Please note that all local repos are basically equal; here we are just talking about conventions. There is no such thing as a client/server scenario in the distributed version control world, but rather a peer-to-peer setup. You now have a basic working environment where you can play, experiment, try it, and see. For instance, why not enter c1, add a small text file and commit it:

$ cd c1/ ; echo "hello world" > src.txt ;
c1/$ hg add src.txt ; hg commit -m "initial import"

Now we could push this commit to c1‘s parent repository, which is b, with

c1/$ hg push

then enter c2/ and retrieve those modifications via

c1/$ cd ../c2
c2/$ hg pull
c2/$ hg update

Did it work? Let’s see:

c2/$ cd ..
$ tree
.
├── b
├── c1
│   └── src.txt
└── c2
    └── src.txt

Hooray! Note that b‘s working tree is empty, because we never ran hg update there; nonetheless, all the stuff we care about is safely stored in b/.hg/.

## Simulation of the workflow

I’ll now reproduce the situation described above. Let’s create a dumb production repo:

$ hg init b
$ cd b
/b$ echo -e 'Hello\nworld!!' > src1.txt
/b$ cat src1.txt
Hello
world!!
/b$ hg add src1.txt; hg commit -m "initial import"
/b$ cd ..

Note that I committed in b/, which is not what you would do in real life; the reference repository only receives pushes and pulls from other repos. But this little idiosyncrasy is inessential here, it doesn’t alter the point I want to make. Let’s then clone b:

$ hg clone b c1
updating to branch default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ tree
.
├── b
│   └── src1.txt
└── c1
    └── src1.txt

2 directories, 2 files

Back on prod to do a commit:

$ cd b/
/b$ echo -e 'Hello\nworld' > src1.txt
/b$ hg commit -m "removed bangs"
/b$ hg log --graph
@  changeset:   1:38bceaef9ec1
|  tag:         tip
|  user:        gghh
|  date:        Thu Apr 11 10:03:52 2013 +0200
|  summary:     removed bangs
|
o  changeset:   0:1e0b6ab683de
   user:        gghh
   date:        Thu Apr 11 10:03:06 2013 +0200
   summary:     initial import

/b$ cd ..

now some commit on repo c1

$ cd c1/
/c1$ echo 'foo bar' > src2.txt
/c1$ hg add src2.txt; hg commit -m "added a source file"

another commit

/c1$ echo -e 'foo\nbar' > src2.txt
/c1$ cat src2.txt
foo
bar
/c1$ hg commit -m "inserted newline"

a third commit

/c1$ echo -e 'foo\nbar\nbaz\nbazinga' > src2.txt
/c1$ hg commit -m "more text"

checking history

/c1$ hg log --graph
@  changeset:   3:5df72ef842d2
|  tag:         tip
|  user:        gghh
|  date:        Thu Apr 11 10:07:12 2013 +0200
|  summary:     more text
|
o  changeset:   2:26a980e728e2
|  user:        gghh
|  date:        Thu Apr 11 10:07:00 2013 +0200
|  summary:     inserted newline
|
o  changeset:   1:9f7e0b186fa4
|  user:        gghh
|  date:        Thu Apr 11 10:06:43 2013 +0200
|  summary:     added a source file
|
o  changeset:   0:1e0b6ab683de
   user:        gghh
   date:        Thu Apr 11 10:03:06 2013 +0200
   summary:     initial import

now pull and merge from b (production)

/c1$ hg pull
pulling from /b
searching for changes
/c1$ hg merge
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
/c1$ hg commit -m "merged with production"

checking history

/c1$ hg log --graph
@    changeset:   5:0877a5c95a08
|\   tag:         tip
| |  parent:      3:5df72ef842d2
| |  parent:      4:38bceaef9ec1
| |  user:        gghh
| |  date:        Thu Apr 11 10:09:04 2013 +0200
| |  summary:     merged with production
| |
| o  changeset:   4:38bceaef9ec1
| |  parent:      0:1e0b6ab683de
| |  user:        gghh
| |  date:        Thu Apr 11 10:03:52 2013 +0200
| |  summary:     removed bangs
| |
o |  changeset:   3:5df72ef842d2
| |  user:        gghh
| |  date:        Thu Apr 11 10:07:12 2013 +0200
| |  summary:     more text
| |
o |  changeset:   2:26a980e728e2
| |  user:        gghh
| |  date:        Thu Apr 11 10:07:00 2013 +0200
| |  summary:     inserted newline
| |
o |  changeset:   1:9f7e0b186fa4
|/   user:        gghh
|    date:        Thu Apr 11 10:06:43 2013 +0200
|    summary:     added a source file
|
o  changeset:   0:1e0b6ab683de
   user:        gghh
   date:        Thu Apr 11 10:03:06 2013 +0200
   summary:     initial import

checking phases

/c1$ hg phase -r 0::5
0: public
1: draft
2: draft
3: draft
4: public
5: draft

## Wait, hold on: phases? whazat?

Mercurial phases are a powerful feature that prevents you from making mistakes when rewriting the history of a repo. A changeset (commit) marked as public is a changeset that has been shared (i.e., thru a push/pull), so it is not wise to modify it. Mercurial keeps track of that, and an attempt to rewrite a public changeset is prevented, unless you change the phase of the commit with

hg phase -v --draft [rev] --force

note the option --force, since we’re about to take the risk of turning a public commit back to draft. Just to give a little context on how advanced this feature is, I’ll mention that git is going to implement it in the next Google Summer of Code.

## Rewrite history!

Back to the original problem, note that commits 1, 2, 3 and 5 are draft, since they never left the repo c1, so it’s safe to move them around. We do that using the rebase mercurial extension, available since version 2.3.

### The lay of the land

Before we start messing up with our repository, let’s take note of all changesets:

/c1$ hg log -p
changeset:   5:0877a5c95a08
tag:         tip
parent:      3:5df72ef842d2
parent:      4:38bceaef9ec1
user:        gghh
date:        Thu Apr 11 10:09:04 2013 +0200
summary:     merged with production

diff -r 5df72ef842d2 -r 0877a5c95a08 src1.txt
--- a/src1.txt  Thu Apr 11 10:07:12 2013 +0200
+++ b/src1.txt  Thu Apr 11 10:09:04 2013 +0200
@@ -1,2 +1,2 @@
 Hello
-world!!
+world

changeset:   4:38bceaef9ec1
parent:      0:1e0b6ab683de
user:        gghh
date:        Thu Apr 11 10:03:52 2013 +0200
summary:     removed bangs

diff -r 1e0b6ab683de -r 38bceaef9ec1 src1.txt
--- a/src1.txt  Thu Apr 11 10:03:06 2013 +0200
+++ b/src1.txt  Thu Apr 11 10:03:52 2013 +0200
@@ -1,2 +1,2 @@
 Hello
-world!!
+world

changeset:   3:5df72ef842d2
user:        gghh
date:        Thu Apr 11 10:07:12 2013 +0200
summary:     more text

diff -r 26a980e728e2 -r 5df72ef842d2 src2.txt
--- a/src2.txt  Thu Apr 11 10:07:00 2013 +0200
+++ b/src2.txt  Thu Apr 11 10:07:12 2013 +0200
@@ -1,2 +1,4 @@
 foo
 bar
+baz
+bazinga

changeset:   2:26a980e728e2
user:        gghh
date:        Thu Apr 11 10:07:00 2013 +0200
summary:     inserted newline

diff -r 9f7e0b186fa4 -r 26a980e728e2 src2.txt
--- a/src2.txt  Thu Apr 11 10:06:43 2013 +0200
+++ b/src2.txt  Thu Apr 11 10:07:00 2013 +0200
@@ -1,1 +1,2 @@
-foo bar
+foo
+bar

changeset:   1:9f7e0b186fa4
user:        gghh
date:        Thu Apr 11 10:06:43 2013 +0200
summary:     added a source file

diff -r 1e0b6ab683de -r 9f7e0b186fa4 src2.txt
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src2.txt  Thu Apr 11 10:06:43 2013 +0200
@@ -0,0 +1,1 @@
+foo bar

changeset:   0:1e0b6ab683de
user:        gghh
date:        Thu Apr 11 10:03:06 2013 +0200
summary:     initial import

diff -r 000000000000 -r 1e0b6ab683de src1.txt
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src1.txt  Thu Apr 11 10:03:06 2013 +0200
@@ -0,0 +1,2 @@
+Hello
+world!!

### Rebase

Let’s do it! Just as in this example from the docs,

/c1$ hg rebase --dest 4 --source 1
saved backup bundle to c1/.hg/strip-backup/9f7e0b186fa4-backup.hg

Note that, for the sake of readability, I am referring to changesets via their revision numbers; the other option is to use their hashes. This works fine; just be aware that revision numbers aren’t unique across repositories, while hashes are.
What does the history look like now?

/c1$ hg log --graph
@  changeset:   4:c61940d6fbeb
|  tag:         tip
|  user:        gghh
|  date:        Thu Apr 11 10:07:12 2013 +0200
|  summary:     more text
|
o  changeset:   3:45eebb742cee
|  user:        gghh
|  date:        Thu Apr 11 10:07:00 2013 +0200
|  summary:     inserted newline
|
o  changeset:   2:9ad745b5c9db
|  user:        gghh
|  date:        Thu Apr 11 10:06:43 2013 +0200
|  summary:     added a source file
|
o  changeset:   1:38bceaef9ec1
|  user:        gghh
|  date:        Thu Apr 11 10:03:52 2013 +0200
|  summary:     removed bangs
|
o  changeset:   0:1e0b6ab683de
   user:        gghh
   date:        Thu Apr 11 10:03:06 2013 +0200
   summary:     initial import

and phases are

/c1$ hg phase 0::4
0: public
1: public
2: draft
3: draft
4: draft

### Now collapse (fold)

To do the folding from revision 2 onwards, we finally resort to the “histedit” Mercurial extension:

/c1$ hg histedit 2

change the commands from

pick 9ad745b5c9db 2 added a source file
pick 45eebb742cee 3 inserted newline
pick c61940d6fbeb 4 more text

# Edit history between 9ad745b5c9db and c61940d6fbeb
#
# Commands:
#  p, pick = use commit
#  e, edit = use commit, but stop for amending
#  f, fold = use commit, but fold into previous commit (combines N and N-1)
#  d, drop = remove commit from history
#  m, mess = edit message without changing commit content
#

to

pick 9ad745b5c9db 2 added a source file
fold 45eebb742cee 3 inserted newline
fold c61940d6fbeb 4 more text

add an updated commit message, and it's done. A last glance at our pretty history:

/c1$ hg log --graph
@  changeset:   2:5f8e8722ad9a
|  tag:         tip
|  user:        gghh
|  date:        Thu Apr 11 10:07:12 2013 +0200
|  summary:     added a source file and other stuff
|
o  changeset:   1:38bceaef9ec1
|  user:        gghh
|  date:        Thu Apr 11 10:03:52 2013 +0200
|  summary:     removed bangs
|
o  changeset:   0:1e0b6ab683de
user:        gghh
date:        Thu Apr 11 10:03:06 2013 +0200
summary:     initial import

Now the last revision is the sum of the three:

/c1$ hg log -p -r 2
changeset:   2:5f8e8722ad9a
tag:         tip
user:        gghh
date:        Thu Apr 11 10:07:12 2013 +0200
summary:     added a source file and other stuff

diff -r 38bceaef9ec1 -r 5f8e8722ad9a src2.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src2.txt	Thu Apr 11 10:07:12 2013 +0200
@@ -0,0 +1,4 @@
+foo
+bar
+baz
+bazinga

So that's all I have got. Just a few other pointers for the men of good will: hg pull --rebase and hg rebase --dest [rev] --source [rev] --collapse can perform common tasks at once, as described here and here.

## A note about getting the latest Mercurial

If you happen to work on an ubuntu box like me, you can bypass the bureaucracy of the official ubuntu repositories and get the bleeding-edge Mercurial with a simple

sudo add-apt-repository ppa:mercurial-ppa/releases
sudo apt-get update
sudo apt-get install mercurial

## Recap

To recap, we started with the goal of helping Alice reorganize her commits between one merge with upstream and the next, and we discovered that much more is possible: if the situation isn't pathological, merges aren't a boundary at all, and "draft" commits from Alice can really be moved around. Mercurial has a really powerful mechanism to edit the history of a repo, and more awesomeness is still to come. Moreover, "phases" help to keep the whole thing safe.

Posted in Uncategorized | Leave a comment

## chord diagram layout in d3.js: implementation

In a previous post I wrote how much I like the d3 chord diagram per se, and how little I like configuring it via the matrix API function. I argued that a different interface can be more useful; in this post I describe the implementation details of this alternative layout function.

To recap, for me a chord diagram is a collection of "polygons". A "polygon" is a collection of "vertices"; a "vertex" is given by a pair of values, i.e. the ID of the arc the vertex belongs to and a "size". The layout implementation will connect those vertices with chords in the natural way, i.e.
with the shortest closed path of chords. There can be many such chord paths, and we don't really care which one is chosen. It's inessential, as long as the path is closed and minimal. Here is a sample javascript object that contains all the necessary information to build such a diagram:

connections = [ [ {group: 1, value: 1},
                  {group: 3, value: 1},
                  {group: 5, value: 1},
                  {group: 7, value: 1} ],

                [ {group: 0, value: 2} ],

                [ {group: 2, value: 2} ],

                [ {group: 4, value: 2} ],

                [ {group: 6, value: 2} ] ]

Here is the resulting diagram; as far as we're concerned, both renderings represent the input faithfully, and we leave it to the layout engine to "fill in the blanks" and choose at its will.

The goal of the game is to massage our input data until we get a representation of it compatible with the interfaces of d3.svg.chord, i.e. groups are described by an array of objects like

{endAngle: ..., index: ..., startAngle: ..., value: ...}

where startAngle and endAngle are expressed in radians and identify the arc, index is an integer ID corresponding to the group properties found in connections, and value is the size of the group expressed in some scale (namely the sum of all connection values where that group appears); chords shall be described by an array of objects like

{source: {endAngle: ..., index: ..., startAngle: ..., subindex: ..., value: ...,
          groups: {ID_1: '', ID_2: '', ...} },
 target: {endAngle: ..., index: ..., startAngle: ..., subindex: ..., value: ...,
          groups: {ID_1: '', ID_2: '', ...} } }

where the properties source and target identify the start and end point of a chord. The sub-properties startAngle, endAngle and value describe the width of the chord base, in radians and on an absolute scale, and index identifies the group where the base should be placed. subindex tells the engine where to put the chord inside a group; if two chord endpoints have the same index and subindex, their bases will be superposed.
The groups subproperty isn't present in the current d3 implementation; it is a collection of group IDs: all the groups touched by the polygon the chord belongs to. I use that datum for the fade-out effect on whole polygons.

My implementation is a sequence of loops, where I pass over and over my initial data to change its representation. To describe what I do, I will use the javascript REPL provided by nodejs (you thought that node was just a webserver… you were wrong!). So fire up your node REPL:

$ node
>

From now on, if you have the node REPL on your machine, just cut and paste my code snippets to follow along live. Let's first input some connection data

var connections = [ [ {group: 0, value: 42} ],
                    [ {group: 0, value: 5},
                      {group: 1, value: 7},
                      {group: 2, value: 7},
                      {group: 3, value: 5} ],
                    [ {group: 0, value: 11},
                      {group: 2, value: 11} ] ];


First thing, we compute the size of each group

var ep, i, j, k, ngroups = 0, groupSums = {};

k = 0, i = -1; while (++i < connections.length) {
    j = -1; while (++j < connections[i].length) {
        ep = connections[i][j].group;
        if (ep in groupSums) {
            groupSums[ep] += connections[i][j].value;
        } else {
            groupSums[ep] = connections[i][j].value;
            ++ngroups;
        }
        k += connections[i][j].value;
    }
}


we’re running over all connections elements, and each time we find a group ID we update the corresponding entry in the groupSums dictionary. Let’s inspect it:

> ngroups
4
> groupSums
{ '0': 58,
'1': 7,
'2': 18,
'3': 5 }

As expected, ngroups is 4 and, for instance, the total size of group 0 is 42+5+11=58. Those values will then be rendered as portions of a circle: like in a pie chart, but only on the perimeter. I now need the list of IDs for the groups; since importing external modules in the REPL is a little tricky, I'll not resort to d3.range but provide it by hand, just for the sake of this example

var range = function(n) {
    var out = [], i = -1;
    while (++i < n) { out.push(i); }
    return out;
}

var groupIndex = range(ngroups);

> groupIndex
[ 0, 1, 2, 3 ]

as a side note, in javascript there is no builtin and portable way to get the number of properties of an object (newer engines expose Object.keys(obj).length, but it isn't portable to older ones); that's why I have to keep the ngroups counter running on the side. Now convert k from the total sum of groups to our reference angular unit, expressed in radians

var padding = 0.05;
k = (2 * 3.1415 - padding * ngroups) / k;

> k
0.069125
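The scale factor deserves a quick sanity check (mine, not from the original post): all vertex values scaled by k, plus one padding gap per group, should fill the whole circle. The 3.1415 approximation of pi is kept so the numbers match the REPL output above:

```javascript
// Sanity check on k: the grand total of vertex values times k, plus the
// padding gaps, should add up to the full circle (2 * 3.1415 here).
var padding = 0.05, ngroups = 4;
var total = 42 + 5 + 7 + 7 + 5 + 11 + 11;    // sum of every vertex value: 88
var k = (2 * 3.1415 - padding * ngroups) / total;
console.log(k);                              // ≈ 0.069125, as in the REPL above
console.log(total * k + padding * ngroups);  // ≈ 6.283, i.e. 2 * 3.1415
```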

now build the polygons data structure. With the following snippet we loop over the input connections; we saw that a connection element corresponds to a polygon, i.e. a closed path of chords. Since all we want is to join vertices in the natural way, there's no need to specify the edges in the input; but at some point we have to compute those edges, and that moment is now. For every vertex v_i in a connection element, we add the edge (v_{i-1}, v_i) to the corresponding polygon.

var polygons = [];
var poly = {edges: [],
            vertices: {}};

i = -1; while (++i < connections.length) {
    if (connections[i].length > 1) {
        j = 0; while (++j < connections[i].length) {
            poly.edges.push({
                source: connections[i][j-1],
                target: connections[i][j]
            });
            poly.vertices[connections[i][j-1].group + ''] = '';
        }
        poly.vertices[connections[i][j-1].group + ''] = '';
        if (poly.edges.length > 1) {
            poly.edges.push({
                source: connections[i][0],
                target: connections[i][j-1]
            });
        }
        polygons.push(poly);
        poly = {edges: [],
                vertices: {}};
    }
};


Note that we set apart two special cases: single-vertex connections (they correspond to no polygon at all; we just need to remember to leave some free space in the relevant group) and two-vertex connections (they're composed of a single chord, so there's no need to close the path at the end). Let's use the REPL with joy, and inspect the polygons object.

> polygons.length
2

As expected; one polygon is a “square”, and the other is a single segment.

> polygons[0]
{ edges:
[ { source: [Object], target: [Object] },
{ source: [Object], target: [Object] },
{ source: [Object], target: [Object] },
{ source: [Object], target: [Object] } ],
vertices:
{ '0': '',
'1': '',
'2': '',
'3': '' } }

the REPL’s pretty printer doesn’t show us more than three levels of nesting, so we need to access those by ourselves. By the way, note the “vertices” property of a polygon; it will be useful when we set up the fade effect on whole polygons.

> polygons[0].edges
[ { source: { group: 0, value: 5 },
target: { group: 1, value: 7 } },
{ source: { group: 1, value: 7 },
target: { group: 2, value: 7 } },
{ source: { group: 2, value: 7 },
target: { group: 3, value: 5 } },
{ source: { group: 0, value: 5 },
target: { group: 3, value: 5 } } ]
> polygons[0].vertices
{ '0': '',
'1': '',
'2': '',
'3': '' }

polygons[i].edges[h].source and polygons[i].edges[h].target are references to connections sub-elements.

> polygons[1].edges
[ { source: { group: 0, value: 11 },
target: { group: 2, value: 11 } } ]
> polygons[1].vertices
{ '0': '', '2': '' }

Everything as expected. We can move on to the next loop, where we fill up a new data structure called subgroups; a “subgroup” is something that lives inside a group (one of the arcs of our piechart), and takes care of multiple ribbons sharing the same base. If two ribbons depart from the same group and, moreover, have that base in common, they belong to the same subgroup. Otherwise the SVG engine would have no way to know that they have to be placed together, and would draw them one next to the other.

Of course a chord is referenced in two distinct subgroups, one for each group it touches (begin and end).

var h, samebase, subgroups = [];

i = -1; while (++i < ngroups) {
    subgroups[i] = [];
    j = -1; while (++j < polygons.length) {
        samebase = {'ribbons': [],
                    'basevalue': 0};
        h = -1; while (++h < polygons[j].edges.length) {
            if (polygons[j].edges[h].source.group === i) {
                samebase.ribbons.push(polygons[j].edges[h]);
                samebase.basevalue = polygons[j].edges[h].source.value;
            } else if (polygons[j].edges[h].target.group === i) {
                samebase.ribbons.push(polygons[j].edges[h]);
                samebase.basevalue = polygons[j].edges[h].target.value;
            }
        }
        subgroups[i].push(samebase);
    }
}


here we set up a collection of subgroups for each group. For each group, we scan all polygons and check whether each has a vertex in that group. In our polygon-based model, each subgroup will hold at most two ribbons, since from each corner of a polygon only two sides depart. Special cases are the segment and the single point, which have one and no ribbons respectively. Let's inspect our subgroups object so far:

> subgroups
[ [ { ribbons: [Object], basevalue: 5 },
{ ribbons: [Object], basevalue: 11 } ],
[ { ribbons: [Object], basevalue: 7 },
{ ribbons: [], basevalue: 0 } ],
[ { ribbons: [Object], basevalue: 7 },
{ ribbons: [Object], basevalue: 11 } ],
[ { ribbons: [Object], basevalue: 5 },
{ ribbons: [], basevalue: 0 } ] ]

we actually added a basevalue property; it tells us the size of the chord’s base. We can take a closer look at the subgroups of the first group:

> subgroups[0]
[ { ribbons: [ [Object], [Object] ],
basevalue: 5 },
{ ribbons: [ [Object] ], basevalue: 11 } ]

there are two: one with two ribbons that share a base of size 5, and one with a single ribbon of base 11 (that one clearly belongs to the segment; check connections again, you'll find it in the third position of the array). We can inspect the ribbons involved via

> subgroups[0][0].ribbons
[ { source: { group: 0, value: 5 },
target: { group: 1, value: 7 } },
{ source: { group: 0, value: 5 },
target: { group: 3, value: 5 } } ]
> subgroups[0][1].ribbons
[ { source: { group: 0, value: 11 },
target: { group: 2, value: 11 } } ]

Note that, again, we’re dealing with references to connections’ components. Change one of them, and you’ll be writing directly into connections. We’ll use this in a little while.
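This aliasing deserves a tiny standalone demonstration (a sketch of the language behaviour, not code from the layout): javascript assigns objects by reference, so a vertex reached through subgroups is the very same object stored in connections:

```javascript
// Objects are assigned by reference: mutating a vertex through one name
// is visible through every other name holding the same object.
var connections = [[{group: 0, value: 5}]];
var vertex = connections[0][0];    // a second reference, not a copy
vertex.geometry = {index: 0};      // write through the reference...
console.log(connections[0][0].geometry.index);  // 0: connections sees the change
console.log(vertex === connections[0][0]);      // true: same object
```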

Hey! I almost forgot: the single-point connections. We need to add them to the subgroups too, since they have to be taken into account when computing the position of the graphical elements. Here we go:

i = -1; while (++i < connections.length) {
    if (connections[i].length === 1) {
        subgroups[connections[i][0].group].push({
            'ribbons': [],
            'basevalue': connections[i][0].value
        });
    }
}


you can check that a new no-ribbons subgroup with a basevalue of 42 has been appended to group 0 (check connections again and you'll see that's correct):

> subgroups
[ [ { ribbons: [Object], basevalue: 5 },
{ ribbons: [Object], basevalue: 11 },
{ ribbons: [], basevalue: 42 } ],
[ { ribbons: [Object], basevalue: 7 },
{ ribbons: [], basevalue: 0 } ],
[ { ribbons: [Object], basevalue: 7 },
{ ribbons: [Object], basevalue: 11 } ],
[ { ribbons: [Object], basevalue: 5 },
{ ribbons: [], basevalue: 0 } ] ]

we now use the range function, defined before, to generate the array of arrays subgroupIndex, in case we need to sort subgroups; it's inessential to the scope of this article, but I introduce it to keep the naming consistent with the implementation on my github branch.

var subgroupIndex = [];

i = -1; while (++i < ngroups) {
    subgroupIndex.push(range(subgroups[i].length));
}


now everything is in place to find out the geometry of our diagram:

var pt, pt1, pt2, groups = [];

var x = 0, i = -1; while (++i < ngroups) {
    var di = groupIndex[i];
    var x0 = x, j = -1; while (++j < subgroupIndex[di].length) {
        var dj = subgroupIndex[di][j],
            v = subgroups[di][dj].basevalue,
            a0 = x,
            a1 = x += v * k;
        h = -1; while (++h < subgroups[di][dj].ribbons.length) {
            pt1 = subgroups[di][dj].ribbons[h].source;
            pt2 = subgroups[di][dj].ribbons[h].target;
            if (pt1.group === di) { pt = pt1; }
            else { pt = pt2; }
            pt['geometry'] = {
                index: di,
                subindex: dj,
                startAngle: a0,
                endAngle: a1,
                value: v
            };
        }
    }
    groups[di] = {
        index: di,
        startAngle: x0,
        endAngle: x,
        value: (x - x0) / k
    };
    x += padding;  // leave the padding gap between consecutive groups
}


we are scanning all our subgroups in the order defined by subgroupIndex and groupIndex; we then pick all the ribbons in a subgroup, select the right end of each, and augment that datum with the geometry property. Note that since we're managing references, these changes propagate to all objects holding that reference, as if they were a sort of global variable. Here we modify the values via the subgroups array; later we'll access those chord endpoints via the polygons array. Pretty cool. Let's just check what ended up in the groups array

> groups
[ { index: 0,
startAngle: 0,
endAngle: 4.009250000000001,
value: 58.00000000000001 },
{ index: 1,
startAngle: 4.0592500000000005,
endAngle: 4.543125000000001,
value: 7.0000000000000036 },
{ index: 2,
startAngle: 4.593125000000001,
endAngle: 5.837375000000001,
value: 18 },
{ index: 3,
startAngle: 5.8873750000000005,
endAngle: 6.2330000000000005,
value: 5.000000000000001 } ]

and in the geometry property of our chords' endpoints; here is the outer layout of the object

> subgroups[0][0].ribbons
[ { source:
{ group: 0,
value: 5,
geometry: [Object] },
target:
{ group: 1,
value: 7,
geometry: [Object] } },
{ source:
{ group: 0,
value: 5,
geometry: [Object] },
target:
{ group: 3,
value: 5,
geometry: [Object] } } ]

zoom in a little:

> subgroups[0][0].ribbons[0].source
{ group: 0,
value: 5,
geometry:
{ index: 0,
subindex: 0,
startAngle: 0,
endAngle: 0.345625,
value: 5 } }

So far so good. We're almost there; we just have to explicitly build the chords objects.

var chords = [];

i = -1; while (++i < polygons.length) {
    j = -1; while (++j < polygons[i].edges.length) {
        var source = polygons[i].edges[j].source.geometry,
            target = polygons[i].edges[j].target.geometry;
        if (source.value || target.value) {
            chords.push(source.value < target.value
                ? {source: target, target: source, groups: polygons[i].vertices}
                : {source: source, target: target, groups: polygons[i].vertices});
        }
    }
}


Done! Recall that the property groups in a chords element means: "the indexes of all groups touched by the polygon this chord belongs to", and it will be used for the fading effect. It comes directly from the polygons[i].vertices objects; since key lookup on an object is O(1), as opposed to the O(n) scan needed to search for a value in an array, and I need fast lookup, I encoded the vertices as keys of a dictionary where the values are dummy empty strings. Let's proudly inspect the result of our efforts:

> chords.length
5

as expected: a 4-sided polygon plus a segment. What does a chord look like?

> chords[0]
{ source:
{ index: 1,
subindex: 0,
startAngle: 4.0592500000000005,
endAngle: 4.543125000000001,
value: 7 },
target:
{ index: 0,
subindex: 0,
startAngle: 0,
endAngle: 0.345625,
value: 5 },
groups:
{ '0': '',
'1': '',
'2': '',
'3': '' } }
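The groups dictionary shown above is what makes the whole-polygon fade test cheap; here is a small sketch (mine, not from the post) of the lookup it enables:

```javascript
// The fade test: is group g touched by the polygon this chord belongs to?
// With a dictionary this is a constant-time key lookup; with an array we
// would have to scan it with indexOf.
var chordGroups = {'0': '', '1': '', '2': '', '3': ''};
console.log('2' in chordGroups);  // true: the polygon touches group 2
console.log('7' in chordGroups);  // false: it doesn't touch group 7
```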

Here is the resulting diagram, where I also added labels to the arcs for clarity:

Posted in Uncategorized | 5 Comments

## A new API to configure chord diagrams in d3.js

I really like the chord diagram from the d3 javascript library, but the function used to build those diagrams could be better. Here I discuss a possible alternative.

Have you ever heard of circos? It's a visualization tool written in perl; it draws pretty pictures arranging your data on a circle. It's a powerful idea by Martin Krzywinski; unfortunately the perl program is quite difficult to use (an awful lot of configuration files) and the code isn't that friendly to jump in and hack away. I always had the impression that only Martin can use it at full power; all the rest of us can do is copy and paste some of his examples, hoping that he wrote something similar to what we're trying to do. Things might get better over time, since Circos is starting to be packaged for ubuntu and some real programmers are putting their hands on it. Still, for the time being, circos is not a viable option if you need a quick and meaningful way to explore your data.

“It’s a pity!” you’re saying? I did too, but cheer up! There is an alternative. It’s Mike Bostock‘s d3 javascript library.

Mike saw Martin's work, liked it, and ported the table viewer example to d3, giving us what he calls the "chord diagram". Mike is a good programmer, and got things right: he separated the low-level drawing API, in d3.svg.chord, from the higher-level layout setting, in d3.layout.chord. While the drawing API lets you create every possible chord diagram you can think of, it's pretty heavy to use; the layout API is much more straightforward, but it only implements something called "the table viewer layout". My whole point is that this is far from the best possible use of this very nice visualization idea. Thanks to this separation of roles in the code design, when I decided that the table-viewer layout API doesn't have the level of expressivity I need, it was relatively straightforward to write a new layout API; I just had to stick to the interfaces designed by Mike.

## The General Circos Idea

Here it goes: you have a circumference, which you split into several arcs. You then draw ribbons connecting arcs. The meaning you attach to this diagram is up to you; there are situations in which the semantics of those drawings is so natural that you immediately understand what is going on. You just get it. It’s a simple yet powerful visualization technique.

## The Table Viewer Thing

Even if Circos was designed with genomic data in mind, Martin suggests an application which is representing tabular data. It works like this: you have a square table full of numbers; each arc in the circle corresponds to a row and a column at the same time. A ribbon which has its ends in the i-th and j-th arcs will have the values for its bases in the i-j and j-i entries of the table respectively. Here I try to explain this with a picture:
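The same convention can also be sketched in code (my illustration; the matrix values are made up):

```javascript
// Table-viewer convention: for a square matrix m, the ribbon between arc i
// and arc j gets its base width on arc i from m[i][j], and its base width
// on arc j from m[j][i].
var m = [[0, 4, 0],
         [8, 0, 2],
         [0, 5, 0]];
var i = 0, j = 1;
var ribbon = {sourceValue: m[i][j],   // width of the base on arc 0: 4
              targetValue: m[j][i]};  // width of the base on arc 1: 8
console.log(ribbon.sourceValue, ribbon.targetValue);  // 4 8
```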

Frankly, I believe that such a representation is as messy as the initial table of data. Per se, the chord diagram is a very powerful visualization; this just isn’t its killer application. But I am very happy that it caught Mike’s attention, since whatever layout system you write to simplify the creation of chord diagrams, the d3 API that creates the basic SVG elements, such as arcs and ribbons, is already written and will not change.

## The Chord Diagram Matrix Layout

It is really useful to have a mechanism to configure a chord diagram with a simple javascript object, but the matrix input in d3.layout.chord just doesn't cut it. Here are three situations I can't represent via the chord.matrix setter:

1. I cannot specify that two chords (or ribbons) share the same base.
2. I cannot set a portion of an arc to have no chords departing from it. The matrix layout forces the "same arc" chords to always exist; those chords correspond to values on the diagonal of the input matrix. I feel that a chord with the same departure and arrival arc is redundant; make every pixel count, and just leave it off.
3. I cannot draw “parallel chords”, i.e. two distinct chords that both begin in a group A and end in a group B.

## Proposition: a Polygon Layout

My personal experience in drawing chord diagrams suggests a different approach, based on what I call "polygons". If a chord is like a segment connecting two arcs, a polygon is a closed path of chords; a given polygon touches an arc only once. If a polygon represents a relation, or a property, then all arcs touched by the polygon share that property; the width of the chords involved in the polygon is a parameter that can carry additional information (i.e., "how much" of a property a given arc has). In the following pretty picture, I call the ensemble of chords showing up "a pentagon":

A polygon is described by an array of vertices [v0, v1, ..., vn]; it’s up to the d3 engine to connect those vertices in the natural way, i.e. with the edges [(v0, v1), (v1, v2), ..., (v_n-1, v_n), (v_n, v0)]. A vertex is represented by a javascript object with two properties, “group” and “value”: the former is the ID of the arc where the vertex is, the latter is its width. Therefore the full configuration of a chord diagram is given by an array of polygons. As a side note, a polygon with only two vertices is a single chord (i.e., just a segment), and a “polygon” with only one vertex is a dummy placeholder in an arc, meaning “leave an empty space in this arc, i.e. without any chord”. To follow the d3.layout.chord convention, arcs are called “groups” in what follows; moreover, to pile up some more confusion, I’ll give the name “connections” to the above mentioned array of polygons.
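As an illustration, here is a hypothetical helper (not part of the actual layout code) that computes the edge list implied by a vertex array, following the convention just described:

```javascript
// Build the closed edge path implied by a polygon's vertex array:
// [(v0,v1), (v1,v2), ..., (v_n-1,v_n), (v_n,v0)].
// One vertex: no edges (just a placeholder); two vertices: a single chord.
function closedEdges(vertices) {
  var edges = [], i;
  for (i = 1; i < vertices.length; i++) {
    edges.push({source: vertices[i - 1], target: vertices[i]});
  }
  if (vertices.length > 2) {  // close the path only for real polygons
    edges.push({source: vertices[vertices.length - 1], target: vertices[0]});
  }
  return edges;
}

console.log(closedEdges([{group: 0}, {group: 2}, {group: 4}]).length);  // 3
console.log(closedEdges([{group: 0}, {group: 1}]).length);              // 1
console.log(closedEdges([{group: 3}]).length);                          // 0
```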

Up to now I have shown several examples of chord diagrams, but never specified the javascript object that actually drives the layout. Now I do exactly that; each diagram in this post has a number that I use here to refer to it. The javascript object below is for diagram #1, i.e. pentagon and triangle with two more empty arcs; one can clearly recognize the pentagon (five-element array), the triangle (three-element array) and the remaining two "empty arcs".

connections = [ [ {group: 0, value: 1},
{group: 2, value: 1},
{group: 4, value: 1},
{group: 6, value: 1},
{group: 8, value: 1} ],

[ {group: 1, value: 1},
{group: 5, value: 1},
{group: 9, value: 1} ],

[ {group: 3, value: 1} ],

[ {group: 7, value: 1} ] ]

Next is the descriptor for diagram #2, a triangle with three bigger empty arcs interleaved, used as an example for ribbons that share an arc. Sharing an arc is the very basic primitive for drawing polygons, and this API makes it very natural to use. The triangle is the three-vertex polygon; the three other arcs with no ribbons are specified as single-element arrays in the polygon list.

connections = [ [ {group: 0, value: 1},
{group: 2, value: 1},
{group: 4, value: 1} ],

[ {group: 1, value: 2} ],

[ {group: 3, value: 2} ],

[ {group: 5, value: 2} ] ]

Diagram #3, used as an example for arcs that are partially empty; note that the empty space on arc 0 is specified via a single-vertex polygon in the polygon list.

connections = [ [ {group: 0, value: 1},
{group: 1, value: 1} ],

[ {group: 0, value: 1},
{group: 2, value: 1} ],

[ {group: 0, value: 1},
{group: 3, value: 1} ],

[ {group: 0, value: 1},
{group: 4, value: 1} ],

[ {group: 0, value: 4} ] ]

Diagram #4, with "parallel ribbons"; note that each ribbon is a two-vertex polygon on its own. I also added blank spaces on each arc.

connections = [ [ {group: 0, value: 1},
{group: 1, value: 1} ],

[ {group: 0, value: 1},
{group: 1, value: 2} ],

[ {group: 0, value: 1},
{group: 1, value: 3} ],

[ {group: 0, value: 1},
{group: 1, value: 4} ],

[ {group: 0, value: 1} ],

[ {group: 1, value: 1} ] ]

And here diagram #5, a pentagon with blank arcs interleaved:

connections = [ [ {group: 0, value: 1},
{group: 2, value: 1},
{group: 4, value: 1},
{group: 6, value: 1},
{group: 8, value: 1} ],

[ {group: 1, value: 1} ],

[ {group: 3, value: 1} ],

[ {group: 5, value: 1} ],

[ {group: 7, value: 1} ],

[ {group: 9, value: 1} ] ]

The final detail is the fading effect when you hover your mouse over an arc. This is specific to Mike's implementation of this kind of diagram, since the original Circos only produces a still image. Well, in fact it is a general property that you can apply to every d3-generated graphic with little effort. Anyway, in the sample diagram Mike chose to fade out all chords not departing from the arc under your mouse. With this whole polygon thing, another strategy makes more sense: fade out all polygons not having a vertex in the current arc. In this figure you see diagram #1 with the pentagon fading out, since the mouse pointer is on one of the arcs of the yellow triangle.
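As a sketch of how that per-polygon fade could be wired (the function name and opacity values here are hypothetical; the real implementation hangs this logic off d3 mouseover events):

```javascript
// Given the chords built by the layout (each carrying the `groups`
// dictionary of its polygon's vertices), compute a target opacity per
// chord when the mouse is over group `hovered`: keep polygons touching
// it, fade the rest.
function fadeOpacities(chords, hovered) {
  return chords.map(function(chord) {
    return (hovered + '') in chord.groups ? 1.0 : 0.1;
  });
}

var chords = [
  {groups: {'0': '', '2': '', '4': '', '6': '', '8': ''}},  // the pentagon
  {groups: {'1': '', '5': '', '9': ''}}                     // the triangle
];
// Hover an arc of the triangle: the pentagon fades, the triangle stays.
console.log(fadeOpacities(chords, 5));
```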

In a forthcoming post I’ll describe the implementation of this alternative layout.

## bzr easter egg

If you use the bazaar version control tool, you probably know about the command that lists commands,

$ bzr help commands
add                  Add specified files or directories.
alias                Set/unset and display aliases.
annotate             Show the origin of each line in a file.
[...]

Well, there is more. Some commands are meant to be developer-oriented or for debugging purposes, and are… hidden:

$ bzr help hidden-commands
ancestry             List all revisions merged into this branch.
assert-fail          Test reporting of assertion failures
bundle-info          Output interesting stats about a bundle
[...]
rocks                Statement of optimism.
[...]


I am not going to spoil what bzr rocks does, but it made my day.

## Trying to send a patch (with git)

This post isn't titled "How to send a patch to an opensource project", since I have no clue how to do that. Recently I spotted a minor bug in BlueZ, and now I am doing my best to guess how a patch to the code is meant to be sent to the BlueZ project. First off, looking at their mailing list archive, it appears that pretty much everything happens via e-mail, as it does for the kernel or the git version control system. That is: the mail exchange is basically made of patches in diff format that go back and forth between the developers, until the maintainers are fine with the result and apply them to their branches. Sometimes I wonder when the developers discuss *what* to write in their code, instead of exchanging and reviewing patches; maybe that happens at conferences or meetings. Anyway.

I used as a reference the "How to Get Your Change Into the Linux Kernel" article; even if BlueZ isn't the kernel, I guess most of the guidelines apply there too. I also had to have a look at the Linux kernel coding style, especially for the 8-char indent rule. As a "how to" guide, I found this article on the web that describes the commands step by step; actually, git is able to email a patch series all by itself, which is quite convenient. First, I needed to

$ apt-get install git-email

in order to get the git send-email(1) command available on my system.

## Create a local branch: git clone, git checkout -b

The first thing you need to do is clone the tree you want to patch, and create a local branch. You will modify your local branch, not the 'master' one: that will serve as a reference to create the diff file, namely your patch (or patch series).

$ git clone http://git.kernel.org/pub/scm/bluetooth/bluez.git
$ cd bluez
$ git checkout -b my_dev_branch

You can check that you are on the correct branch with

$ git branch
  master
* my_dev_branch

## Modify and commit: git commit

Now it's time to make your changes, following all the conventions and coding style that apply to the project. Often there are guidelines for commit messages too (I was asked to rephrase the message in my first submission into present tense), and on how to split a change into several distinct commits, to allow selective acceptance of single parts of a patch series.

git commit
[edit, edit and more edit]
git commit

## Produce the patch file(s): git format-patch

Well done! Now it's time to produce the patch file(s) and send them to the mailing list. I used "git format-patch" with the -n option to get numbered patches, i.e. the subject of the emails starts with [PATCH 1/N], [PATCH 2/N], …, [PATCH N/N]. The BlueZ project doesn't accept signed patches, so I left off the -s option:

$ git format-patch -n master..my_dev_branch

One patch file per commit is created in the current directory. Here I specified the revision range in the form <refName>..<refName>, that is, all commits between the tip of the master branch (from which I forked my_dev_branch; that commit is the parent of the first commit of my patch) and the tip of my_dev_branch. I could have added --cover-letter to add one more file to send, containing an overview of the patch series, like this one. "Cover letters" have a subject starting with [PATCH 0/N].

## Send the patch as an email: git send-email

The patch files are ready. We can now send them one by one to the target mailing list. I sent it to myself first, just to check that everything was OK:

$ git send-email --smtp-encryption=tls \
      --smtp-server=smtp.gmail.com \
      --smtp-user=gghh-at-domain \
      --smtp-server-port=587 \
      --to gghh-at-domain \
      000x-add-this-and-that-functionality.patch

Apart from the mail address, the rest of the settings are for sending the patch from gmail; the git-send-email manpage's example section is rich in tips for handling different configurations, and also for putting all common settings in a configuration file.

## Apply the patch: git am

Imagine that you are Junio Hamano, the maintainer of the git project. You have the final word on whether a patch is committed to git. You receive tens of patches per day via email, and if you like them you have to apply them to your branch of git (I am not sure this is an accurate description of the workflow for the git project, but let's keep the narrative). This is done via the git am command. Since this is probably an important part of your daily activities, your email client stores messages in a format compatible with git am, like mbox or maildir, and has git am integrated, so that you press a button and the patch gets committed to your branch.

But even mere mortals can (and should) know how to use git am: that's how one can tell whether one's patch will apply cleanly to the codebase. Anyway, let's assume you got the email patch somehow onto your disk as a file named 000x-add-this-and-that-functionality.patch; you can apply it to your tree with

$ git am 000x-add-this-and-that-functionality.patch


Note that this creates a commit; one can actually test a patch previously created with git format-patch by checking out the master branch and running git am with the patch file.
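That round trip can be sketched end to end (names are again made up for illustration): the patch is generated on a development branch, then applied on master with git am, which records a real commit:

```shell
# Round trip: git format-patch on a dev branch, then git am on master.
set -e
git init am-demo && cd am-demo
git config user.email "you@example.com"
git config user.name "You"
git commit --allow-empty -m "initial commit"
git branch -M master
git checkout -b my_dev_branch
echo "new feature" > feature.txt
git add feature.txt
git commit -m "add this and that functionality"
git format-patch master..my_dev_branch
git checkout master
git am 0001-add-this-and-that-functionality.patch
git log --oneline -1    # the patch is now a commit on master
```

Unlike git apply, git am reuses the author name, date and commit message carried in the mail headers of the patch file, so the commit on master looks as if the original developer made it.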

As a side note, even if you don’t have the patch in your mailbox, but the archives are on gmane, you can still fetch the raw message by appending /raw to the message URL. Say the patch is at http://article.gmane.org/gmane.comp.version-control.git/212429. You can get it and apply it to your tree by piping curl into git am:

$ git clone https://github.com/git/git.git
$ cd git
$ curl -s http://article.gmane.org/gmane.comp.version-control.git/212424/raw | git am -

There is still something that bugs me about the algorithm that applies a patch: I imagine that often the tree at the time the patch is applied looks very different from the tree where the patch was generated. Think of the situation where a developer proposes a patch to the mailing list, and after the review process a maintainer applies the patch a week later. Lots of commits have happened in the meantime; what if a file the patch modifies has been moved or removed? The patch doesn’t carry any information about the tree state at the time of its generation, so how can the patch tool know the right thing to do? Well, as Thiago Macieira pointed out to me on IRC, the patch tool is able to cope with fuzz. In addition, if you use git am -3, it enables 3-way merging by comparing the file indexes (information that you can actually find in patch files). git am will try to find a commit that contained the file in that state, and then merge the changes made since then with your patch. Note to self: investigate more on this!

## What if you made a mistake? git rebase -i

We all do. In my patch to BlueZ I was particularly paranoid about the commit message; if the patch gets committed, that message will stay forever in everyone’s tree, and it will be the first thing that people see about your work. Moreover, think of the situation where the patch reviewer asks you to make some minor modifications to your patch: a tool to replay it all over and let you change small parts comes in really handy. If you have to rephrase the message of the last commit,

$ git commit ...
$ git reset --soft HEAD^
$ edit
$ git add ....
$ git commit -c ORIG_HEAD


Credits to this stackoverflow post. If the bad commit isn’t the last one, you do

\$ git rebase --interactive <revision>


where revision is the parent of the first bad commit. It will allow you to “replay” everything, giving you the tree and commit message as you left them at each stage. Again, found on stackoverflow. If you make mistakes even during the rebase (I swear I did), there is always git rebase --abort, after which you can start the rebase again.
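Because git rebase --interactive normally opens an editor twice (once for the todo list, once for the reworded message), here is a hedged, non-interactive sketch of that reword flow: the editors git invokes are replaced with throwaway scripts via GIT_SEQUENCE_EDITOR and GIT_EDITOR. All file and branch names are illustrative.

```shell
# Scripted "git rebase -i" reword: the editors are replaced by scripts,
# so the whole interactive session runs unattended. Illustrative only.
set -e
git init rebase-demo && cd rebase-demo
git config user.email "you@example.com"
git config user.name "You"
echo one > file.txt
git add file.txt && git commit -m "good commit"
echo two >> file.txt
git add file.txt && git commit -m "add featuer two"   # typo in the message
# The todo-list editor turns "pick" into "reword" ...
printf '#!/bin/sh\nsed -i "s/^pick/reword/" "$1"\n' > seq-editor.sh
# ... and the message editor writes the corrected subject line.
printf '#!/bin/sh\necho "add feature two" > "$1"\n' > msg-editor.sh
chmod +x seq-editor.sh msg-editor.sh
# <revision> is the parent of the first bad commit: HEAD^ here.
GIT_SEQUENCE_EDITOR="$PWD/seq-editor.sh" GIT_EDITOR="$PWD/msg-editor.sh" \
    git rebase --interactive HEAD^
git log --oneline -1
```

In a real session you would of course do the same edits by hand in your editor; the point is only that reword replays the commit and lets you fix the message without touching the tree.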

## Function pointers and casts in C. Whazat?!?

I am wrapping my mind around function pointers and casts in C. This example amuses me; what do you think will be printed on standard output? You can try it with your favourite C compiler. I am still wondering whether what I got from GCC makes any sense.

/**
 * func_casting_weird.c
 */

#include <stdio.h>

typedef void (*two_args_fun_ptr_type) (int, int);
typedef void (*one_arg_fun_ptr_type) (int);

void one_arg_fun(int n) {
    printf("Parameter: %d", n);
}

void higher_lvl_fun(two_args_fun_ptr_type f) {
    f(2, 3);
}

int main(void) {
    one_arg_fun_ptr_type one_arg_fun_ptr = &one_arg_fun;
    /* here the evil cast */
    two_args_fun_ptr_type fake_two_args_fun_ptr =
        (two_args_fun_ptr_type)one_arg_fun_ptr;
    higher_lvl_fun(fake_two_args_fun_ptr);
    return 0;
}
