Very Secure

V Study Part 1 - Vpatches and Vdiff

Creating source using V is done by sequentially applying a set of vpatches through a process known as pressing. To press, V is given the most recent vpatch and an output directory. V then finds a path from the given vpatch to the genesis vpatch. Starting with the genesis vpatch, V applies each vpatch along the found path and dumps the result into the given output directory. In this post I go over how the vpatches used in this process are created.
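The path-finding step can be pictured with a minimal sketch. The vpatch names and the hand-written parent table below are hypothetical, and how V itself works out which vpatch precedes which is not covered here; the sketch only shows walking from the given vpatch up to genesis and then applying in the reverse order.

```shell
#!/bin/sh
# Hypothetical V-tree: every vpatch names its parent; genesis has none.
parent_of() {
    case "$1" in
        genesis.vpatch)   echo "" ;;
        fix_a.vpatch)     echo "genesis.vpatch" ;;
        feature_b.vpatch) echo "fix_a.vpatch" ;;
        branch_c.vpatch)  echo "genesis.vpatch" ;;
        *) return 1 ;;    # unknown vpatch: no path to genesis
    esac
}

# Walk from the given vpatch up to genesis, prepending each step, so the
# result lists the vpatches genesis-first (the order a press applies them).
press_order() {
    node=$1
    path=""
    while [ -n "$node" ]; do
        path="$node $path"
        node=$(parent_of "$node") || return 1
    done
    printf '%s\n' "${path% }"
}

press_order feature_b.vpatch
# prints: genesis.vpatch fix_a.vpatch feature_b.vpatch
```

Note that branch_c.vpatch never appears in that press: only the vpatches on the path from the given vpatch to genesis are applied.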

To make a vpatch, a developer starts with a copy of the source already pressed to the previous most recent vpatch. Say, for example, that this source is in a directory named oldversion. The developer then copies the source in oldversion to another directory, which we'll call newversion. In newversion he makes the source modifications that will constitute the vpatch. When finished, the developer runs

vdiff oldversion newversion

An example of the code for the vdiff program, taken from the bitcoin foundation, is reproduced in one line below:


diff -uNr $1 $2 | awk 'm = /^(---|\+\+\+)/{s="sha512sum \"" $2 "\" 2>/dev/null  " | getline x; if (s) { split(x, a, " "); o = a[1]; } else {o = "false";} print $1 " " $2 " " o} !m { print $0 }'

Running vdiff on the two directories creates the vpatch file, which is similar to a diff file obtained from running

diff -uNr oldversion newversion

The difference is that vdiff replaces vanilla diff's file modification timestamps with hashes [1] of each file's content. spyked articulates the importance of this in a recent thread he had with me in #o.

spyked: whaack, problem is that classical diff/patch leave room for ambiguity, i.e. in principle it's possible to (cleanly) apply a single hashless patch to different files, which results in different presses. so hashes are needed in order to identify the file (not only path/name, which is only metadata required for retrieval) as it is before/after applying the patch.
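The header rewrite is easier to follow when the awk program is spread over several lines. The following is my own rough expansion of the one-liner, not the foundation's code: the function name vdiff_sketch and the variable names are made up, the getline call is parenthesised explicitly, and the loose header match of the original is kept on purpose.

```shell
#!/bin/sh
# Sketch of vdiff with the awk program spread out. It is meant to mirror
# the one-liner, including its loose match on any line starting with
# "---" or "+++".
# Usage: vdiff_sketch oldversion newversion > my.vpatch
vdiff_sketch() {
    diff -uNr "$1" "$2" | awk '
    /^(---|\+\+\+)/ {
        # $1 is the marker, $2 the path; the rest of the line is the
        # timestamp, which gets dropped.
        cmd = "sha512sum \"" $2 "\" 2>/dev/null"
        if ((cmd | getline line) > 0) {
            split(line, parts, " ")
            hash = parts[1]      # first field of the sha512sum output
        } else {
            hash = "false"       # no such file on this side of the diff
        }
        close(cmd)
        print $1 " " $2 " " hash
        next
    }
    { print }                    # every other line passes through unchanged
    '
}
```

Because sha512sum is run on the path exactly as it appears in the diff header, the function has to be called from the directory that the oldversion and newversion arguments are relative to.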

I still need to fully digest the awk command that replaces the file timestamps with the file content hashes. But one quirk I noticed is that certain crafted files cause the awk command to match on lines it should leave alone. For example, if you have a file whose single line starts with two +'s [2]


$tree
.
├── newversion
│   └── fool.txt
├── oldversion

$cat newversion/fool.txt
++ trick.txt this_should_be_in_the_vpatch

then

$vdiff oldversion newversion

will produce


diff -uNr oldversion/fool.txt newversion/fool.txt
--- oldversion/fool.txt false
+++ newversion/fool.txt 27991f54fb2534c59b6c0667f9a91d8bd9173b5cc3184aeea251c2435b7808457a5492add5646793738a1f3e9c32892a2261e18eb0e3a2d0d7a0486735bf43a8
@@ -0,0 +1 @@
+++ trick.txt false

the last line should be

+++ trick.txt this_should_be_in_the_vpatch

but it was mistakenly altered by the awk command. This incorrect modification to the vpatch makes the resulting fool.txt file have the wrong contents after pressing. [3] However, if V checks while pressing that the hashes of the resulting files match the intended files' hashes found in the vpatch, V will correctly spot this error and fail to press. This is an example of how diana_coman was right when responding to my point of confusion here:

whaack: got it, i understand that the hashes are needed to identify the files. but regarding hashing the files yourself after every patch, the vpatches already let you know what the output hash will be. so if you trust the vpatch to the point where you're going to run the code outputted by it, then you should trust its claim of what the output of the hash would be. hashing the output files yourself after every patch then becomes more of a
whaack: data-integrity check.
diana_coman: the vpatches let you know what the output hash *should be*
diana_coman: nobody can let you know upfront what it *will be*; in general
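The gap between what the hash should be and what it turns out to be can be sketched with the fool.txt example. This is only an illustration under assumptions: verify_pressed is a made-up helper, sha512sum is used to match the vdiff shown earlier, and nothing here claims to be how any actual V implementation performs the check.

```shell
#!/bin/sh
# Succeeds only when the sha512 of the file equals the hash claimed for it
# in the vpatch header. A missing file counts as a mismatch.
verify_pressed() {
    file=$1
    claimed=$2
    actual=$(sha512sum "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$claimed" ]
}

work=$(mktemp -d)
# What the author wrote, and what pressing the damaged vpatch would produce.
printf '%s\n' '++ trick.txt this_should_be_in_the_vpatch' > "$work/intended.txt"
printf '%s\n' '++ trick.txt false' > "$work/pressed.txt"

# vdiff hashed the real file, so the header carries the hash of the
# intended contents.
claimed=$(sha512sum "$work/intended.txt" | awk '{print $1}')

if verify_pressed "$work/pressed.txt" "$claimed"; then
    verdict="press ok"
else
    verdict="hash mismatch: refuse to press"
fi
echo "$verdict"
# prints: hash mismatch: refuse to press
rm -rf "$work"
```

The vpatch header states the hash the output should have; only hashing the pressed file shows what it actually is.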

  1. Originally, the hash function used in vdiff was sha512, as in the vdiff program I posted. The hash function used now is keccak. The benefits of using keccak over sha512 are beyond the scope of both me and the post.
  2. Note that I put two +'s at the start of the one line in this file. To show that this line was added, the diff command's output contains a "+" followed by the line's contents. The diff output therefore has a line that starts with three sequential +'s but refers to a file's content rather than a file header. The awk command incorrectly matches this line and attempts to replace this_should_be_in_the_vpatch with the hash of the nonexistent file trick.txt.
  3. "this_should_be_in_the_vpatch" was replaced with the word "false" because the file trick.txt does not exist, so sha512sum produces no hash for it.
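The mis-match comes from treating every line that starts with --- or +++ as a file header. One possible tightening, sketched here as my own assumption and not something the foundation's vdiff does, is to accept a "--- " header only directly after a "diff " line and a "+++ " header only directly after that "--- " header. Hunk content always begins with a space, +, - or backslash, so it cannot imitate the "diff " line. The function tag_headers and the timestamps in the sample input are placeholders; the sketch only labels lines (H for header, . for everything else) and does no hashing.

```shell
#!/bin/sh
# Tag each line of "diff -uNr" output as a file header (H) or not (.).
# Assumed rule: "--- " counts as a header only right after a "diff " line,
# and "+++ " only right after that "--- " header.
tag_headers() {
    awk '
    /^diff /                     { prev = "diff"; print ". " $0; next }
    prev == "diff" && /^--- /    { prev = "old";  print "H " $0; next }
    prev == "old"  && /^\+\+\+ / { prev = "";     print "H " $0; next }
                                 { prev = "";     print ". " $0 }
    '
}

# The fool.txt example, with placeholder timestamps in the two headers.
tag_headers <<'EOF'
diff -uNr oldversion/fool.txt newversion/fool.txt
--- oldversion/fool.txt 1970-01-01 00:00:00
+++ newversion/fool.txt 2020-01-01 00:00:00
@@ -0,0 +1 @@
+++ trick.txt this_should_be_in_the_vpatch
EOF
```

With this rule the two real headers are tagged H, while the added content line that happens to start with +++ is tagged "." and would be left untouched.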

3 Responses to “V Study Part 1 - Vpatches and Vdiff”

  1. Diana Coman says:

    "To make a vpatch, a developer starts with a copy of the source already pressed to the previous most recent vpatch." - this is not strictly true, no, there is no requirement (and in some cases no reason either) for a V-tree to be just a V-line. To make a vpatch, all you need to start with is either an empty directory (when you create the genesis vpatch - that's a vpatch too!) or the result of a previous press.

    Other than that, the text clearly benefited from proofreading but it's ending quite abruptly and unexpectedly - what were you trying exactly to say with this? The title seems to promise a discussion of vpatches at the very least, but the above falls rather short of it and sounds more like "here's my first attempt to play around with vdiff and vpatches". If this is what it is, it's fine but don't confuse it for "I know what vpatches are" and moreover, do structure your investigations better after the initial play around.

  2. whaack says:

    1) I acknowledge I did not note the case for the genesis patch. And, as you say, vdiff does not require that the source comes from a vline. However it is not clear to me why one would create a patch (other than a genesis patch) where the previous source was not the result of a previous press. Would it ever make sense to create a vpatch for heathen code without first creating a genesis? To me the answer is a clear no.

    2) (on proofreading) Yes, I apologize for wasting your time with my sloppy previous post. For my "my interests" post I spent hours agonizing (or "spinning") over wording, and I realized I cannot operate that way if I want to get anything done. But I went too far in the other direction with this post and in haste published something incomprehensible. I am still looking for balance in how much time I spend revising while I build my blogging muscles.

    3) The goal of the post was to demonstrate my understanding of the creation of vpatches: "In this post I go over how the vpatches used in this process are created." I should have changed that sentence with s/"go over"/"go over my current understanding of"/. I also should have made the intent of the post clear in the title. Lastly, the post is a little all over the place, in part because I felt the need to include some context about V to discuss the creation of vpatches.

  3. Diana Coman says:

    1. "Would it ever make sense to create a vpatch for heathen code without first creating a genesis?" - Not really (other than as exercise) but that's *not* at all what I was saying there, no.

    The genesis vpatch is the root of a V-tree. The name is not chosen randomly: it's a *tree* and that means that there are no cycles in it but also that each node can have 0 or *more* children. Now read my 1st sentence again and especially the part "there is no requirement (and in some cases no reason either) for a V-tree to be just a V-line." A V-line means strictly one child from each node, from genesis down, no ramifications. This flat V-line structure is what you get if you restrict pressing to "the most recent" aka "head" in v-parlance. There is no such requirement: you can press a new vpatch on top of *any* existing vpatch, anywhere in the tree (and that's how you get ramifications).

    2. There is a time-tested way and no need for you to reinvent it. It just goes like this: write first *without* any agonizing over it, it's just a DRAFT. Then you re-read that and correct obvious mistakes, misspellings etc. After *some time* (preferably at least 1 hour), you re-read again and correct further, esp for clarity and structure if needed. Only after this (and as a minimum really), you publish.

    3. See at 2 above; that 2nd review should have rearranged parts and improved matters.
