Very Secure

Block Explorer Progress - What's Done, What's Next

I) What's Done

I've taken jfw's gbw-node and repurposed it to serve as a block explorer. The functions I'm left with after my changes are view-block,1 view-txn-by-hash/view-txn-by-pos,2 view-ancestors/view-descendents,3 view-address,4 balance,5 utxos,6 and push.7

Per jfw's suggestion, I merged the input table into the output table in the SQL schema, renaming the output table output_input. Thinking of the output of one txn and the corresponding input in the txn that spends that output as a single structure has helped me form a clearer picture of how chains of bitcoin transactions are connected.

Here is a snapshot of the current SQL schema with the new "output_input" table.

--- Gales Bitcoin Wallet: node (online component) schema
--- J. Welsh, December 2019
--- Dialect: SQLite (3.7.0 for WAL)

PRAGMA journal_mode=WAL;
BEGIN;

CREATE TABLE block (
       block_id  INTEGER PRIMARY KEY,
       height    INTEGER NOT NULL,
       size      INTEGER NOT NULL,
       version   INTEGER NOT NULL,
       prev_hash BLOB NOT NULL,
       hash      BLOB NOT NULL,
       root      BLOB NOT NULL,
       timestamp INTEGER NOT NULL,
       target    INTEGER NOT NULL, -- Should we include this?
       nonce     INTEGER NOT NULL
);
CREATE UNIQUE INDEX i_block_hash on block(hash);
CREATE UNIQUE INDEX i_block_height on block(height);

CREATE TABLE tx (
        tx_id    INTEGER PRIMARY KEY,
        hash     BLOB    NOT NULL,
        block_id INTEGER NOT NULL REFERENCES block,
        pos      INTEGER NOT NULL, --pos in block
        comment  TEXT,
        size     INTEGER NOT NULL,
        fee      INTEGER
);
CREATE INDEX i_tx_hash ON tx(hash);
CREATE UNIQUE INDEX i_tx_block_id_pos ON tx(block_id, pos);

-- Every input begins its life as an output.
CREATE TABLE output_input (
        output_input_id INTEGER PRIMARY KEY,
        creating_tx_id  INTEGER NOT NULL REFERENCES tx,
        out_pos         INTEGER NOT NULL,
        address_id      INTEGER NOT NULL REFERENCES address, -- aka script pub key
        value           INTEGER NOT NULL,
        spending_tx_id  INTEGER REFERENCES tx, -- If null, it hasn't been spent.
        in_pos          INTEGER, -- position in input vector in spending txn
        scriptsig       BLOB,
        flags           TEXT
);
CREATE UNIQUE INDEX i_output_txid_out_pos ON output_input(creating_tx_id, out_pos);
CREATE INDEX        i_output_addrid ON output_input(address_id);
CREATE INDEX i_input_txid_n  ON output_input(spending_tx_id);

CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        address    BLOB NOT NULL
);

CREATE UNIQUE INDEX i_address_address ON address(address);

CREATE TABLE state (
        scan_height INTEGER NOT NULL DEFAULT(-1)
);
INSERT INTO state DEFAULT VALUES;

COMMIT;

II) What's Next

A) Refactor commands so that they can be used by both the command line and web interface.

I've decided to use flask to run the web server portion of the block explorer. From reading the logs, this python package appears to be a handy utility whose use is a mortal sin. So I don't want to make flask's installation a requirement for running the block explorer locally.

I plan to do the following. I'm going to design the block explorer so that I can run a public web interface using flask. The source of everything will be public, so anyone will be able to install the block explorer along with flask and use the explorer locally via the web interface. Alternatively, they will be able to install the explorer without flask, but in this case they will only be able to use the block explorer via a command line interface similar to the one gbw-node currently employs.
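A minimal sketch of how the optional-flask arrangement could look; the names here (HAVE_FLASK, choose_interface) are hypothetical, not gbw-node's actual code:

```python
# Make flask an optional dependency: the command line interface works
# without it, and web mode refuses to start when it is absent.
try:
    from flask import Flask
    HAVE_FLASK = True
except ImportError:
    HAVE_FLASK = False

def choose_interface(want_web):
    """Pick which interface to run, refusing web mode without flask."""
    if want_web and not HAVE_FLASK:
        raise RuntimeError("flask not installed; use the command line interface")
    return "web" if want_web else "cli"
```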

In order to allow for the two uses of the explorer, I need to split each command function into two parts - one that returns structured data8 and one that prints that structured data. The command line interface and web interface will then stringify the data appropriately.9

B) Write a view-raw-hex-of-block command.

As an exercise in understanding and in order to check the integrity of the explorer's stored data, I want to make sure that I can take the tables in the gbw-node sql database and reconstruct a bit-perfect block.10 In order to provide this feature I need to store some data considered extraneous by the original gbw-node wallet, such as the input field for a coinbase as well as a transaction's sequence number, version, and locktime.
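For the header portion, reconstruction amounts to packing the block table's columns back into the canonical 80 bytes. A sketch (Python 3), assuming hashes are stored in bitcoind's internal little-endian byte order and that the compact "bits" form of the target is what gets stored:

```python
import hashlib
import struct

def serialize_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Concatenate the six header fields into the canonical 80 bytes."""
    return (struct.pack("<I", version) + prev_hash + merkle_root +
            struct.pack("<III", timestamp, bits, nonce))

def header_hash(header):
    """Double SHA-256; reversed, this is the familiar hex block hash."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()
```

Rebuilding the genesis header and checking it against the known hash 000000000019d668... makes for a quick integrity test of the stored columns.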

C) Get domain names and configure servers.

I have one box currently syncing trb on asciilifeform's rack.11 But setting up at least one other mirror in a different geographical location seems prudent.

D) Continuous trb scanning.

Currently gbw-node has no way to handle reorgs. It pulls data from the bitcoin RPC only up through the 'block height - CONFIRMATION'th12 block. This is done via the "scan" command, which halts when it reaches the most recent block. To keep the explorer's data up to date, the block explorer must always be scanning. I can either modify the scan command to run in an infinite loop, sleeping for ~10 minutes whenever it hits the max block height, or I can simply rescan continually via a cron task.
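The infinite-loop variant could wrap the existing scan command like so; scan_once, sleep, and poll_interval are illustrative names, not gbw-node's:

```python
import time

def scan_loop(scan_once, sleep=time.sleep, poll_interval=600, max_rounds=None):
    """Call scan_once forever, pausing between rounds.
    max_rounds exists only so the loop can be exercised in tests."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        scan_once()  # scans up to the deepest safe block, then returns
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            sleep(poll_interval)
```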

E) Provide a way to show information about transactions in the mempool / recent blocks.

The main use cases I have for a block explorer are obtaining utxo data for spending bitcoins, pushing raw bitcoin transactions to the network, and confirming that recently pushed transactions were received by the network. The block explorer in its current state has no way to store transactions in the mempool. The schema requires a transaction to have an associated block id and block position. So currently the block explorer is not useful for showing recent blocks, nor for showing recent unconfirmed transactions.

I plan to create a separate table, mempool_transaction, that stores information about transactions in the mempool. The scan function will delete a mempool transaction whenever it finds that transaction successfully placed in a deep block.13 I also will want to figure out how to store recent blocks that may be reorg'd. I think I'll handle this in a manner similar to mempool_transaction, with a volatile table named recent_blocks. A row from this table will be deleted once its block has confirmed its place in the explorer's "main chain."
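A hypothetical sketch of the volatile mempool table and the pruning the scan function would do; the column names are illustrative:

```python
import sqlite3

MEMPOOL_SCHEMA = """
CREATE TABLE mempool_transaction (
        mempool_tx_id INTEGER PRIMARY KEY,
        hash          BLOB NOT NULL,
        first_seen    INTEGER NOT NULL, -- unix time the scanner saw it
        raw           BLOB NOT NULL     -- full serialized txn
);
CREATE UNIQUE INDEX i_mempool_hash ON mempool_transaction(hash);
"""

def prune_confirmed(db, confirmed_hashes):
    """Delete mempool rows whose txn now sits in a deep block."""
    db.executemany("DELETE FROM mempool_transaction WHERE hash = ?",
                   [(h,) for h in confirmed_hashes])
```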

  1. Shows the information from a block's block header.
  2. Both of these functions display the same information for a transaction, but one lets you search for the transaction by providing the block height and the position of the transaction in it, while the other lets you search by the transaction's hash.
  3. These two functions return the txn hashes for every txn in a transaction's ancestor tree, all the way up to the original coinbases, and for every txn among its descendants, all the way down to the current UTXOs, respectively.
  4. Displays all transactions where the given address either creates or consumes a UTXO.
  5. Displays the bitcoin-denominated balance of an address.
  6. Displays all the unspent transaction outputs for an address.
  7. Sends a raw hex txn to the network.
  8. I think I'll use Python's class system, and then create and return immutable objects.
  9. \n's for the command line; <b>'s + links for the web interface, etc.
  10. It also seems pretty basic to me that a block explorer should be able to return a hexadecimal representation of a block, yet afaik none of the heathen ones provide this simple feature.
  11. It seems like half of the network resides on asciilifeform's shelf...
  12. Defaulting to 6 confirmations.
  13. I have not yet concluded what I should do with dust transactions that start to fill up the mempool.

8 Responses to “Block Explorer Progress - What's Done, What's Next”

  1. > It seems like half of the network resides on asciilifeform's shelf...

    May well be true, in re TRB; and IMHO this kind of thing is not healthy. (And e.g. BingoBoingo is working on standing up geographically dispersed nodes, last I recall. So hopefully will be remedied.)

    But it does seem to be the case that every subscriber to date starts off with "I have a box, so why not a TRB node?" and I'm not in the biz of telling them not to. FWIW on avg. a TRB eats ~100kB/s in both directions; they are not a palpable bandwidth sink.

  2. whaack says:

    It certainly doesn't seem healthy and it is something that has to be actively fought against, since the convenience/integration/price offered by your service makes going to another ISP a relative pain in the ass. In any case I'm going to see if there are any datacenters in CR where I can colocate a box.

    And speaking of convenience, have you considered offering a box that comes with a signed tarball of blk000*.dat data to speed up the sync? This is data I would pay for; I would like to avoid waiting 30-60 days before I can test my block explorer on an up-to-date node.

  3. whaack: I'll gladly give folks a copy of blk* from either of my nodes, but historically do not like doing so -- arguably it is even more "ecologically dirty" than "tempting" people to park 9000 nodes inside one cabinet. The risk is that TRB bring-up could be broken/dysfunctional and the folks using the "cheat" won't know.

    Arguably this has already happened, to an extent: I personally have not synced a node "from empty disk" for several years, and thus not given the slow/bumpy sync mechanism the attention it deserves.

  4. whaack says:

    I've been thinking about this and it seems to me to be a matter of "it's better to be right than to be principled." If there are actually only a small number of people in the world who are concerned about making sure that there are no holes in the signed transaction graph from coinbases to utxos then it seems prudent to bypass the slow and clunky trb sync process and get full nodes running in as many places as possible by whatever means necessary - i.e. flying hard drives all over the map. It's not like we have to pick between one of two strategies. We can have some nodes syncing with the "traditional" method while others are speed-boosted.

    P.S. I've already had to restart my node on your rack twice. Once it shut down by itself, for unknown reasons (and I did not save the debug.log file, a mistake I'll try to avoid in the future). The second time bitcoind was stuck on block 190919 for a couple of hours. It seemed to be running just fine but...no progress. So I kill'd it and started it back up again, and it seems to be syncing slowly but surely.

  5. Entire machine restarted?! please let me know right away (and save the system log!) if happens again.

    Or was this strictly TRB ?

  6. whaack says:

    Strictly trb, sorry.

  7. Diana Coman says:

    > It seems like half of the network resides on asciilifeform's shelf.

    You know, if you count what's in a bubble, then there's no surprise that it's all within ...that bubble, yes. It's not half of any network, it's just that bubble (for all it may seem as oh no, there isn't nor could there possibly be if "we" don't know about it) anything else or anything more.

  8. whaack says:

    @Diana Coman

    The solipsistic "the world consists of what I can see" is evident on reread.
