Google bitcoin miner
The Blockchain — Preparing the training data
Google has made several products aimed at automating and improving Machine Learning training and provisioning available on their cloud platform. The latest of these, AutoML Tables, aims to “Automatically build and deploy state-of-the-art machine learning models on structured data”.
Ever the fan of Machine Learning and automating-all-the-things, I thought I’d take it for a spin and inflict it on another technology that makes headlines on its own: Bitcoin.
The idea is simple: Bitcoin “mining” is the process of trying to guess the combination of inputs which, once hashed, produces a valid block. The byproduct is the processing of transactions, and the reward is a trove of Bitcoins.
So why apply Machine Learning to it? Well, the current method of mining essentially “brute-forces” the intensive calculations, trying to guess the right inputs to the algorithm. However, one of the key features of Bitcoin is the Blockchain: a complete record of every block mined to date, along with the input parameters used to mine it.
In fact, the Blockchain contains half a million examples of “the problem”, along with the solution. In the Machine Learning world they call this — “a training dataset” :)
AutoML Tables allows the user to upload a large dataset arranged as a table, either a BigQuery table or, even simpler, a .csv file.
As stated earlier, mining a Bitcoin block is the process of applying a hash algorithm (twice) to a set of input parameters. These parameters make up the “Block Header”, which, when hashed, must produce a value starting with a number of leading zeroes determined by the difficulty. Once the resulting hash is valid, the block is stored in the Blockchain for all to see (and validate). The header is an 80-byte string comprised of the following (click image for more):
[Image: structure of the 80-byte Block Header]
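To make the hashing step concrete, here is a minimal sketch (my own illustration, not from the original post) of the double SHA-256 check a miner performs; the function names and the `target` parameter are assumptions:

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    # Bitcoin hashes the 80-byte Block Header twice with SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def is_valid(header: bytes, target: int) -> bool:
    # A header is valid when its hash, read as a little-endian number,
    # is below the difficulty target (i.e. it has enough leading zeroes).
    h = double_sha256(header)
    return int.from_bytes(h, "little") < target
```

Miners iterate the Nonce, re-hash, and repeat until `is_valid` holds — which is exactly the brute-force loop described above.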
Apart from the Version, Bits (target) and hashPrevBlock, which are fixed, the other parameters are for the miners to decide and increment while hashing. However, when mining cooperatively with others, the Time and MerkleRoot are also decided by the mining pool, and all that’s left for us is to increment and guess the Nonce. It will therefore act as the target output for our Machine Learning model, which is convenient because AutoML Tables can only have one column set as a target.
To save that data into a .csv file I launched a full Bitcoin node, and after waiting for it to sync with the network (and waiting, and waiting..) ran `bitcoin-cli getblock <latestBlockHash>`, did some cutting, saved to file, then ran it again with the previousblockhash as input, and so iterated backwards through the Blockchain. The result looks like this:
[Image: sample rows of the extracted .csv]
It is important to add a header line so AutoML Tables can pick up the column names.
The first step is importing our training data into AutoML Tables. Once done, we are presented with the “Schema” screen, where we select the “Target” for our ML model: the field it will output.
[Screenshot: AutoML Tables Schema screen]
AutoML can also assist in extracting a validation set from the data, applying a weight column, and even handling time-series data.
But wait! Already we can spot a problem: AutoML detects column data types automatically and has picked up several fields as “Categorical”. That is usually helpful, but in our case the values are unique, so there will be as many categories as there are training rows, and live data will never match any of them!
And if we proceed to the Analyze page we can see just how many categories were picked up (the pie chart is page 1 of 4..):
[Screenshot: AutoML Tables Analyze page showing the categorical columns]
This is actually a great feature of AutoML Tables — it allows us to spot problems with our data before we even start training. We can see we have many distinct values (no duplicate rows), and the correlation of each field with our target, which in our case is low and will yield poor results.
Due to the low correlation of the important fields, when we move on to the training phase we select to omit the more static columns so our Model can focus on the important ones:
[Screenshot: column selection before training]
I proceed to train the model, and the results are as expected: an insanely high Mean Absolute Error, meaning our model will not produce any meaningful results.
In light of the above, I attempt to convert the training data to non-categorical values. This will also allow us to train a model locally using Tensorflow + Keras for comparison.
Tensorflow expects the inputs in the form of a bytearray: values ranging from 0 to 255. I split the Block Header field by field, byte by byte, and the result looks like this:
[Image: the byte-split training data]
Our data now has 76 input fields, each representing 1 byte of the Block Header, with the last four bytes combined into a single numeric field: our target, the Nonce.
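As a sketch of that byte-split (my own illustration; the original transformation isn’t shown), one raw header becomes 76 input bytes plus the 4-byte Nonce as a numeric target:

```python
import numpy as np

def split_header(header: bytes):
    # A raw Block Header is exactly 80 bytes; the Nonce occupies the last 4.
    assert len(header) == 80
    inputs = np.frombuffer(header[:76], dtype=np.uint8)  # 76 input columns
    nonce = int.from_bytes(header[76:], "little")        # single numeric target
    return inputs, nonce
```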
Importing the data into AutoML Tables again, it is picked up as Numeric:
[Screenshot: columns detected as Numeric]
Looks promising, and the correlation-to-target is spread evenly across all columns. Unfortunately, training yields poor results once more:
[Screenshot: AutoML Tables training results]
A huge error rate, and a high Mean Absolute Percentage Error (lower is better).
Were the poor results a fault of AutoML Tables, or is the training data simply too complex to be learned at all? To verify, I turn to good old Tensorflow & Keras to do some local training.
Tensorflow expects the data in the form of a bytearray, as we prepared for the second attempt. This is how it was prepped:
[Screenshot: data-prep code]
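The prep code from the screenshot isn’t preserved here, so the following is a hedged reconstruction under the same scheme: each 80-byte header is split into 76 normalized input bytes and 4 normalized target bytes (the function name and scaling choice are my assumptions):

```python
import numpy as np

def prep(headers):
    # headers: an iterable of raw 80-byte Block Headers.
    # Byte values 0..255 are scaled to [0, 1] for the network.
    X = np.array([list(h[:76]) for h in headers], dtype=np.float32) / 255.0
    y = np.array([list(h[76:80]) for h in headers], dtype=np.float32) / 255.0
    return X, y
```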
We train a fully-connected Neural Network, outputting 4 bytes which we concatenate to form our Nonce:
[Screenshot: Keras model definition]
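The exact architecture from the screenshot isn’t preserved either; the sketch below is one plausible fully-connected model under those constraints (the hidden-layer sizes, loss and optimizer are my assumptions), with 4 sigmoid outputs standing in for the Nonce bytes:

```python
import tensorflow as tf

# Fully-connected network: 76 input bytes -> 4 output bytes (scaled to [0, 1]).
# Hidden-layer sizes are illustrative, not the author's exact configuration.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(76,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(4, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mae", metrics=["accuracy"])
```

Training would then be something like `model.fit(X, y, epochs=5)` on the prepped arrays, after which the 4 output bytes are rescaled to 0..255 and concatenated into a Nonce.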
The output from just a few epochs of training already tells the same story:
[Screenshot: training output]
Notice how, even though the Mean Absolute Error is not as high as in AutoML Tables, the Accuracy is “stuck” at 0.2757.
This is no coincidence. Since we are outputting exactly 4 bytes, an accuracy stuck near 1 in 4 strongly suggests the network is guessing rather than learning to predict correctly.
The cause? Either the network is not large enough, which is unlikely, as we would have seen at least SOME improvement between epochs, or the data contains no detectable correlation between the inputs and the target.
I guess that’s that! Pretty amazing to see how a simple SHA256 with sufficient “salt” can produce enough entropy to make the Blockchain truly robust and uncrackable!
At least until we bring out the big guns… Stay tuned for round 2: Cracking Bitcoin with the Fujitsu Digital Annealer!
Dror Gensler
