Reputation: 1124
I am developing a web application where two players interact through a smart contract on the Ethereum blockchain. The application's workflow is as follows:
To ensure fairness, it's crucial for the second player to verify that the deployed contract is the agreed-upon version, without any modifications.
To address this, I implemented a bytecode comparison mechanism based on the getBytecode function, which uses eth_getCode. However, I've encountered a limitation where nodes only retain bytecode information for a limited number of recent blocks.
As a fallback, I started using the Etherscan API to retrieve the verified source code of contracts. Unfortunately, I've noticed that querying the API too soon after the contract deployment can result in it matching an incorrect contract (!!). This raises concerns about the reliability of using the Etherscan API for this purpose and the potential risks of depending on it.
Upvotes: 1
Views: 39
Reputation: 43481
nodes only retain bytecode information for a recent number of blocks
This is correct for nodes with the pruning option enabled.
Nodes without pruning retain the bytecode of all contracts.
This option is configurable by the node operator. A non-pruned node requires several times more disk space, which is one of the reasons they're less common. That said, most third-party node providers only offer pruned nodes in their free and lower-tier plans; non-pruned (or even archival, with all historical states) nodes are usually reserved for their higher tiers.
Aside from the centralization risk, is Etherscan API for retrieving the source code a suitable tool for this kind of task?
As you mentioned, a third-party API can return similar results even though you're looking for an exact match.
If you have the expected bytecode, it's generally safer to compare it against the exact bytecode retrieved from a node.
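A minimal sketch of such a comparison, assuming you already hold the expected runtime bytecode as a hex string. The helper names (normalizeHex, bytecodeMatches, fetchRuntimeCode) and the rpcUrl parameter are hypothetical; only the eth_getCode JSON-RPC method comes from the discussion above.

```typescript
/** Lowercase a hex string and drop an optional 0x prefix. */
function normalizeHex(hex: string): string {
  const h = hex.trim().toLowerCase();
  return h.startsWith("0x") ? h.slice(2) : h;
}

/** True only when both strings encode exactly the same, non-empty bytes. */
function bytecodeMatches(expected: string, actual: string): boolean {
  const e = normalizeHex(expected);
  const a = normalizeHex(actual);
  // eth_getCode returns "0x" for an address without code; never treat that as a match.
  return e.length > 0 && e === a;
}

/** Fetch the runtime bytecode over raw JSON-RPC (rpcUrl is an assumption). */
async function fetchRuntimeCode(rpcUrl: string, address: string): Promise<string> {
  const res = await fetch(rpcUrl, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "eth_getCode",
      params: [address, "latest"], // query the current state
    }),
  });
  const json = (await res.json()) as { result?: string; error?: { message: string } };
  if (json.error || typeof json.result !== "string") {
    throw new Error(json.error?.message ?? "eth_getCode returned no result");
  }
  return json.result;
}
```

In the app this could be used as bytecodeMatches(expectedCode, await fetchRuntimeCode(rpcUrl, contractAddress)), rejecting the game whenever it returns false.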
If you still want to use the third-party API to compare source code, make sure the source code matches the expected input exactly. There might be false negatives, though: for example, if the original code uses tabs and the compared code uses spaces, or if the author of the copy adds a comment that is not present in the original. Since bytecode is "just" binary, tabs/spaces and comments don't affect the executable instructions (although, if the contract is compiled with Solidity, the compiler by default appends a metadata hash to the bytecode, and that hash does depend on the exact source text).
Is there any common knowledge or approach for this that I am ignoring?
I'm not aware of any specific common knowledge or other approach. But one more note on validating the contract origin:
Anyone can deploy the same contract but pass it different constructor parameters. These are present in the init bytecode (passed in the deploying transaction) but not in the runtime bytecode (stored in the EVM after the contract is deployed). eth_getCode returns the runtime bytecode.
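To illustrate the difference, here is a rough sketch that recovers the constructor arguments from the input data of a deployment transaction. It assumes the contract was deployed directly (not through another contract), that you know the expected init bytecode, and that you have already fetched the transaction's input; the function names are hypothetical.

```typescript
/** Lowercase a hex string and drop an optional 0x prefix. */
function strip0x(hex: string): string {
  const h = hex.trim().toLowerCase();
  return h.startsWith("0x") ? h.slice(2) : h;
}

/**
 * For a direct deployment, the transaction input is the init bytecode
 * followed by the ABI-encoded constructor arguments.
 * Returns those trailing arguments as hex, or null when the input
 * does not start with the expected init bytecode.
 */
function extractConstructorArgs(
  expectedInitCode: string,
  deployTxInput: string
): string | null {
  const code = strip0x(expectedInitCode);
  const input = strip0x(deployTxInput);
  if (code.length === 0 || !input.startsWith(code)) {
    return null; // different contract, or not a direct deployment
  }
  return "0x" + input.slice(code.length);
}
```

The returned hex can then be compared against the arguments both players agreed on; a null result means the deployed init code is not the expected one.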
If this is something that your app might be vulnerable to, you might want to consider a factory pattern to deploy the contracts:
1. A factory contract contains a function that deploys a specific target contract to a new address.
2. The user invokes the factory contract instead of deploying the target contract directly.
   a. A new instance of the target contract is deployed at a new address (through the factory).
   b. The factory stores the address of the newly deployed target.
3. Your app validates against the list of valid target contracts stored in the factory (not against bytecode or any other third-party source).
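The app-side check in the last step could look roughly like this. It assumes the list of target addresses has already been read from the factory (how the factory exposes that list is up to its design and not shown here); isFactoryDeployed and normalizeAddress are hypothetical names.

```typescript
/** Addresses may arrive checksummed (mixed case), so compare them lowercased. */
function normalizeAddress(address: string): string {
  return address.trim().toLowerCase();
}

/**
 * True when the candidate address is one of the targets the factory
 * reports having deployed.
 */
function isFactoryDeployed(
  factoryTargets: readonly string[],
  candidate: string
): boolean {
  const wanted = normalizeAddress(candidate);
  return factoryTargets.some((t) => normalizeAddress(t) === wanted);
}
```

With this approach the second player only needs to trust the factory address itself, which can be fixed in the app.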
Upvotes: 0