ECIP 1067: Standardize OS Level Management of Signature Handling ("invoke-signer")

Author: Zac Mitton
Discussions-To: https://github.com/ethereumclassic/ECIPs/issues/147
Status: Withdrawn
Type: Standards Track
Category: Interface
Created: 2019-02-07

Note

I will often use the term “Signer” to mean the same thing as “Wallet”, because these programs can sign more than just transactions.

Abstract

The UX of Ethereum signing is especially cumbersome, mostly because dapps and wallets are used in many different environments (e.g. desktop, mobile, CLI, plugin, mobile browsers, desktop browsers, and hardware). Attempts to improve this UX shortcoming have generally involved rigid user workflows or specific application combinations. I assert the problem is best solved at the operating system level using a widely standardized protocol like the one proposed below.

The following describes:

  1. A way for dapp developers to handle any action requiring a signature (e.g. signing a tx or logging in). The spec defines how the wallet is chosen and invoked by the user’s operating system.
  2. A way for wallet applications to handle incoming signature requests.

Motivation

Users should be able to use any wallet of their choosing when interacting with a dapp. Having users download a specific wallet per environment reduces security, inhibits adoption, and is terrible UX: the user has to top up, and keep track of, each application-specific wallet. This sucks for the user because they have money all over the place and a laundry list of wallet installations across devices.

The current architecture leads to statements like “Our dapp currently works on the Chrome browser with MetaMask enabled. We plan to add support for MyEtherWallet soon.”

As the number of wallets and dapps grows, the number of dapp-wallet combinations that need explicit support grows as roughly n^2. This keeps small wallets from entering the market and forces users to download all major wallets.

Current Scenario

Desktop Browser Dapps

So far, the best solution has been to download wallet plugins like MetaMask that inject JavaScript (by being granted very scary permissions) into every single page the user navigates to. The user often cannot use a hardware wallet unless the dapp developer specifically integrates support for it, which is a huge security limitation.

Mobile Browser Dapps

If the user attempts to load the dapp on mobile, they are expected to use a specially made “web3 enabled” mobile browser (e.g. Brave, Toshi) to view the app, so that the dapp can invoke the browser’s built-in wallet. This is suboptimal architecture and a waste of engineering resources. Browsers are extraordinarily complex programs, and building a decent one is a monumental challenge (unless, say, you’re the inventor of JavaScript). UX is much better when users can use their preferred browser. Besides, we should not be building web browsers to solve the problem of wallets.

Native Dapps (the worst UX of all)

Currently, if a mobile app developer wants to integrate an Ethereum feature in their app, they generally build a wallet into the app itself (from scratch). The user ends up with loose change in siloed apps, and these wallets will have widely varying (and therefore pathetic) security standards.

Desired Dapp Experience

Before defining the proposed spec, I’d like to outline the ideal user experience and work backwards to achieve it. After all, that’s how this spec came about.

  1. If a dapp is browser-based, the user can browse to it using ANY available browser on mobile OR desktop.
  2. If the dapp is a native mobile or native desktop app, the user simply downloads and opens the application.
  3. When the dapp requires a signature, it should:
    • Automatically open the user’s preferred mobile wallet when on mobile, or
    • Automatically open the user’s preferred desktop/hardware wallet when on desktop.
  4. The wallet should then display details of the signing request.
  5. The user can then tap or click “sign” and be sent back to the dapp.

Rationale

Ultimately, this problem must be addressed at the operating system level, because that is the only way to flexibly hand off control flow to another app.

Deep linking is a method available from any application or webpage that provides the user’s operating system with instructions to open a specific application (and can carry an arbitrary data string with it). Deep links work in basically all environments: Android, iOS, macOS, Windows, and Linux (possibly others).

Most of us have seen deep linking used when clicking on a Zoom or Spotify link. It usually brings a specific application into focus. In our case, we don’t want to open a specific (branded) application (e.g. Jaxx, Toshi, or Gnosis-Safe). Rather, we tell the operating system to open the user’s “default signer” (wallet).

Implementations

iOS and macOS

Implementation of signer apps varies by OS. On iOS it can be done with NSUrl Protocol, which is like a link, but instead of opening a specific app, you specify a namespace (e.g. “invoke-signer”) for which any app can register itself as a handler.

Every app on the device that is registered as a handler calls a boolean function that decides whether or not the app should handle the incoming request (for example, returning true if and only if the user has chosen this signer as the default). The first app to return true from this function will be launched with access to data from the URL (the RPC data and/or more).
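The selection rule described above can be modelled in a few lines. This is only a sketch of the described behaviour, not an iOS API: the `RegisteredSigner` type, the `pick_signer` function, and the signer names are all hypothetical, and the "default signer" preference is assumed to be a simple stored string.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RegisteredSigner:
    """A hypothetical signer app registered for the invoke-signer namespace."""
    name: str
    should_handle: Callable[[str], bool]  # the boolean function each handler exposes


def pick_signer(url: str, registered: List[RegisteredSigner]) -> Optional[RegisteredSigner]:
    """Return the first registered signer whose boolean function accepts the URL."""
    for signer in registered:
        if signer.should_handle(url):
            return signer
    return None  # no registered signer claimed the request


# Assumption: the user's default signer is stored as a plain preference string.
user_default = "PocketSigner"

signers = [
    RegisteredSigner("DeskSigner", lambda url: user_default == "DeskSigner"),
    RegisteredSigner("PocketSigner", lambda url: user_default == "PocketSigner"),
]

chosen = pick_signer("invoke-signer://ethereum-classic", signers)
print(chosen.name if chosen else "no signer available")
```

In this model only the signer the user marked as default returns true, so the order of registration does not matter as long as exactly one signer is the default.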

Draft Specification

The namespace should fundamentally indicate that it’s doing the signing, so ethereum-signer was my first thought. However, nearly any cryptocurrency can benefit from this, so simply signer would be better. Lastly, this spec is about more than just referring to the signer app: it is about specifically invoking that app. Therefore I think invoke-signer is the most relevant namespace to use.

From there, the path could communicate the specific cryptocurrency, e.g. ethereum-classic (together invoke-signer://ethereum-classic…). The path can be used by the signer in its boolean function to determine whether it’s capable of handling this particular signing request.

Next, we need to communicate exactly what the signer should do. Rather than reinventing the wheel, we can just URI-encode the existing JSON RPC calls. For instance, all signers would want to support the existing RPC methods eth_sign and eth_signTypedData.

In standard query format this would be:

invoke-signer://ethereum-classic?rpc=%7B%22jsonrpc%22%3A%222.0%22%2C%22method%22%3A%22eth_sign%22%2C%22params%22%3A%5B%220xc45bc213664f565324ad302d187e0dc08ad7d1c57%22%5D%2C%22id%22%3A67%7D
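A dapp could assemble such a link roughly as follows. This is a minimal sketch using only the Python standard library; the helper name `build_invoke_signer_uri` is hypothetical, and the request object simply mirrors the example above.

```python
import json
from urllib.parse import quote


def build_invoke_signer_uri(chain: str, rpc_request: dict) -> str:
    """Build an invoke-signer link carrying a URI-encoded JSON RPC request."""
    # Compact separators avoid spaces, which would otherwise appear as %20.
    payload = json.dumps(rpc_request, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return f"invoke-signer://{chain}?rpc={quote(payload, safe='')}"


request = {
    "jsonrpc": "2.0",
    "method": "eth_sign",
    "params": ["0xc45bc213664f565324ad302d187e0dc08ad7d1c57"],
    "id": 67,
}

uri = build_invoke_signer_uri("ethereum-classic", request)
print(uri)
```

The dapp would then open this link the same way it opens any other deep link, and the operating system would take care of choosing the signer.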

Over time, signers should become the secure and predictable place to view blockchain information, especially data about to be signed. We cannot really trust data viewed on browser-based explorers or new dapp websites. More advanced RPC methods will be added over time to accommodate this. For instance, methods that pass contractCode and compilerVersion will make it possible to verify the code and ABI in the signer app. But none of that needs to be defined right now, because the format accepts arbitrary future methods!
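On the receiving side, a signer might decode the link and decide whether it can handle the request along these lines. This is a sketch under assumptions: the function names and the `SUPPORTED_CHAINS` and `SUPPORTED_METHODS` sets are hypothetical, and the cryptocurrency identifier is read from the position directly after `invoke-signer://` (which a standard URI parser reports as the network location).

```python
import json
from urllib.parse import parse_qs, urlsplit

# Assumption: the chains and RPC methods this hypothetical signer supports.
SUPPORTED_CHAINS = {"ethereum-classic"}
SUPPORTED_METHODS = {"eth_sign", "eth_signTypedData"}


def parse_request(uri: str) -> dict:
    """Decode an invoke-signer link into its chain and JSON RPC request."""
    parts = urlsplit(uri)
    if parts.scheme != "invoke-signer":
        raise ValueError("not an invoke-signer link")
    rpc_values = parse_qs(parts.query).get("rpc")
    if not rpc_values:
        raise ValueError("missing rpc parameter")
    # parse_qs has already percent-decoded the value; invalid JSON raises ValueError.
    rpc = json.loads(rpc_values[0])
    if not isinstance(rpc, dict):
        raise ValueError("rpc payload must be a JSON object")
    return {"chain": parts.netloc, "rpc": rpc}


def can_handle(uri: str) -> bool:
    """The signer's boolean function: true only for supported chains and methods."""
    try:
        req = parse_request(uri)
    except ValueError:
        return False
    return req["chain"] in SUPPORTED_CHAINS and req["rpc"].get("method") in SUPPORTED_METHODS


example = (
    "invoke-signer://ethereum-classic?rpc="
    "%7B%22jsonrpc%22%3A%222.0%22%2C%22method%22%3A%22eth_sign%22"
    "%2C%22params%22%3A%5B%5D%2C%22id%22%3A1%7D"
)
print(can_handle(example))
```

Because the method name travels inside the payload, a signer that gains support for a new RPC method only needs to extend its own set of supported methods; the link format itself stays unchanged.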

Conclusion

New User Experience

Most users will have a favorite mobile signer and enjoy a consistent experience with every dapp. Some may still use a plugin like MetaMask when on desktop, and power users will use a hardware signer for ultra-high security as needed.

Developer Experience

It becomes much easier to add a single “ethereum feature” into any software application. The app dev does not need to worry about any of the details of building a wallet/signer. They just create a string to be signed, and invoke the user’s signer with it. The user expectation is only that they have a signer, as opposed to some specific signer.


Withdrawal Notice:

This approach currently has a flaw: the name of the caller application is not sent to the invoked wallet, so handing control back to the caller app is very difficult. This can be worked around in an app-to-app setting (by passing the caller’s name as part of the payload), but not in a browser-to-app setting (without complicated “browser sniffing” techniques).
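The app-to-app workaround could look something like the sketch below. Everything here is an assumption made for illustration: the `callback` query parameter is not part of the proposal, and the `exampledapp://` scheme stands in for whatever deep link the calling app has registered. It does nothing for the browser-to-app case.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit


def build_uri_with_callback(chain: str, rpc_request: dict, callback_uri: str) -> str:
    """Caller side: attach a (hypothetical) callback link to the signing request."""
    payload = quote(json.dumps(rpc_request, separators=(",", ":")), safe="")
    return f"invoke-signer://{chain}?rpc={payload}&callback={quote(callback_uri, safe='')}"


def read_callback(uri: str):
    """Signer side: recover the caller's link, or None if it was not supplied."""
    values = parse_qs(urlsplit(uri).query).get("callback")
    return values[0] if values else None


link = build_uri_with_callback(
    "ethereum-classic",
    {"jsonrpc": "2.0", "method": "eth_sign", "params": [], "id": 1},
    "exampledapp://signed",  # hypothetical deep link registered by the calling app
)
print(read_callback(link))
```

After signing, the signer would open the recovered link to return control to the calling app.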