I’m currently looking into adapting the ‘tag’ URI ( http://www.taguri.org/ , https://tools.ietf.org/html/rfc4151 ) to use for distributed reputation systems. I’m hoping to end up with a common protocol which can easily express: “Authority A says that X and Y are the same person” (with an optional field for the certainty of that statement) (with an optional comment field) (with an optional authentication hash). The existing tag URI does a fine job allowing for expressing what A and X are, and seems the ideal base for the remainder. It also seems extremely simple to be able to implement the upgraded ‘tag’ in existing mail/contact software, so that tag: links can be automatically imported into contact lists to update them.

Here’s an initial draft of what I mean:


This includes some changes from current format of ‘tag’:

The authority can be not only an email address or a domain name, but some other identifier, such as the URL of a social media profile, such as http://twitter.com/DataPacRat or http://www.facebook.com/DataPacRat . (Worth discussing: whether anything more complicated than an email address should be enclosed in quotes, as URLs are in the ‘a href’ tag in HTML.) (Part of the reason for this is to allow chains of authorities – so that http://twitter.com/DataPacRat can be used to authenticate http://twitter.com/Example , with trust-measurements for each step.)

The date-stamp is no longer just a date when the authority-identifier is under control of the authority, but is also the moment when the authority makes its assertion about the identities.

The date is no longer limited just to a particular date, but can be specified (using ISO 8601 format) down to a particular second, for any rapidly-changing identities.

The identities can be plaintext names, email addresses, domains, social media profile URLs, library card numbers, or any other chosen identifiers.

The comment fields are left incompletely defined, to allow for future expansion; such as to indiciate trustcloud whuffie scores, how the authority knows the individual being identified, or anything else that someone comes up with.

By default, if the comment field is a number, that number is assumed to be how confident the authority is that the listed identifiers all refer to the same individual, measured in decibans. (Decibans are logarithmic, with 0 decibans being equivalent to 1:1 odds, or being 50% confident; 10 decibans to 10:1 odds, or ~90% confident; 20 decibans to 100:1 odds, or ~99% confident; and so on.) It is recommended that these numbers be integers, unless there is a specific reason to be able to specify confidence to greater accuracy; and with a magnitude under 128, as it requires extraordinary effort to have 100 decibans of confidence for even the most fundamental facts.

(This method also implicitly allows for revocation of an identity referral, simply by using a negative number instead of a positive one.)

The authentication hash is to provide strong evidence that the listed authority is actually the one making the assertion. By default, it is assumed to be based on whatever public cryptographic key (eg, PGP/GnuPG or X.509) is linked to the listed authority ID; and that what is being signed is the string of text before the hashmark.

This could allow for something like the following:

tag:datapacrat@datapacrat.com,2013-06-05T12:00:00Z:(datapacrat@datapacrat.com),(http://twitter.com/DataPacRat),(Daniel Eliot Boese)?100&TrustCloud,774&Klout,29#randomhashofletters

… to indicate that as of that particular date, I indicate with maximum confidence that my name, email address, and Twitter account all point to me, and that I have two social media scores. (I would have used 127 decibans instead of 100 if I’d just been asserting my own identity.)

While people tend to be very bad at assigning accurate confidence levels (eg, when people claim to be 90% sure of something, they’re often wrong 50% of the time), providing at least some initial estimates of their confidence can be used as the inputs for more sophisticated Bayesian algorithms. Until such time as more accurate estimates are available, here are some possible sample confidence levels:

0 decibans: 50%: You’re not sure whether the last digit of the phone number is a 3 or a 5.
1 decibans: 55% Just slightly more likely than not; a business card handed to you by a stranger.
Up to 10 decibans: to 90%: Someone you’ve chatted to for an evening.
Up to 20 decibans: to 99%: A distant acquaintance, who you talk to once a year.
Up to 30 decibans: to 99.9%: A co-worker who might have been re-organized into a new email since you last heard from them.
Up to 40 decibans: to 99.99%: A family member, who you might accidentally have mis-spelled the email address of.
Around 100 decibans: Your own personal information, closely checked.
127 decibans: Data which relies on yourself alone, thoroughly re-checked and confirmed by others.

If there’s an image which the authority considers representative of the thing being identified (eg, a photograph or a logo), then that can be listed as well, either in the form of an URL linking to an external image, or an in-line image using the data URI ( https://en.wikipedia.org/wiki/Data_URI_scheme ).

Some of the URIs that are most likely to be useful, either for the individual identifier or as standard comment fields, are web addresses (http, https, feed, ftp, gopher), instant communications (aim, callto, gtalk, im, irc, mailto, skype, tel, ymsgr), bitcoin, physical locations (geo, maps), and calendars (webcal). From vCard are suggestions for location (adr, geo), anniversary (anniversary), birthday (bday), contact (email, impp, tel, other ims), name (fn, n, nickname), gender (gender), public encryption key (key), preferred language(s) (lang), image (logo, photo), or timezone (tz). FOAF offers images (Image, depiction, img, logo, thumbnail), online accounts (OnlineAccount, OnlineChatAccount, OnlineEcommerceAccount, OnlineGamingAccount, aimChatID, icqChatID, jabberID, msnChatID, skypeID, yahooChatID), website (homepage, page, weblog), location (based_near), name (familyName, firstName, givenName, lastName, name, nick, title), gender (gender), age (age, birthday), and miscellaneous others (myersBriggs, geekcode). I would suggest that software try to make use of any of these tags where they are used, but that identifiers not be limited to a particular list – or to having a prefix at all, especially if the tag is intended mainly to be human-readable.

Leave a Reply