Archive for June, 2013

I’m currently looking into adapting the ‘tag’ URI ( http://www.taguri.org/ , https://tools.ietf.org/html/rfc4151 ) for use in distributed reputation systems. I’m hoping to end up with a common protocol which can easily express: “Authority A says that X and Y are the same person”, with optional fields for the certainty of that statement, for a comment, and for an authentication hash. The existing tag URI already does a fine job of expressing what A and X are, and seems the ideal base for the remainder. It also seems like it would be extremely simple to implement the upgraded ‘tag’ in existing mail/contact software, so that tag: links could be automatically imported into contact lists to update them.

Here’s an initial draft of what I mean:

tag:Authority,DateTime:(Identity1),(Identity2)[,(Identity3)[,(Identity4)]][?comment1[&comment2[&comment3]]][#authenticationHash]
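To make the draft a little more concrete, here is a minimal Python sketch that assembles a string in this format. The function name `build_reputation_tag` is hypothetical, and because the draft doesn't define any escaping yet, the sketch simply refuses values that contain the format's own delimiters rather than guessing at an escaping rule.

```python
from typing import Optional, Sequence


def build_reputation_tag(
    authority: str,
    timestamp: str,
    identities: Sequence[str],
    comments: Sequence[str] = (),
    auth_hash: Optional[str] = None,
) -> str:
    """Assemble tag:Authority,DateTime:(Id1),(Id2)[,...][?c1[&c2...]][#hash]."""
    if len(identities) < 2:
        raise ValueError("an assertion needs at least two identities")
    # No escaping is defined in the draft, so reject characters that would be ambiguous.
    if any(ch in authority for ch in ",#") or any(ch in timestamp for ch in "(#"):
        raise ValueError("authority or timestamp contains a reserved character")
    for ident in identities:
        if any(ch in ident for ch in "()#"):
            raise ValueError(f"identity {ident!r} contains a reserved character")
    for comment in comments:
        if any(ch in comment for ch in "&#"):
            raise ValueError(f"comment {comment!r} contains a reserved character")

    tag = f"tag:{authority},{timestamp}:" + ",".join(f"({i})" for i in identities)
    if comments:
        tag += "?" + "&".join(comments)
    if auth_hash is not None:
        tag += "#" + auth_hash
    return tag
```

Called with the values from the datapacrat@datapacrat.com example further down, this reproduces that example string character for character.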

This draft includes some changes from the current format of ‘tag’:

The authority can be not only an email address or a domain name, but also some other identifier, such as the URL of a social media profile (for example, http://twitter.com/DataPacRat or http://www.facebook.com/DataPacRat ). (Worth discussing: whether anything more complicated than an email address should be enclosed in quotes, as URLs are in the ‘a href’ tag in HTML.) (Part of the reason for this is to allow chains of authorities, so that http://twitter.com/DataPacRat can be used to authenticate http://twitter.com/Example , with trust measurements for each step.)

The date-stamp is no longer just a date when the authority identifier is under the control of the authority; it is also the moment when the authority makes its assertion about the identities.

The date is no longer limited to a particular day, but can be specified (using ISO 8601 format) down to a particular second, for any rapidly changing identities.

The identities can be plaintext names, email addresses, domains, social media profile URLs, library card numbers, or any other chosen identifiers.

The comment fields are left incompletely defined, to allow for future expansion, such as indicating TrustCloud whuffie scores, how the authority knows the individual being identified, or anything else that someone comes up with.

By default, if a comment field is a number, that number is taken to be how confident the authority is that the listed identifiers all refer to the same individual, measured in decibans. (Decibans are logarithmic, with decibans = 10 × log10(odds): 0 decibans is equivalent to 1:1 odds, or 50% confidence; 10 decibans to 10:1 odds, or ~90% confidence; 20 decibans to 100:1 odds, or ~99% confidence; and so on.) It is recommended that these numbers be integers, unless there is a specific reason to specify confidence more precisely, and that their magnitude be under 128, as it takes extraordinary effort to have 100 decibans of confidence in even the most fundamental facts.

(This method also implicitly allows for revocation of an identity referral, simply by using a negative number instead of a positive one.)
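The deciban arithmetic is short enough to spell out. This is a sketch using only the definition above (decibans = 10 × log10(odds)); the function names are my own, and `follows_recommendation` just encodes the "integer, magnitude under 128" suggestion.

```python
def decibans_to_odds(db: float) -> float:
    """Decibans are 10 * log10(odds), so odds = 10 ** (db / 10)."""
    return 10.0 ** (db / 10.0)


def decibans_to_probability(db: float) -> float:
    """p = odds / (1 + odds), rewritten as 1 / (1 + 10 ** (-db / 10))."""
    return 1.0 / (1.0 + 10.0 ** (-db / 10.0))


def follows_recommendation(db: object) -> bool:
    """True for an integer (not a bool) with magnitude under 128, as the draft recommends."""
    return isinstance(db, int) and not isinstance(db, bool) and abs(db) < 128
```

For example, `decibans_to_probability(10)` is about 0.909 (10:1 odds), while `decibans_to_probability(-10)` is about 0.091, which is how a negative number ends up reading as a revocation.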

The authentication hash is to provide strong evidence that the listed authority is actually the one making the assertion. By default, it is assumed to be based on whatever public cryptographic key (eg, PGP/GnuPG or X.509) is linked to the listed authority ID; and that what is being signed is the string of text before the hashmark.
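A sketch of which text the signature would cover, assuming the split happens at the first '#' (as with an ordinary URI fragment) and that the text is encoded as UTF-8; both are assumptions, since the draft doesn't pin them down. The SHA-256 digest here is only a stand-in to show exactly which bytes are covered; actual signing and verification would be done with GnuPG or X.509 tooling, which isn't shown.

```python
import hashlib
from typing import Optional, Tuple


def split_signed_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return (signed_text, auth_field): the text before the first '#', and what follows it."""
    signed_text, sep, auth_field = tag.partition("#")
    return signed_text, (auth_field if sep else None)


def payload_digest(signed_text: str) -> str:
    """SHA-256 of the UTF-8 signed text. This is NOT a signature; it only pins down
    the exact bytes that a PGP/GnuPG or X.509 signature would need to cover."""
    return hashlib.sha256(signed_text.encode("utf-8")).hexdigest()
```

One consequence of splitting at the first '#' is that identities can't contain a '#' of their own (a URL with a fragment, say) unless some escaping rule is added later.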

The draft format could allow for something like the following:

tag:datapacrat@datapacrat.com,2013-06-05T12:00:00Z:(datapacrat@datapacrat.com),(http://twitter.com/DataPacRat),(Daniel Eliot Boese)?100&TrustCloud,774&Klout,29#randomhashofletters

… to indicate that, as of that particular date, I assert with very high confidence (100 decibans) that my name, email address, and Twitter account all point to me, and that I have two social media scores. (I would have used 127 decibans instead of 100 if I’d just been asserting my own identity.)
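Going the other way, here is a minimal parser sketch for that example. It assumes the authority contains no comma, that identities contain no parentheses or '#', and that the first purely numeric comment is the confidence figure; the names `ReputationTag` and `parse_reputation_tag` are hypothetical. Because the timestamp itself contains colons, the sketch splits the date from the identities on ":(" rather than on ':'.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


@dataclass
class ReputationTag:
    authority: str
    when: str
    identities: List[str]
    comments: List[str]
    confidence: Optional[float]  # decibans; a negative value reads as a revocation
    auth_hash: Optional[str]


def parse_reputation_tag(tag: str) -> ReputationTag:
    if not tag.startswith("tag:"):
        raise ValueError("not a tag URI")
    body, sep, auth = tag[len("tag:"):].partition("#")
    auth_hash = auth if sep else None

    authority, sep, rest = body.partition(",")
    if not sep:
        raise ValueError("missing ',' after the authority")
    # The timestamp may contain ':' (e.g. 12:00:00Z), so split on ':(' instead.
    when, sep, rest = rest.partition(":(")
    if not sep:
        raise ValueError("missing identity list")

    identities: List[str] = []
    pos = 0  # rest now starts just inside the first '('
    while True:
        close = rest.find(")", pos)
        if close == -1:
            raise ValueError("unterminated identity")
        identities.append(rest[pos:close])
        pos = close + 1
        if rest.startswith(",(", pos):
            pos += 2  # skip ",(" and read the next identity
        else:
            break
    if len(identities) < 2:
        raise ValueError("an assertion needs at least two identities")

    tail = rest[pos:]
    comments: List[str] = []
    if tail:
        if not tail.startswith("?"):
            raise ValueError(f"unexpected text after identities: {tail!r}")
        comments = tail[1:].split("&")

    # Default rule from the draft: a purely numeric comment is the confidence in decibans.
    confidence = next((float(c) for c in comments if _NUMBER.fullmatch(c)), None)
    return ReputationTag(authority, when, identities, comments, confidence, auth_hash)
```

Applied to the example above, this yields the three identities, the comments `100`, `TrustCloud,774` and `Klout,29`, a confidence of 100.0, and `randomhashofletters` as the authentication field.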

While people tend to be very bad at assigning accurate confidence levels (eg, when people claim to be 90% sure of something, they’re often wrong 50% of the time), providing at least some initial estimates of their confidence can be used as the inputs for more sophisticated Bayesian algorithms. Until such time as more accurate estimates are available, here are some possible sample confidence levels:

0 decibans: 50%: You’re not sure whether the last digit of the phone number is a 3 or a 5.
1 deciban: about 56%: Just slightly more likely than not; a business card handed to you by a stranger.
Up to 10 decibans: up to about 90%: Someone you’ve chatted with for an evening.
Up to 20 decibans: up to about 99%: A distant acquaintance whom you talk to once a year.
Up to 30 decibans: up to 99.9%: A co-worker who might have been reorganized into a new email address since you last heard from them.
Up to 40 decibans: up to 99.99%: A family member whose email address you might accidentally have misspelled.
Around 100 decibans: Your own personal information, closely checked.
127 decibans: Data which relies on yourself alone, thoroughly re-checked and confirmed by others.
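To turn a gut-feeling percentage into a number for the comment field, the deciban formula can be inverted. This is a sketch; the rounding and the clamp to ±127 follow the integer and magnitude-under-128 recommendation, and the function names are hypothetical.

```python
import math


def probability_to_decibans(p: float) -> float:
    """Inverse of the deciban scale: 10 * log10(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return 10.0 * math.log10(p / (1.0 - p))


def recommended_decibans(p: float) -> int:
    """Round to an integer and clamp to +/-127, per the draft's recommendation."""
    return max(-127, min(127, round(probability_to_decibans(p))))
```

A 90% estimate works out to about 9.5 decibans, so `recommended_decibans(0.9)` gives 10; 0.55 gives 1, which lines up with the business-card entry in the list.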

If there’s an image which the authority considers representative of the thing being identified (eg, a photograph or a logo), then that can be listed as well, either in the form of a URL linking to an external image, or as an in-line image using the data URI ( https://en.wikipedia.org/wiki/Data_URI_scheme ).

Some of the URI schemes most likely to be useful, either for the individual identifiers or as standard comment fields, are web addresses (http, https, feed, ftp, gopher), instant communications (aim, callto, gtalk, im, irc, mailto, skype, tel, ymsgr), bitcoin, physical locations (geo, maps), and calendars (webcal). vCard suggests properties for location (adr, geo), anniversary (anniversary), birthday (bday), contact (email, impp, tel, other IMs), name (fn, n, nickname), gender (gender), public encryption key (key), preferred language(s) (lang), image (logo, photo), and timezone (tz). FOAF offers images (Image, depiction, img, logo, thumbnail), online accounts (OnlineAccount, OnlineChatAccount, OnlineEcommerceAccount, OnlineGamingAccount, aimChatID, icqChatID, jabberID, msnChatID, skypeID, yahooChatID), website (homepage, page, weblog), location (based_near), name (familyName, firstName, givenName, lastName, name, nick, title), gender (gender), age (age, birthday), and miscellaneous others (myersBriggs, geekcode). I would suggest that software try to make use of any of these tags where they are used, but that identifiers not be limited to a particular list, or even be required to have a prefix at all, especially if the tag is intended mainly to be human-readable.

Most people would say that ‘Gimme a minute’, ‘gimme a sec’, and ‘gimme a moment’ mean just about the same thing; that of the three, a ‘moment’ is the shortest; and that a ‘moment’ isn’t really a well-defined length of time.

The truth is the opposite: a moment has an exact value, and that value is 90 seconds. (Source)

And there are odder units than that; an ounce is 1/12 of a moment, or 7.5 seconds… and an ounce can be divided into exactly 47 atoms of about 160 milliseconds each.

Makes me start to wonder whether we really should throw out the Imperial system completely, and move to some form of metric time, or something based on the Planck units. In the meantime, there’s always the Tranquility Calendar.
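The moment, ounce, and atom figures above are easy to check with exact fractions; the constant names are my own, and the only inputs are the 90-second moment, the 1/12 ounce, and the 1/47 atom.

```python
from fractions import Fraction

SECONDS_PER_MOMENT = Fraction(90)            # 1 moment = 90 s
SECONDS_PER_OUNCE = SECONDS_PER_MOMENT / 12  # 1 ounce = 1/12 moment
SECONDS_PER_ATOM = SECONDS_PER_OUNCE / 47    # 1 atom = 1/47 ounce

MOMENTS_PER_HOUR = Fraction(3600) / SECONDS_PER_MOMENT

if __name__ == "__main__":
    print(MOMENTS_PER_HOUR)                       # 40
    print(float(SECONDS_PER_OUNCE))               # 7.5
    print(round(float(SECONDS_PER_ATOM) * 1000))  # 160 (milliseconds)
```

An atom comes out at exactly 15/94 of a second, about 159.6 ms, which is where the "about 160 milliseconds" comes from.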

An important part of an internet address is the scheme: the few letters that come right at the beginning, which modern browsers have started to hide. It’s usually ‘http’ or ‘https’, but can also be ‘ftp’ or, for old-style internetters, ‘gopher’. A fun one is ‘tel’, which lets you encode telephone numbers in the internet’s formalized way; another is ‘geo’, for physical locations.

One that has received astonishingly little attention is ‘tag’. Here’s an example:

tag:datapacrat.com,0000:DataPacRat

It has two main parts – the naming authority, and the name. (The meaning of ‘tag:’ should be obvious.)

The first part of the naming authority is supposed to be either a specific email address, or a full domain. Since I run datapacrat.com on my own, I might as well keep things short, and just use that.

The second part of the authority is a date. The intention here is that email addresses and domains change hands over time, so it’s useful to indicate who owned them by stating when they were owned. I picked up datapacrat.com shortly before August 8, 2008, so it would be easy enough for me to use 2008-08-08 as the date field. However, the ‘tag’ description also says that the date field can indicate a date when /nobody/ owned the email address or domain in question, as long as there’s reasonable proof that nobody /else/ owned it between the indicated date and when the naming authority got control over it. Since I’m the very first person to own datapacrat.com, I could pick any date before the present, and as long as it conformed to the internet specs for the date format, it would be valid. Speaking of date specs, it’s possible to shorten the date to just a year and month, or just a year; and the tag’s official format requires a four-digit year, which the date spec allows for any year from 0000 to 9999. So, once again, to keep things short, sweet, and easily memorable, I’m using the year 0000.

After that comes… anything the naming authority wants. It’s a full namespace, open for whatever purposes are desired. It could refer to people, places, concepts, data, anything. In this particular example, I hereby define DataPacRat as referring to one thing in particular: myself.
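Those three pieces (authority name, date, and the authority's own namespace) can be pulled apart with a regular expression. This is a simplified sketch of the grammar in RFC 4151: it checks only the overall shape (a DNS name or email address, a YYYY, YYYY-MM, or YYYY-MM-DD date, then the specific part and an optional fragment), and doesn't enforce month/day ranges or the RFC's exact character rules for the specific part. The names `TAG_URI` and `parse_tag_uri` are hypothetical.

```python
import re
from typing import Dict, Optional

# Simplified from RFC 4151: authorityName is a DNS name or an email address,
# and the date is YYYY, YYYY-MM, or YYYY-MM-DD.
_DNS_COMP = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_DNS_NAME = rf"{_DNS_COMP}(?:\.{_DNS_COMP})*"
_EMAIL = rf"[A-Za-z0-9._-]+@{_DNS_NAME}"

TAG_URI = re.compile(
    rf"tag:(?P<authority>{_EMAIL}|{_DNS_NAME})"
    r",(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?)"
    r":(?P<specific>[^#]*)"
    r"(?:#(?P<fragment>.*))?"
)


def parse_tag_uri(uri: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a tag URI into its parts, or return None if the syntax doesn't match."""
    m = TAG_URI.fullmatch(uri)
    return m.groupdict() if m else None
```

For `tag:datapacrat.com,0000:DataPacRat` this gives an authority of `datapacrat.com`, a date of `0000`, and `DataPacRat` as the specific part, with no fragment.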

To show some of the versatility of this, let’s try an extreme and possibly absurd example. Let’s assume that at some point in the future, my mind gets uploaded into the form of a digital computer program; and copies start being made of it. If ‘DataPacRat’ refers to me, and all of the copies are me, does that mean that the single ‘tag’ URI refers to all of them? It can – unless I set things up ahead of time to be able to handle the situation. As it happens, I have a naming schema already worked out for just such an occasion.

Before any copies are made, DataPacRat refers to the single instance of me. At the moment of copying, DataPacRat.0 refers to the non-running version, the inactive data which can have copies made. The first such copy is DataPacRat.1, the second DataPacRat.2, and so on. Should DataPacRat.2 be copied, then we get DataPacRat.2.0, DataPacRat.2.1, etc.

If need be, various further details can be established to identify more exactly which copies get which names, if non-copying edits are made, if data about when copies are made is lost, how to compress unfeasibly long names, and so forth – but that’s just one, simple example of the power of having your own namespace.
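The basic bookkeeping of that copy-naming schema is simple enough to sketch. `CopyNamer` is a hypothetical name, and the sketch covers only the core rule (the inactive source data is always `name.0`, and copies are numbered from 1), not the further details such as non-copying edits, lost data, or compressing long names.

```python
from typing import Dict, Tuple


class CopyNamer:
    """Hands out names under the basic copy-naming rule."""

    def __init__(self) -> None:
        self._copies: Dict[str, int] = {}  # name -> number of copies made from it so far

    def copy(self, name: str) -> Tuple[str, str]:
        """Record one copy of `name` and return (snapshot_name, new_copy_name)."""
        count = self._copies.get(name, 0) + 1
        self._copies[name] = count
        return f"{name}.0", f"{name}.{count}"
```

Copying DataPacRat twice yields DataPacRat.1 and then DataPacRat.2 (with DataPacRat.0 as the snapshot both times), and copying DataPacRat.2 yields DataPacRat.2.0 and DataPacRat.2.1.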

And for now, this is DataPacRat, hoping to one day become DataPacRat.0, signing off.

Once you understand a format well enough, handcrafting files within that format can often let you do things beyond what the people who wrote the format ever expected.

For example, here is a small ZIP file, containing no executable code, no viruses, no dangerous data of any sort. It’s a mere 42 kilobytes – smaller than my avatar image. The tricky bit is that, when fully uncompressed, the result is 4.5 petabytes of data. That’s right – not mega-, giga-, or even tera-bytes, but peta-bytes. (Source)

For a more extreme example, here is a teensy little zip file, a mere 440 bytes. When that file is uncompressed, the result is, within a new subdirectory, a teensy little zip file, a mere 440 bytes. When that file is uncompressed, the result is, within a new subdirectory, a teensy little zip file, a mere 440 bytes. When that file is uncompressed, the result is, within a new subdirectory, a teensy little zip file, a mere 440 bytes. … And so on. That’s right – attempting to fully uncompress all layers of this little gem results in an /infinite/ series of files. (Source)

Be quite cautious if you actually download these files. Some versions of virus scanners attempt to decompress archived files completely in order to scan them; at least one server version of McAfee’s virus scanner is known to be vulnerable to this, and will fill up its disk, crashing the system. Imagine that: one of the most popular, well-known, and carefully crafted pieces of software in the world… taken down by 440 bytes that fall entirely within the legal specs of the ZIP format.
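The general defence against both files is to budget before extracting. Below is a minimal sketch using Python's standard `zipfile` module: it adds up the uncompressed sizes each archive declares, recurses into members whose names end in `.zip`, and gives up past a total-size or nesting-depth limit. The function name, the default limits, and treating any `.zip`-named member as a nested archive are all assumptions for illustration; a real scanner would also need to cope with other archive formats and deliberately forged headers.

```python
import io
import zipfile


class ArchiveRejected(Exception):
    """Raised when an archive exceeds the size or nesting budget."""


def inspect_zip(data: bytes, max_total: int = 100 * 1024 * 1024,
                max_depth: int = 3, _depth: int = 0) -> int:
    """Return the total uncompressed size declared by `data` and any nested .zip
    members, refusing to go past `max_total` bytes or `max_depth` nesting levels."""
    if _depth > max_depth:
        raise ArchiveRejected(f"nested more than {max_depth} levels deep")
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # file_size is the uncompressed size recorded in the archive's directory.
            total += info.file_size
            if total > max_total:
                raise ArchiveRejected(f"declared contents exceed {max_total} bytes")
            if info.filename.lower().endswith(".zip"):
                with zf.open(info) as member:
                    # Read one byte past the declared size, to catch a lying header.
                    inner = member.read(info.file_size + 1)
                if len(inner) > info.file_size:
                    raise ArchiveRejected("member is larger than its declared size")
                total += inspect_zip(inner, max_total - total, max_depth, _depth + 1)
    return total
```

The depth limit is what stops the self-reproducing 440-byte file, since every layer looks harmless on its own; the size budget is what stops archives whose declared contents balloon far past their compressed size.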