Archive for the ‘Uncategorized’ Category

Original

As part of a small project I’m working on, I need to have at least a rough description of how a given number of decibans translates into a subjective level of confidence, described in a way that can be understood by people who’ve never come across the idea before.

Some previous discussion has involved the practical maximum number of decibans, that imaginary and complex decibans aren’t relevant here, a quick reference table, and another reference table.

Here’s my first attempt at an approach: list some of the more memorable numbers of decibans, and give a rough description of that confidence level (being applied to identity verification, where possible). I’m open to any alternate approaches, and/or ways to improve this one.

 

While people tend to be very bad at assigning accurate confidence levels (eg, when people claim to be 90% sure of something, they’re often wrong 50% of the time), their initial estimates of their confidence levels can be used as the inputs for more sophisticated Bayesian algorithms. Until such time as more accurate estimates are available, here are some possible sample confidence levels:

0 decibans: 50%: You’re not sure whether the last digit of the phone number is a 3 or a 5.

1 deciban: 56%: Just slightly more likely than not; a business card handed to you by a stranger.

Up to 10 decibans: to 90%: Someone you’ve chatted to for an evening.

Up to 20 decibans: to 99%: A distant acquaintance, who you talk to once a year.

Up to 30 decibans: to 99.9%: A co-worker who might have been re-organized into a new email since you last heard from them.

Up to 40 decibans: to 99.99%: A family member, whose email address you might accidentally have misspelled.

Around 100 decibans: Your own personal information, closely checked. (There’s still a theoretical chance that you’re wrong, just as there’s a theoretical chance that you’re the star of something like the Truman Show.)

127 decibans: Data which relies on yourself alone, thoroughly re-checked and confirmed by others.
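As a rough illustration of how the list above could be used as a lookup table, here is a minimal Python sketch. The names `LEVELS` and `describe` are my own inventions, and the sketch assumes each row covers everything above the previous row up to its own number, which the list itself doesn't spell out (especially for the gap between 40 and "around 100").

```python
from bisect import bisect_left

# (upper bound in decibans, rough description), taken from the list above.
# Assumption: each row covers values above the previous bound, up to its own.
LEVELS = [
    (0, "not sure whether the last digit of the phone number is a 3 or a 5"),
    (1, "a business card handed to you by a stranger"),
    (10, "someone you've chatted to for an evening"),
    (20, "a distant acquaintance, who you talk to once a year"),
    (30, "a co-worker who might have a new email since you last heard from them"),
    (40, "a family member, whose email address you might have misspelled"),
    (100, "your own personal information, closely checked"),
    (127, "data which relies on yourself alone, re-checked and confirmed by others"),
]
_BOUNDS = [bound for bound, _ in LEVELS]


def describe(decibans):
    """Return the rough description for a confidence level, or None if off the chart."""
    if decibans < 0:
        return None  # negative confidence isn't covered by this list
    index = bisect_left(_BOUNDS, decibans)
    if index >= len(LEVELS):
        return None  # beyond 127 decibans
    return LEVELS[index][1]


print(describe(15))
```

Anything below 0 or above 127 returns None rather than a guess, since the list doesn't describe those cases.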

Watched John Hodgman’s “Ragnarok” on Netflix, after a recommendation from Kottke. Liked a song I heard. Found only one copy online: http://blip.tv/play/hpcYgf21BAA%2Em4v

Video by David Buckland

Lyrics:
Things fall apart, everything tends to decay
And so it takes a lot to combine atoms in such a way
That they resist the lure of that darkness
that lurks around the edges of every day

So I’m inviting you to join me in this fight
to go down to the river, and come up all three times
Hank Williams was right: no one gets out alive
All we can do is try to have a really good time

Resist the tide, stand in the water
That’s baptism, that’s making light
Electricity is proof that there can be
A little bit of light in all this darkness

So please don’t go so gently into that good night
Rage, rage, rage against the dying of the light
You know you got a voice, to talk with you can call your own!
So clear your throat, and start singing this song:

Resist the tide, stand in the water
That’s baptism, that’s making light
Electricity is proof that there can be light
It takes a lot of work, but oh baby, it’s worth it.

Whether or not you have a ‘business’ for which a card is required, there is still good value in having a social card, sometimes called a networking card – or, if you prefer to go old-school, a calling card.

(I’ve blurred out the details I don’t feel like spreading /too/ wantonly: my phone number, and the QR code which contains my name, email address, phone number, and a link to an online vCard file which contains the most recently updated versions of all of the above.)

I know of few, if any, business cards which contain cuneiform. I know of even fewer which have it as an accurate translation of information on the card, rather than as merely a background decorative element. (Ditto Mayan hieroglyphs, Egyptian hieroglyphs, tengwar, Ogham, Morse code of a ham radio callsign, the International Phonetic Alphabet, etc.)

[Image: card back, web-safe version]

[Image: card front, web-safe version]

Original

It’s well-established that 0 decibans means 1:1 odds or 50% confidence; that 10 decibans means 10:1 odds; that -10 decibans means 1:10 odds; and that fractional numbers of decibans have similar meaning.
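For the real-number case, the conversions can be written out in a few lines. This is only a sketch of the standard definition (decibans are ten times the base-10 logarithm of the odds); the function names are my own.

```python
import math


def decibans_to_odds(decibans):
    """Odds in favour, as a single number (10.0 means 10:1, 0.1 means 1:10)."""
    return 10 ** (decibans / 10)


def decibans_to_probability(decibans):
    """Convert odds of o:1 into a probability of o / (1 + o)."""
    odds = decibans_to_odds(decibans)
    return odds / (1 + odds)


def probability_to_decibans(probability):
    """Inverse of the above; only defined for probabilities strictly between 0 and 1."""
    return 10 * math.log10(probability / (1 - probability))


for db in (-10, 0, 1, 10, 20):
    print(db, round(decibans_to_probability(db), 4))
```

Fractional values work the same way: 1 deciban is odds of roughly 1.26:1, or about 56% confidence.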

Does it make sense to talk about “i decibans”, or “10 + 20i decibans”? If so, what does that actually mean?

I’m currently roughing out what may eventually become a formal specification for a protocol. It includes a numerical field for a level of confidence, measured in decibans. I’d like to know if I should simply define the spec as only allowing real numbers, or if there could be some purpose in allowing for complex numbers, as well.

Recursion and reflexivity: If nym ever does become a formal IETF spec, then it would be a full-fledged URI; which would mean that the Identity fields in a nym could themselves be nyms. I don’t see any reason to rule this out, as long as it’s made clear that if a nym is used as an identifier, the identifier is referring to the act of assertion made by the nym rather than whatever that nym is itself referring to.

Time periods: ISO 8601 and RFC 3339 allow for time-strings that don’t just refer to a moment, but to an extended period of time. This seems to be quite useful, such as using the string “2000/P1Y” to refer to the whole of that year. And since the current draft for nym’s format doesn’t use the “/” character, no extra ambiguities seem to be introduced by allowing such date-fields.
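As a rough illustration of what software would have to do with such a period, here is a minimal Python sketch. The function name `year_interval` is hypothetical, and it only understands strings shaped like “YYYY/PnY”; real ISO 8601 intervals allow many more forms, which this does not attempt to cover.

```python
import re
from datetime import date


def year_interval(text):
    """Expand 'YYYY/PnY' into (start, end), where end is the first day after the period."""
    match = re.fullmatch(r"(\d{4})/P(\d+)Y", text)
    if match is None:
        raise ValueError(f"not a 'YYYY/PnY' period: {text!r}")
    start_year = int(match.group(1))
    years = int(match.group(2))
    # datetime.date has no year 0, so very early years raise ValueError here.
    return date(start_year, 1, 1), date(start_year + years, 1, 1)


start, end = year_interval("2000/P1Y")
print(start, end)
```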

Next up: Looking up whether there are any ISO or RFC standards on numbers and mathematical notation; and checking on whether it’s meaningful to measure decibans with complex numbers, or if nym should limit the confidence field to real numbers.

For my own purposes, I’m putting together a new pseudo-URI, loosely based on ‘tag:’ ( http://www.taguri.org/ , https://tools.ietf.org/html/rfc4151 ), to use for peer-to-peer distributed reputation systems. What I am aiming for is a common protocol that can easily express this idea: “Authority A asserts that X and Y refer to the same entity” (with a certain amount of certainty) (with an optional comment field) (with an optional authentication hash). Depending on how well it works, I may even look into the Internet Draft submission process to put it on the track to become official.

Early draft for ‘nym’ URI

In math, ‘0.999…’ and ‘1’ are different representations of the same underlying concept. Multiple social media profiles and contact methods can represent the same underlying person. Books can be referred to by their author and title, or their ISBN. The ‘nym’ URI announces that a given authority asserts that two or more representations both refer, at least in a general sense, to the same thing; that is, they describe the same entity in different formats.

Preliminary formatting structure idea:

nym:Authority[,date]:(Identity1)[,date];(Identity2)[,date][;(Identity3)[,date]][?comment1[&comment2]][#authenticationHash]

The ‘date’ fields can be any valid ISO 8601 date or time-and-date. If present, they should contain at least a four-digit year. The date for the authority field may indicate any time when the authority field referred to the authority making the assertion, in the same fashion as the “tag:” URI. The date for the authority field should refer to a date reasonably closely correlated with when the authority is making the assertion. The dates for the identity fields, if present, should refer to a point in time when that identity is connected to the underlying entity.

The Authority and Identity fields can be any relevant string. In descending order of preference, these should be:
* Well-formed URIs (eg, “http://www.example.com/SocialMediaProfile”)
* Email addresses lacking the “mailto:” header (which should be assumed to be identical to a field containing that header)
* Domain names
* Valid vCard property types (such as “key:” to indicate a public encryption key)
* Valid FOAF property labels (such as “openid:”)
* Some other field of the form “generalLabel:particularEntity” (eg, “LibraryCard:23043001054082”)
* Any other string

The Authority and Identity fields may have characters escaped. If they contain characters which would allow for misinterpretation of the overall nym statement, they must be escaped. The fields may be enclosed in quotation marks.

The comment fields may contain additional information, which is peripheral to the relationship being asserted between the Identities. Possible uses may include trustcloud whuffie scores, or how the authority knows the individual being identified.

If a comment field is a number, that number is assumed to be how confident the authority is that the listed identifiers all refer to the same entity, measured in decibans. (Decibans are logarithmic, with 0 decibans being equivalent to 1:1 odds, or being 50% confident; 10 decibans to 10:1 odds, or ~90% confident; 20 decibans to 100:1 odds, or ~99% confident; and so on.) It is recommended that these numbers be integers, unless there is a specific reason to be able to specify confidence to greater accuracy; and with a magnitude under 128, as it requires extraordinary effort to have 100 decibans of confidence for even the most fundamental facts. If no specific confidence field is entered, the confidence value of the overall nym statement should only be assumed to be ‘greater than zero’.

While people tend to be very bad at assigning accurate confidence levels (eg, when people claim to be 90% sure of something, they’re often wrong 50% of the time), their initial estimates of their confidence levels can be used as the inputs for more sophisticated Bayesian algorithms. Until such time as more accurate estimates are available, here are some possible sample confidence levels:
0 decibans: 50%: You’re not sure whether the last digit of the phone number is a 3 or a 5.
1 deciban: 56%: Just slightly more likely than not; a business card handed to you by a stranger.
Up to 10 decibans: to 90%: Someone you’ve chatted to for an evening.
Up to 20 decibans: to 99%: A distant acquaintance, who you talk to once a year.
Up to 30 decibans: to 99.9%: A co-worker who might have been re-organized into a new email since you last heard from them.
Up to 40 decibans: to 99.99%: A family member, whose email address you might accidentally have misspelled.
Around 100 decibans: Your own personal information, closely checked. (There’s still a theoretical chance that you’re wrong, just as there’s a theoretical chance that you’re the star of something like the Truman Show.)
127 decibans: Data which relies on yourself alone, thoroughly re-checked and confirmed by others.

The authentication hash is to provide strong evidence that the listed authority is actually the one making the assertion. By default, it is assumed to be based on whatever public cryptographic key (eg, PGP/GnuPG or X.509) is linked to the listed authority ID; and that what is being signed is the string of text before the hashmark.

Some examples:

nym:datapacrat.com:(datapacrat@datapacrat.com),2013-06-05;(http://twitter.com/DataPacRat),2013-06-05;(Daniel Eliot Boese)?100&TrustCloud,774&Klout,29#randomhashofletters

… to indicate that as of that particular date, I indicate with extremely high confidence that my name, email address, and Twitter account all point to me, and that I have two social media scores.

nym:example.com:(example.com);(KEY;PGP:http://example.com/key.pgp)

Example.com asserts that its public key can be found at a particular URL.

nym:example.com:(ID1),2000;(ID1),2001

Example.com asserts that the same identity referred to the same individual on two different dates. Unless some other nym statement is made, it may be assumed that what is being asserted is that the identity referred to the same individual during the entire period between those dates.

nym:example.com:(ID1),2000-12-31;(ID1),2001-01-01?-100

Example.com asserts with strong confidence that ID1 referred to an entity on one day, but did not refer to it on another day. This can be used to revoke an identity, such as if example.com shut down a social media account. (Note that a nym statement with a positive confidence level asserts that /all/ the identities refer to the same entity; while a nym statement with a negative confidence level asserts that /at least one/ of the identities does not refer to the same entity as the others. Thus, in order to make what identity is being revoked clear, the revocation statement should only contain two identity fields.)
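To check that the examples above can actually be pulled apart by software, here is a minimal parsing sketch in Python. It is not a reference implementation: the names `Nym` and `parse_nym` are my own, and it assumes that the authority contains no comma or colon, that the authority's date (if any) contains no colon, and that identities contain no unescaped closing parenthesis. Escaping and quoting are not handled, since the draft doesn't yet pin those rules down.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Nym(NamedTuple):
    authority: str
    authority_date: Optional[str]
    identities: List[Tuple[str, Optional[str]]]  # (identity, date or None)
    comments: List[str]
    confidence: Optional[float]  # decibans, from the first numeric comment
    auth_hash: Optional[str]


_HEAD = re.compile(r"nym:([^,:]+)(?:,([^:]+))?:")
_IDENTITY = re.compile(r"\(([^)]*)\)(?:,([^;?#]+))?")


def parse_nym(text: str) -> Nym:
    head = _HEAD.match(text)
    if head is None:
        raise ValueError("missing 'nym:Authority[,date]:' prefix")
    pos = head.end()
    identities = []
    while True:
        match = _IDENTITY.match(text, pos)
        if match is None:
            raise ValueError(f"expected '(Identity)' at position {pos}")
        identities.append((match.group(1), match.group(2)))
        pos = match.end()
        if text[pos:pos + 1] != ";":
            break
        pos += 1  # skip the ';' between identities
    if len(identities) < 2:
        raise ValueError("a nym statement needs at least two identities")
    # Whatever follows the identities is '?comments' and/or '#hash'.
    rest, _, auth_hash = text[pos:].partition("#")
    if rest and not rest.startswith("?"):
        raise ValueError(f"unexpected text after identities: {rest!r}")
    comments = rest[1:].split("&") if rest else []
    confidence = None
    for comment in comments:
        try:
            confidence = float(comment)
            break
        except ValueError:
            continue  # not a number, so just an ordinary comment
    return Nym(head.group(1), head.group(2), identities,
               comments, confidence, auth_hash or None)


revocation = parse_nym("nym:example.com:(ID1),2000-12-31;(ID1),2001-01-01?-100")
print(revocation.identities, revocation.confidence)
```

Taking the first numeric comment as the confidence is my reading of the draft; a statement with no numeric comment comes back with a confidence of None, to be treated as merely ‘greater than zero’.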

I’ve been in touch with the authors of RFC 4151, and they don’t endorse complicating that simple protocol with my ideas for an authentication URI. I can fully understand that, and don’t disagree.

Thus, if the idea does get off the ground, it will nigh-certainly be under a different name than ‘tag:’. The first few possibilities that came to my mind were ‘nym:’, ‘dub:’, ‘peg:’, and ‘id:’.

Relatedly, they pointed out that the dating could be improved. So I’m most likely going to have the new URI, whatever its name is, allow for not just a date for the authority, but also for each identifier being described, so that the system can express the idea “The person who had email X on 2013-06-05 is the person who had email Y on 2013-06-06”. (If done right, this could also allow for the expression of “The person who had email X on 2000-01-01 is the person who still/also has email X on 2012-12-21”.)

I’m currently looking into adapting the ‘tag’ URI ( http://www.taguri.org/ , https://tools.ietf.org/html/rfc4151 ) to use for distributed reputation systems. I’m hoping to end up with a common protocol which can easily express: “Authority A says that X and Y are the same person” (with an optional field for the certainty of that statement) (with an optional comment field) (with an optional authentication hash). The existing tag URI does a fine job allowing for expressing what A and X are, and seems the ideal base for the remainder. It also seems extremely simple to be able to implement the upgraded ‘tag’ in existing mail/contact software, so that tag: links can be automatically imported into contact lists to update them.

Here’s an initial draft of what I mean:

tag:Authority,DateTime:(Identity1),(Identity2)[,(Identity3)[,(Identity4)]][?comment1[&comment2[&comment3]]][#authenticationHash]

This includes some changes from the current format of ‘tag’:

The authority can be not only an email address or a domain name, but some other identifier, such as the URL of a social media profile (eg, http://twitter.com/DataPacRat or http://www.facebook.com/DataPacRat ). (Worth discussing: whether anything more complicated than an email address should be enclosed in quotes, as URLs are in the ‘a href’ tag in HTML.) (Part of the reason for this is to allow chains of authorities – so that http://twitter.com/DataPacRat can be used to authenticate http://twitter.com/Example , with trust-measurements for each step.)

The date-stamp is no longer just a date when the authority-identifier is under control of the authority, but is also the moment when the authority makes its assertion about the identities.

The date is no longer limited just to a particular date, but can be specified (using ISO 8601 format) down to a particular second, for any rapidly-changing identities.

The identities can be plaintext names, email addresses, domains, social media profile URLs, library card numbers, or any other chosen identifiers.

The comment fields are left incompletely defined, to allow for future expansion; such as to indicate trustcloud whuffie scores, how the authority knows the individual being identified, or anything else that someone comes up with.

By default, if the comment field is a number, that number is assumed to be how confident the authority is that the listed identifiers all refer to the same individual, measured in decibans. (Decibans are logarithmic, with 0 decibans being equivalent to 1:1 odds, or being 50% confident; 10 decibans to 10:1 odds, or ~90% confident; 20 decibans to 100:1 odds, or ~99% confident; and so on.) It is recommended that these numbers be integers, unless there is a specific reason to be able to specify confidence to greater accuracy; and with a magnitude under 128, as it requires extraordinary effort to have 100 decibans of confidence for even the most fundamental facts.

(This method also implicitly allows for revocation of an identity referral, simply by using a negative number instead of a positive one.)

The authentication hash is to provide strong evidence that the listed authority is actually the one making the assertion. By default, it is assumed to be based on whatever public cryptographic key (eg, PGP/GnuPG or X.509) is linked to the listed authority ID; and that what is being signed is the string of text before the hashmark.

This could allow for something like the following:

tag:datapacrat@datapacrat.com,2013-06-05T12:00:00Z:(datapacrat@datapacrat.com),(http://twitter.com/DataPacRat),(Daniel Eliot Boese)?100&TrustCloud,774&Klout,29#randomhashofletters

… to indicate that as of that particular date, I indicate with maximum confidence that my name, email address, and Twitter account all point to me, and that I have two social media scores. (I would have used 127 decibans instead of 100 if I’d just been asserting my own identity.)

While people tend to be very bad at assigning accurate confidence levels (eg, when people claim to be 90% sure of something, they’re often wrong 50% of the time), even rough initial estimates of their confidence can be used as the inputs for more sophisticated Bayesian algorithms. Until such time as more accurate estimates are available, here are some possible sample confidence levels:

0 decibans: 50%: You’re not sure whether the last digit of the phone number is a 3 or a 5.
1 deciban: 56%: Just slightly more likely than not; a business card handed to you by a stranger.
Up to 10 decibans: to 90%: Someone you’ve chatted to for an evening.
Up to 20 decibans: to 99%: A distant acquaintance, who you talk to once a year.
Up to 30 decibans: to 99.9%: A co-worker who might have been re-organized into a new email since you last heard from them.
Up to 40 decibans: to 99.99%: A family member, whose email address you might accidentally have misspelled.
Around 100 decibans: Your own personal information, closely checked.
127 decibans: Data which relies on yourself alone, thoroughly re-checked and confirmed by others.

If there’s an image which the authority considers representative of the thing being identified (eg, a photograph or a logo), then that can be listed as well, either in the form of a URL linking to an external image, or an in-line image using the data URI ( https://en.wikipedia.org/wiki/Data_URI_scheme ).

Some of the URIs that are most likely to be useful, either for the individual identifier or as standard comment fields, are web addresses (http, https, feed, ftp, gopher), instant communications (aim, callto, gtalk, im, irc, mailto, skype, tel, ymsgr), bitcoin, physical locations (geo, maps), and calendars (webcal). From vCard are suggestions for location (adr, geo), anniversary (anniversary), birthday (bday), contact (email, impp, tel, other ims), name (fn, n, nickname), gender (gender), public encryption key (key), preferred language(s) (lang), image (logo, photo), or timezone (tz). FOAF offers images (Image, depiction, img, logo, thumbnail), online accounts (OnlineAccount, OnlineChatAccount, OnlineEcommerceAccount, OnlineGamingAccount, aimChatID, icqChatID, jabberID, msnChatID, skypeID, yahooChatID), website (homepage, page, weblog), location (based_near), name (familyName, firstName, givenName, lastName, name, nick, title), gender (gender), age (age, birthday), and miscellaneous others (myersBriggs, geekcode). I would suggest that software try to make use of any of these tags where they are used, but that identifiers not be limited to a particular list – or to having a prefix at all, especially if the tag is intended mainly to be human-readable.

Most people would say ‘Gimme a minute’, ‘gimme a sec’, or ‘gimme a moment’ mean just about the same thing – and that of the three, a ‘moment’ is the shortest – and that a ‘moment’ is actually not a really well defined length of time.

The truth is the opposite: a moment has an exact value, and that value is 90 seconds. (Source)

And there are odder units than that; an ounce is 1/12 of a moment, or 7.5 seconds… and an ounce can be divided into exactly 47 atoms of about 160 milliseconds each.
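The arithmetic is easy to check. Here is a quick Python sketch using exact fractions, taking the 90-second moment, the 12 ounces per moment, and the 47 atoms per ounce from the figures above; the constant names are just my own labels.

```python
from fractions import Fraction

MOMENT = Fraction(90)   # seconds, as given above
OUNCE = MOMENT / 12     # 1/12 of a moment
ATOM = OUNCE / 47       # 47 atoms to the ounce

print(float(OUNCE))                 # seconds per ounce
print(round(float(ATOM) * 1000))    # milliseconds per atom, rounded
```

The atom works out to 15/94 of a second, which is where the “about 160 milliseconds” comes from.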

Makes me start to wonder if we really should throw the Imperial system completely out, and move to some form of metric time, or something based on the Planck units. In the meantime, there’s always the Tranquility Calendar.

Modern browsers have started to hide them, but an important part of an internet address is those few letters that come right at the beginning – usually ‘http’ or ‘https’, but also including ‘ftp’, or for old-style internetters, ‘gopher’. A fun one is ‘tel’, which lets you encode telephone numbers in the internet’s formalized way; or ‘geo’, for physical locations.

One that has received astonishingly little attention is ‘tag’. Here’s an example:

tag:datapacrat.com,0000:DataPacRat

It has two main parts – the naming authority, and the name. (The meaning of ‘tag:’ should be obvious.)

The first part of the naming authority is supposed to be either a specific email address, or a full domain. Since I run datapacrat.com on my own, I might as well keep things short, and just use that.

The second part of the authority is a date. The intention here is that email addresses and domains change hands over time, so it’s useful to indicate who owned them at the time by indicating when they were owned. I picked up datapacrat.com shortly before August 8, 2008, so it would be easy enough for me to use 2008-08-08 as the date field. However, the ‘tag’ description also says that the date field can indicate a date when /nobody/ owned the email/domain in question, as long as there’s reasonable proof that nobody /else/ owned it between the indicated date and when the naming authority got control over it. Since I’m the very first person to own datapacrat.com, I could pick any date before the present, and as long as it conformed to the internet specs for the date format, it would be valid. Speaking of date specs, it’s possible to shorten the date to just a month, or just a year; and the tag’s official format requires a ‘four digit’ year, which the date spec describes for any year between 0000 and 9999. So, once again, to keep things short and sweet and easily memorable, I’m using the year 0000.

After that comes… anything the naming authority wants. It’s a full namespace, open for whatever purposes are desired. It could refer to people, places, concepts, data, anything. In this particular example, I hereby define DataPacRat as referring to one thing in particular: myself.
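To make the three pieces concrete, here is a minimal Python sketch that splits a tag into authority, date, and name. It is a simplification of my own rather than the full grammar from the spec: the name `parse_tag` is hypothetical, and the pattern only checks that the date looks like a year, year-month, or year-month-day.

```python
import re

# Simplified shape: tag:<authority>,<YYYY[-MM[-DD]]>:<anything>
_TAG = re.compile(
    r"tag:(?P<authority>[^,:]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?):"
    r"(?P<specific>.*)"
)


def parse_tag(uri):
    """Split a tag URI into its authority, date, and authority-defined name."""
    match = _TAG.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a tag URI in the simplified form: {uri!r}")
    return match.groupdict()


print(parse_tag("tag:datapacrat.com,0000:DataPacRat"))
```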

To show some of the versatility of this, let’s try an extreme and possibly absurd example. Let’s assume that at some point in the future, my mind gets uploaded into the form of a digital computer program; and copies start being made of it. If ‘DataPacRat’ refers to me, and all of the copies are me, does that mean that the single ‘tag’ URI refers to all of them? It can – unless I set things up ahead of time to be able to handle the situation. As it happens, I have a naming schema already worked out for just such an occasion.

Before any copies are made, DataPacRat refers to the single instance of me. At the moment of copying, DataPacRat.0 refers to the non-running version, the inactive data which can have copies made. The first such copy is DataPacRat.1, the second DataPacRat.2, and so on. Should DataPacRat.2 be copied, then we get DataPacRat.2.0, DataPacRat.2.1, etc.
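That numbering is simple enough to sketch in a couple of lines of Python. The function name `copy_names` is made up for illustration, and it only covers the plain case described above, not edits or lost records.

```python
def copy_names(name, copies):
    """Names in use once `name` has been copied: the stored '.0' original, then each copy."""
    return [f"{name}.{index}" for index in range(copies + 1)]


print(copy_names("DataPacRat", 2))
print(copy_names("DataPacRat.2", 1))
```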

If need be, various further details can be established to identify more exactly which copies get which names, if non-copying edits are made, if data about when copies are made is lost, how to compress unfeasibly long names, and so forth – but that’s just one, simple example of the power of having your own namespace.

And for now, this is DataPacRat, hoping to one day become DataPacRat.0, signing off.