Thursday, August 7, 2008

Mozilla Labs: in a land where data is free to roam...

Before you go any further, check out this new video (released by Adaptive Path), showcasing the Mozilla Labs concept browser, Aurora.

Cool, right?  A little messy, perhaps; a bit heavy on the Mac bubble-interface style; but a nifty concept.  Technologically speaking, we could build this next week...well, maybe next year, given a few hundred well-trained monkeys.

But one thing keeps bugging me: though it pains me to say so, data is virtually never free.  Open up your browser and take a look at your top ten favorite "free" sites that provide new information (be it news, humor, or simply weather data) on a daily basis.  How many of them have Google ad bars? Pop-ups or Flash overlays?  If they don't, do they have teaser stories that lure you into a subscription, or are they financially backed by a hardcopy version of the paper or magazine?

The facts are simple: while the act of distributing information keeps getting cheaper (I pay less for web hosting now than I did a decade ago), the cost of acquiring new information doesn't.  Somewhere, somehow, somebody is paying for that data to be generated: CNN is using subscription or advertising revenue to fly reporters across the country, an independent blogger (ahem) is slacking off at his job to write a story, or the federal government is building a new weather station in Alaska.

Furthermore, Mozilla's vision of the future demands not only that large volumes of new and interesting data be provided freely by a variety of sources, but also that these sources all use standardized, interoperable formats.  Granted, some of this interoperability will be spurred by the widespread use of open-source tools: it is a lot easier to install WordPress than to build a content management system from scratch, and with that decision comes the guarantee that your data can be more easily "scraped" (extracted from your website and put into another format).  But profit-dependent organizations (PS: "not-for-profit" does not mean "not profit-dependent") will always have an incentive to make their data difficult to extract.

In the scenario Mozilla presents, data is dissociated from its source and mashed up with data from other sources.  In the final mashup view, there are no ads and no highly visible indicators of where the data came from.  This leads to two effects: providers of legitimate data start losing money, and so eventually stop producing usable info; meanwhile, providers of untrustworthy data (spammers and other organizations with a personal agenda) gain more leverage.  Take this to its worst possible outcome, and we find ourselves in a world where most information is hearsay or opinion, facts cannot be trusted, and all the legitimate news sources are dead.

Of course, it does not have to go this way, if we start providing the right incentives before the technology gets away from us.  Some of this is simple good practice: always provide attribution and links to your source data; fact-check before you post; pay for subscriptions to the info sources you trust (or at least click on their ads).  But there are more complex, community-wide problems to be handled too: is there an effective technological solution to spam, since the law has failed us?  How do we balance the financial incentives for data provision against end users' bias toward the cheapest possible source?  Should copyright exist on the Internet, or is it sufficient to cite your sources (and display their advertising stream along with their data)?

Sadly, most of these questions cannot be resolved directly by us technophiles, but at the very least, let's get the discussion rolling...
