Most recent

Been a while

I've been working on something over at the dayjob. (Although I'm writing this at 2:36 AM from the office, so not just dayjob.) I tell you because it's fun, and it's free to use.

I went out the other day with some XML folks, old hands. We talked about ISO 8879, which I once photocopied in its entirety, and old issues of Creative Computing, and about Ted Nelson. I said, now that I have gained experience in key web technologies Django and SOLR, I feel I have the experimental platform I need to implement a new version of Ftrain with a new kind of story, entitled “Lost Dogs, or, the Unhappy Town.” The person I told this to, you could tell he was not buying this. He said, “I am not buying this.” There was a definite sense of trains leaving stations, boats leaving docks, bicycles unracking, respectively blowing whistles or tooting horns or tinkling bells. A part of me turned into birds and fluttered away, a flock heading to sea. They were dragging a whale. I thought, well, shit, I guess I better think about that...



Learning to Fear the Semantic Web

Zotero is an open-sourced bibliography-management tool that runs inside Firefox-based browsers (see screencast). It helps you keep track of your research. I've enjoyed using it as I work on writing projects. From the about page:

Zotero is a production of the Center for History and New Media at George Mason University. It is generously funded by the United States Institute of Museum and Library Services, the Andrew W. Mellon Foundation, and the Alfred P. Sloan Foundation.

Nice! Except today, a good bit after the fact, I learned of a peculiar lawsuit that information and news giant Thomson Reuters Inc. filed last month against the makers of Zotero. From the website of The Chronicle of Higher Education, October 3, 2008, by Jeffrey R. Young (links added):

Thomson Reuters Inc. sued George Mason University in a Virginia court this month, arguing that a free software tool made by the university makes improper use of the company’s EndNote citation software....

Thomson Reuters argues that the latest release of George Mason’s software, which can import files created by EndNote and turn them into files that can be used and shared online using Zotero, “is willfully and intentionally destroying Thomson’s customer base for the EndNote software.” The company seeks $10-million in damages for each year the university has offered the software and to stop the university from distributing versions of Zotero that can convert EndNote files.

One person who commented on the lawsuit is Michael Feldstein, who writes a blog about online learning. He posted the following on October 5:

Apparently, the Zotero team did create their own style format and is crowd-sourcing the creation of import styles. As you can see from this Zotero developer discussion thread, the developers considered and explicitly rejected supporting the redistribution of Thomson-supplied EndNote conversion files. In fact, while Zotero can read EndNote style files, it specifically does not convert them into Zotero’s own format, in large part to discourage the redistribution (deliberately or accidentally) of Thomson-created files. What the import feature does facilitate is (a) users who have already licensed EndNote and want to migrate to Zotero can use the EndNote styles that they have already paid for, and (b) Zotero users can take advantage of the EndNote import styles that individual journal publishers (as opposed to Thomson itself) make available for the convenience of their subscribers. These uses strike me as totally within bounds.

(More is available from the Disruptive Library Technology Jester blog.)

Given my biases this lawsuit seems like an anachronistic, hamfisted attempt to block competition. While as a programmer I love being able to adapt open-source software to my particular needs, I use a mix of closed-source and open-source software without many qualms. That said, non-standard, closed-source document formats are awful stuff that block competition between software vendors and, worse, waste god-awful amounts of my time. If you wish to dispute me on this then come to my office tomorrow to help me, over the course of several hours, yank a magazine's-worth of text out of Quark XPress, using a mix of applications and balky emacs macros. (Imagine if you could take back all the time spent wrangling closed, proprietary document formats. You could finish Perl 6; you could probably write it in Arc.)

I'm not an EndNote user and I don't like to borrow trouble (which is why I've been avoiding this blog; blogging is a great way to borrow trouble). But not only does this lawsuit invoke the dread specter of legally-enforced proprietary data formats, it raises questions about Thomson Reuters's legal attitude towards the data produced by its other software offerings, including, in this case, a piece of software called OpenCalais.

OpenCalais is a web-based application that consumes text and returns special Semantic Web-style metadata that you can use to do interesting, Semantic Web-style things, like: create topic pages, improve search, or enhance local taxonomies. It has a Facebook group and its website features both video of straight-talking bearded coders and a creatively borrowed terms of service statement:

We based these Terms of Service under those released by Automattic under a Creative Commons Sharealike license. Thanks to Automattic and WordPress.com for sharing.

I have a quarter-million-page corpus at work and I'm looking for simple, inexpensive ways to enhance it, so I've followed the development of their platform for some time: joining the Facebook group, signing up for an account, and using their free endpoint for testing (go ahead and give it a spin). My grand, entirely unrealized plan was to include a direct hook to OpenCalais in our content management system. The OpenCalais team seem trustworthy, progressive, and smart, and committed to openness. But, at least for now, the lawsuit against Zotero has scared me off using the product.

This despite, as pointed out by the Panlibus blog at Talis, in a post on OpenCalais as it relates to the Zotero lawsuit, the following statement from the OpenCalais folk:

We want to make all the world’s content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the Semantic Web or the Giant Global Graph—we call our piece of it Calais.

So why am I overreacting? Well, that “our piece of it” bit is a little tricky, but I think I get what they mean, and the EndNote people and the OpenCalais people are in different parts of a very large organization and working on different projects with different goals. But the parent company is the same, and professionally I feel required to overreact, because in every situation (as editor, coder, designer, and so forth) I to my great regret must always concern myself with liability.

I hate that part of my job. From worrying about copyright and fair use, to questioning whether we can reuse art or prose from our own archives, to sending out cease and desists—it all fills me with gloom and despair, the sense of being a culpable cog in a lumbering legal machine. It's the opposite of creative, interesting work, but if you get something wrong the consequences can be dire, so worrying about getting sued is something that has to be done, every day, even on the subway. I'm worried about getting sued right now, sitting here, typing this. If you've had someone threaten you with a lawsuit, you know the sort of fear and second-guessing it engenders. Even if I am certain that I have followed every ethical and legal guideline, it's an instant panic attack to see the words “contacting a lawyer” or “liable for damages” in an email; it leads to second-guessing, and I know there will be phone calls, meetings, and several months of followups to comply with the needs of insurers. If I can see the shadow of a lawsuit anywhere I am obligated to shine a light upon it and freak out at least a little; otherwise I'm not doing my job.

And that's what's going on here. This recent lawsuit against George Mason/Zotero immediately brought to mind a scenario: Thomson Reuters maintains control over the taxonomy, the thesaurus, of terms used in OpenCalais, and they do the indexing of content to associate that content with terms. The use pattern I was considering was as follows:

  1. Create text within a content management system;
  2. Send that text to OpenCalais;
  3. Store the metadata it returns;
  4. Over time, use aggregated metadata, integrated with our existing ~80,000 subjects, to create a local taxonomy for faceted search and automatically-compiled topic pages, along with other interesting interfaces;
  5. Share as much of the taxonomy as possible as downloadable RDF;
  6. Make sure to provide links back to OpenCalais wherever possible, on their terms, as defined in their Terms of Service (TOS) document.
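
For the coders, here is a rough sketch of what steps 2 through 6 might look like. It is a toy under loud assumptions, not the OpenCalais API: `fake_tagger`, the `guid-0001` style identifiers, and the `example.org` URIs are all invented for illustration, and a real client would send the text to the remote service instead of matching against a two-word vocabulary. The one real thing in it is the Dublin Core `subject` predicate.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Tag:
    guid: str   # identifier assigned by the external service (made up here)
    label: str  # human-readable term


def fake_tagger(text: str) -> List[Tag]:
    """Step 2 stand-in: a real client would send the text to the remote service."""
    vocabulary = {"endnote": "guid-0002", "zotero": "guid-0001"}
    lowered = text.lower()
    return [Tag(guid, term) for term, guid in sorted(vocabulary.items()) if term in lowered]


def build_index(docs: Dict[str, str], tagger: Callable[[str], List[Tag]]) -> Dict[Tag, Set[str]]:
    """Steps 3 and 4: store the returned tags and aggregate them as tag -> document ids."""
    index: Dict[Tag, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tag in tagger(text):
            index[tag].add(doc_id)
    return dict(index)


def to_ntriples(index: Dict[Tag, Set[str]]) -> List[str]:
    """Steps 5 and 6: one triple per (document, tag), keeping the service GUID in the URI."""
    subject_predicate = "http://purl.org/dc/terms/subject"
    lines = []
    for tag in sorted(index, key=lambda t: t.guid):
        for doc_id in sorted(index[tag]):
            lines.append(
                f"<http://example.org/doc/{doc_id}> <{subject_predicate}> "
                f"<http://example.org/tag/{tag.guid}> ."
            )
    return lines


# Hypothetical two-document corpus standing in for the CMS (step 1).
docs = {
    "a1": "Zotero can read EndNote style files.",
    "a2": "Zotero runs inside the browser.",
}
index = build_index(docs, fake_tagger)
triples = to_ntriples(index)
```

Note that every triple in the shareable output carries an identifier that came from the service, which is exactly the part that makes me nervous.
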

That's probably not a big deal. I doubt anyone would even notice. But... is it at all possible, conceivable, even a tiny bit that at some point in the future Thomson Reuters could claim that we were misusing their data in step (4), above? From the TOS:

If you syndicate, publish or otherwise transmit any content containing, enhanced by or derived from Calais-generated metadata you will use your best efforts to incorporate the correct Calais-provided Globally Unique Identifier [GUID] in that content.

It seems straightforward, but that “best efforts....” The truth is, I don't really know exactly what they mean there. Also from the TOS:

You will not use any metadata or GUIDs produced by Calais to create a metadata retrieval service similar to Calais.

And could they claim that we were somehow creating a derivative work without permission and distributing it in step (5)?

I would say, based on my far-from-authoritative reading of the TOS, and given the suit against George Mason University, there is now a precedent; that is, it is within the realm of possibility that if I passed thousands of web pages through OpenCalais and decided to adapt the resultant format for my own use in a way that Thomson Reuters disliked, I could get a fat letter from some lawyer someday demanding damages, accusing me of creating a derivative work based on their proprietary taxonomy, in violation of their terms.

I'm not saying it's likely; I'm not saying I'm right; I'm not even saying that Thomson Reuters would be legally or ethically wrong to sue for damages. I would bet $10,000 right now against my fears coming to pass. But IANAL, which is exactly my problem here. And this is not a call to boycott anything, nor an attempt to get personalized service out of OpenCalais, where the developers are doing some very fine Semantic Web-bootstrapping work. I know Thomson Reuters could give a damn about me, and in that they are justified—I'm just another API key hash in their database, and even if I upgraded to their for-pay service I'd never represent more than a balance-sheet rounding error.

My only purpose in writing today is to point out how a lawsuit can have unintended chilling effects, at least for me. We're in a remarkable downturn, and people are being told to “get real or go home.” One way corporations get “real” is to sue the living shit out of everything that blinks. It's probably a good time to review the terms of service for all of your critical software to make sure you're in compliance; I wonder if a lot of Web 2.0 mashup decentralized goodwill is going to go to good-faith heaven as companies under financial strain start to look closely at their patent portfolios and vendor agreements, and decide that printing out lawsuits is even cheaper than deploying to EC2. And now that the “Semantic Web,” or “Web 3.0,” or the “Linked Data Web,” or the “Web of Really, That's How to Query Over an rdf:Bag?” or whatever they're calling it, is viable enough that you can't shrug off legal worries—now that the Semantic Web is no longer just a research project, if someone owns the taxonomy you're using and changes it up on you, what rights do you have in the matter? Who owns the GUIDs? Your honor, I just wanted to build a hierarchy of topic pages. I never meant to hurt nobody. And so forth.

To summarize: working in web publishing, I have a healthy fear of lawsuits bordering on the insanely paranoid; and I wish it were not so, but that is now part of the job, as the web of ideas has given way to the web of pricks; and finally, actions speak louder than Creative Commons-licensed terms of service. You can still get handed a subpoena while you're riding the Cluetrain.

Now that I got the fear, do I want to go to the effort to (1) educate a few people in management, none of whom would have great interest in the subject except as a soporific, about the far-fetched risks of using externally-generated taxonomies to organize our content; and do I (2) want to spend a number of hours in the near future educating myself over the completely nebulous rights issues connected to taxonomies, linking, and file formats, thus taking even more time away from code and prose to give it to the law; and do I possibly even (3) want to allocate the budget to work with a lawyer on taxonomy-related issues? All the while knowing that I'm overreacting and that this is probably pointless?

Not really. I'd rather let other people do that and read the judges' opinions. Let deeper pockets set the precedent; what I do want to do is to port the CMS to Django, an open-source web framework published by a foundation, get the search into Solr, also published by a foundation, and introduce hierarchy to the 80,000 subjects we already have indexed. I'm just going to put OpenCalais away for a while and start looking at DBpedia again, then see how that whole Zotero suit works out over the next few months or decades.

In one way, this is all great because I love the Semantic Web to the point of stupidity—to the point of building a custom content management system entirely based on alpha-level technology using RDF for storage, creating a framework even slower than Rails. So I'm grateful to Zotero for taking the brunt of the lawsuit, because it gave me reason to take off my rose-tinted Linked Data goggles, and made me aware that all of my planned Semantic Web taxonomy-sharing fun could come crashing down if I don't carefully track the provenance of every one of my triples, erring always on the side of raving terror.

Know what else is great? Now, finally, ten years on, I know that the Semantic Web is real and viable, because I'm afraid I'll get sued for using it. That's the true measure of a maturing technology—eat it, Gartner hype cycle.

I believe, as in don't-get-him-started, that taxonomy-driven interactive editorial is essential to the future of the web, and thus to storytelling and narrative in general. Clearly a great deal of money is being spent by major companies in pursuit of the golden triple: It appears the AP is working on taxonomy tools, and Rupert Murdoch's Dow Jones has Synaptica and publishes a cute taxonomy cookbook. A number of other companies are out there, building massive thesauri and indexing tools, hacking parsers and coding semantic disambiguators like mad, banging their heads against pronouns. There will be many, many competitors seeking to add their own structure to our increasingly Web-content-driven reality, and we will, if we use their services, find ourselves beholden to their methods of indexing, with all manner of legal compliance and copyright issues as of yet untested in courts. Creating good, broad, world-describing taxonomies is extraordinarily expensive, because reality is large, and these companies will need to strike a balance between sharing their work and protecting it, so I imagine this will be a subject I'll revisit, professionally, many times over the next few decades (barring complete societal breakdown, or a personal spiritual awakening that allows me to stop thinking about this sort of thing).

Such questions could keep a librarian up at night, staring at the wall, petting his or her sleek gray cat Otlet and wondering what, for instance, a political campaign looks like when all of the news and columns are automatically classified before being published. Competition, he or she might conclude, must be encouraged between these platforms; there must be a free, and yet somehow regulated (perhaps by the W3C, or preferably by an organization with a more attractive website), market of taxonomies—you can't have people claiming to own concepts conjoined to unique identifiers, can you? Can you? You probably can? Oh.

But there's likely no reason to worry; and I am just borrowing trouble; and maybe the Semantic Web won't matter that much after all. Even if taxonomies do become increasingly important in our web of linked data, thank God we live in a society with an enlightened understanding of intellectual property, and that we can trust the tiny handful of organizations that control the world's supply of news, as they become software providers as well as content providers, to do the right thing when it comes to serving the needs of a wider populace, in a culture that would rather foster dialogue, discussion, and mutually beneficial resolutions than use the ugly, blunt tool of potentially profitable lawsuits. I'm sure—really, I am—that mine is an overreaction. And onward, to progress.



Fixed



NYU



Also



Steering Wheel

I've been walking home--my bike is in the shop forever and the weather is nice. I listen to episodes of the Jack Benny program on my phone, waiting for Mary Livingstone to laugh. I'm up through 1946.

The traffic where I live is so bad that sometimes I am stuck in my minivan for forty minutes before I get to work. So I use the steering wheel as a kind of prayer wheel. Each notch reminds me of a prayer. I go from notch to notch saying prayers for my husband, for each of my children, my parents, my friends, and the students in my class.

I read something like that 19 years ago in Guideposts. I was sitting in my grandparents' living room on their black sofa. I think of it whenever my computer gives me the pinwheel, or when I am on the phone at work helping an old lady onto the website, explaining that email doesn't need stamps. At the top right of the screen, I asked, do you see a little box? And to the left of the box is the word “Username?” You put a special name into that box. We have to make that special name.

“I'm old,” she said.

Down through Soho. People walk into traffic while text-messaging. I also have on headphones. It's warm, crowded, and progress is slow. I see a girl in canary leggings and short bangs, backlit by a storefront. She is laughing at a joke made by a boy in a vest. No wonder people want to live here. Right then Mary Livingstone laughs in 1946. A man with his tongue out is trying to shake hands with everyone. On Bowery I pass the New Museum, which has a sign reading “HELL YES!” in great rainbow letters. Faces lit from below or on the side by cell-phone screens and media players. I am moving slow but light is absolutely everywhere.



I never told you because I was kind of out of it for a while there but

Dad has a blog.

So does my wife, actually.



Sasquatch

The first movie I remember seeing was called Sasquatch: The Legend of Bigfoot. I'm sure I had been to the Warner Theater before that but I remember this movie because it was not for children, I was six, and there was some negotiation before I was allowed to go. My brother took me.

I remember the monster coming up over a hill, roaring, but far more intense than that was the massive yellow Sasquatch logo that appeared on the screen at the beginning of the film. Looking at a clip of the film (obviously awful) shows, in contrast to the eyeball-drilling of Star Wars or piss-shower of Taxi Driver, a thin, nervous country with just enough money for a pack of cigarettes and a tank of gas.

Why were they showing a 1975 movie about Bigfoot in 1981 at the Warner? I was at that point only a slip of paper in a pullover shirt and man did I like dogs. My brother might have worn a denim jacket lined with thick beige lambswool, and cars had ashtrays. The Warner was a velvet-and-gilt palace near the Woolworth's. The floor had an inch-thick layer of grime and every step you took, at least in my little sneakers, went THWICK. There were gilded women carved into the walls and a red curtain that pulled apart for the show. I would imagine it was built in the 1930s--(yes, it was)--a big dose of Celebrex to cure the Depression, and while the art-deco style was modern the curtain and gold belonged to the theater. Or more likely to vaudeville.

(A vaudeville story, according to my father: his father, as a boy, would get inside a tire to be rolled across the stage between acts; he got a nickel every time. You had to keep the show moving. Later he became a respected whistler. Never met him.)

I could keep going backwards here until I was at the Globe Theater watching men in bear costumes chasing after boys in wigs, or further back to a naked stage in a natural amphitheater with chanting men in masks. But you had to read Oedipus in high school too. And the Warner shut down soon after Return of the Jedi to become offices; no longer were movies within walking distance. A tragedy in the original sense--meaning a song for the goats.



Over There

~200 words on restraints at ABriefMessage.com.



Ftrain.com


Ftrain.com is the website of Paul Ford and his pseudonyms.

There is a Facebook group.

And six-words-only Twitter posts.

See also: Gary Benchley, Rock Star, a novel; Harper's Magazine; NPR's All Things Considered; The Morning News.


© 1974-2007 Paul Ford

Recent

Been a while. (February 16)

Learning to Fear the Semantic Web, by Paul Ford. (October 15)

Fixed. (September 18)

NYU. (September 18)

Also. (September 11)

Steering Wheel. (September 11)

I never told you because I was kind of out of it for a while there but. (April 1)

Sasquatch. (March 26)

Over There. (March 24)
