News as the raw material for analysis
Posted Friday, February 22, 7:43 AM
Wanted to share this post on O'Reilly Radar (http://radar.oreilly.com/archives/2008/02/reuters-ceo-sees-semantic-web.html) that I thought was very interesting and a good seed topic for this group.
In it, Tim O'Reilly discusses an interview he did with Devin Wenig, COO of Reuters. Mr. Wenig described a seemingly uncommon perspective on news: that news is not necessarily the end product but rather the raw material for analysis by others. This thinking makes a lot of sense given Reuters' purchase of ClearForest and the recent availability of Open Calais (http://www.opencalais.com/).
What do you think about this?

Accepting news as "raw material" also means accepting news as (probably) a commodity. As a news provider, Reuters needs some unique capability; otherwise it's a race to the bottom on price. However, being the raw material also binds you tightly into the value-add, so high-level analysis becomes more dependent on the raw material.
For Reuters, that unique capability might be something abstract, like "trust" embodied in a brand. Perhaps there's a universal rule that an analysis is only as trustworthy as the least trustworthy news item or fact it is built upon.
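One way to state that hypothetical rule, assuming each fact $f$ that an analysis $A$ builds on carries some trust score $T(f)$ in $[0, 1]$ (how such scores would be assigned is left open), is as an upper bound:

```latex
T(A) \le \min_{f \in F(A)} T(f), \qquad F(A) = \text{the set of facts } A \text{ is built upon}
```

Read this way, adding more facts can never raise the ceiling on trust; a single weak source caps the whole analysis.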
As a Reuters employee, I've got to wholeheartedly agree with everything Mr. Wenig says ;-). This 'raw material' approach makes it all the more important that Reuters keeps innovating both in higher-level analysis and tools and in news-capture processes and approaches.
I'll be around at the conference, and I hope we'll all get a chance to discuss this topic.
News needs a microformat.
http://microformats.org/
I'm shocked there isn't one.
We should make one.
Who's in?
Longer rant:
Microformats solve the big problem with special-purpose XML formats by being built purely from XHTML. Basically, that means you don't need to translate from an XML format to HTML; you can just style the microformatted information with CSS.
For instance, if I were writing a news microformat, I would need a way to denote a captioned image. Since there is no predefined way to do that in XHTML (as there is for, say, italicized text), I might do something like this:
This is the caption of the photo below
If a few sites agree to write XHTML that way, then web spiders, browsers, etc. would know something semantic about the data (that it is a captioned image in a news article) instead of just knowing it is a div containing an image and a paragraph.
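To make the spider side of this concrete, here is a rough Python sketch that picks captioned images out of microformatted markup using only the standard library's `html.parser`. The class names `news-photo` and `caption` are hypothetical, invented for this sketch; they are not part of any published microformat.

```python
from html.parser import HTMLParser

# Hypothetical news-microformat markup. The class names "news-photo" and
# "caption" are invented for this sketch; they are not part of any spec.
SAMPLE = """
<div class="news-photo">
  <p class="caption">This is the caption of the photo below</p>
  <img src="photo.jpg" alt="Example photo" />
</div>
<div class="sidebar"><img src="ad.gif" alt="" /></div>
"""


class CaptionedImageParser(HTMLParser):
    """Collects (image src, caption) pairs from <div class="news-photo"> blocks."""

    def __init__(self):
        super().__init__()
        self.results = []          # (src, caption) tuples found so far
        self._div_depth = 0        # how many <div> elements are currently open
        self._photo_depth = None   # div depth of the open news-photo block, if any
        self._caption_tag = None   # tag name of the open caption element, if any
        self._src = None
        self._caption = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            self._div_depth += 1
            if self._photo_depth is None and "news-photo" in classes:
                # Entering a captioned-image block: reset per-block state.
                self._photo_depth = self._div_depth
                self._src, self._caption = None, []
        if self._photo_depth is None:
            return  # ignore everything outside a news-photo block
        if "caption" in classes:
            self._caption_tag = tag
        elif tag == "img":
            self._src = attrs.get("src")

    def handle_data(self, data):
        if self._caption_tag is not None:
            self._caption.append(data)

    def handle_endtag(self, tag):
        if tag == self._caption_tag:
            self._caption_tag = None
        if tag == "div":
            if self._div_depth == self._photo_depth:
                # Closing the news-photo block: record what we found.
                caption = " ".join("".join(self._caption).split())
                self.results.append((self._src, caption))
                self._photo_depth = None
            self._div_depth -= 1


def extract_captioned_images(markup):
    parser = CaptionedImageParser()
    parser.feed(markup)
    parser.close()
    return parser.results


print(extract_captioned_images(SAMPLE))
# [('photo.jpg', 'This is the caption of the photo below')]
```

The spider keys on agreed class names rather than on tag structure alone: the second `div` also contains an image, but without the `news-photo` class it is ignored.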
It's a fantastic extension of the already fantastic ideas about writing semantic XHTML. It's sort of a form of search engine optimization, but ten times more powerful, because your site and others present the information in a consistent structure.
Consistent, semantic, and structured. And without requiring any new technologies, browsers, servers, or standards bodies.
Woot!
Ooh. My example broke. Doh. Silly form.
Brian, there's NewsML for packaging news, but it focuses on syndication. It uses XHTML in the body of the content (AFAIK).
But a microformat is an interesting alternative approach, and an attractive one from an SEO and spidering perspective, since search and spidering drive so much traffic (and hence revenue).