Eppy-google

From IVP Wiki
Revision as of 15:36, 17 June 2010 by Bill Densmore

Josh Cohen speaking to the Editor & Publisher conference in Las Vegas

Running notes by Bill Densmore

Cohen talks about how businesses built on controlling the flow of information are being turned on their head. He wants to talk about the positive side of this.

  • Google isn't talking about saving the news industry but talking about re-inventing it.
  • In the tech world, there is always somebody gunning for you. He puts up images of America Online and MySpace.com as he talks about companies "that got passed by." Then he adds the logos of Microsoft and Yahoo -- and then the logo of Google with a question mark on top of it.
  • The opportunity is "to innovate through this . . . we can emerge with a much more robust version of what the news industry can be."
  • "At Google we don't have all the answers. I think it is safe to say we don't know what all the questions are at this stage." Google is looking for ways to inform more and more people. "Journalism matters to us and it matters to more and more of our users."

Talking about how Google works with the news industry

  • Krishna Bharat developed Google News. He felt the delivery of news was a "tremendously inefficient process." He wondered about finding an automated way to pull links together and have similar stories matched together. At the beginning of 2002, the first version was released.
  • Google News covers 50,000 sources, 30 languages and 60 editions in 40 countries. It now sends publishers about 1 billion clicks every single month. If you add Google's other services, that figure quadruples. "We crawl it, we group it, we rank it."
  • They look at the HTML code for instructions from publishers about what is to be crawled and what is not to be crawled. Each day that results in several hundred URLs available for organizing. They index it all, do a full-text analysis, and look for key words or metadata about the story to group it into story clusters.
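
The "crawl it, group it" steps described above can be sketched roughly as follows. This is an illustration only, not Google's method: it assumes the publisher instructions take the form of a robots meta tag in the page's HTML, and it assumes stories are grouped by simple keyword overlap (Jaccard similarity). The names `may_index`, `group_stories`, `STOPWORDS` and the 0.3 threshold are hypothetical choices made for this sketch.

```python
import re
from html.parser import HTMLParser

# Hypothetical stopword list; a real system would use far richer text analysis.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is", "at"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def may_index(html):
    """Return False when the page asks not to be indexed (assumed convention)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


def keywords(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Share of keywords two sets have in common."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_stories(articles, threshold=0.3):
    """Greedily place each article in the first cluster it overlaps enough with."""
    clusters = []
    for article in articles:
        kw = keywords(article["text"])
        for cluster in clusters:
            if jaccard(kw, cluster["keywords"]) >= threshold:
                cluster["articles"].append(article)
                cluster["keywords"] |= kw
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append({"keywords": set(kw), "articles": [article]})
    return clusters
```

In this sketch, pages would be passed through `may_index` first, and only the ones that pass would be handed to `group_stories` as dictionaries with a `"text"` field.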

"Reflect the judgement of your editors"

  • Final step is ranking. It's a two-step process. There is story or cluster ranking. You take 50 stories and rank them 1 to 50. And there's article ranking, to rank a story within a given cluster. "What we're basically trying to do is reflect the judgement of your editors ... what stories they think is important."
  • Article ranking: They look at a ton of signals. They look for originality and novelty; a rehash is treated differently from original reporting. They look for location: "If there is a local source doing original reporting on a story." They look for things about the quality of the source. They look at volume of original publication and user feedback to help make those distinctions.
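
The two-step ranking described above might look something like the sketch below. Everything specific here is an assumption for illustration: the talk did not give signal names, weights, or a formula. The sketch assumes a cluster's importance can be approximated by how many distinct sources cover it (as a stand-in for editors' judgement), and that articles within a cluster are ordered by a weighted mix of originality, local reporting and source quality. `WEIGHTS`, `article_score`, `cluster_score` and the field names are hypothetical.

```python
# Hypothetical weights; the talk names the signals but not how they are combined.
WEIGHTS = {"originality": 0.5, "local": 0.2, "source_quality": 0.3}


def article_score(article):
    """Weighted mix of the signals mentioned in the talk (assumed 0..1 scales)."""
    return (
        WEIGHTS["originality"] * article["originality"]
        + WEIGHTS["local"] * (1.0 if article["is_local"] else 0.0)
        + WEIGHTS["source_quality"] * article["source_quality"]
    )


def rank_articles(cluster):
    """Step two: order the articles inside one story cluster, best first."""
    return sorted(cluster, key=article_score, reverse=True)


def cluster_score(cluster):
    """Assumed proxy for editorial importance: number of distinct sources."""
    return len({article["source"] for article in cluster})


def rank_clusters(clusters):
    """Step one: order the clusters, then rank the articles within each."""
    ranked = sorted(clusters, key=cluster_score, reverse=True)
    return [rank_articles(cluster) for cluster in ranked]
```

With these assumed weights, a local outlet doing original reporting can outrank a higher-quality wire rewrite inside the same cluster, which matches the intent Cohen described.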