Elastic Search - Distributed and Highly Available Search Engine

Here are the requirements:

"So, we build a web site or an application and want to add search to it, and then it hits us: getting search working is hard. We want our search solution to be fast, we want a painless setup and a completely free search schema, we want to be able to index data simply using JSON over HTTP, we want our search server to be always available, we want to be able to start with one machine and scale to hundreds, we want real-time search, we want simple multi-tenancy, and we want a solution that is built for the cloud."

 

Check it out: Elastic Search
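
To make "index data simply using JSON over HTTP" concrete, here is a minimal sketch in Python. It only builds the HTTP request and does not send it, so it runs without a server. The host and port (localhost:9200), the index name "reviews", the sample document, and the helper name build_index_request are all assumptions made for illustration, and the /<index>/_doc/<id> path follows the document API of current Elasticsearch releases, which may differ from the version this post was written about.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) an HTTP PUT request that indexes one JSON document.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    # Assumed URL layout: <base>/<index>/_doc/<id>
    url = f"{base_url}/{index}/_doc/{doc_id}"
    # The document travels as a plain JSON body.
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed values: a local node and a made-up document.
req = build_index_request(
    "http://localhost:9200",
    "reviews",
    "1",
    {"business": "Example Cafe", "rating": 4},
)

# With a node actually running at that address, the request could be sent with:
#     urllib.request.urlopen(req)
```

The point of the sketch is that no client library or predefined schema is involved: the document is an ordinary JSON object and the index operation is a single HTTP call.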

Competitive Intelligence - Yelp

29 million unique visitors a month

Almost half of Yelp’s users (46 percent) are between 18 and 34 years old, while 36 percent are between 35 and 49.

Reviews

  • Restaurants: 29 percent of reviews
  • Shopping: 23 percent
  • Beauty and fitness: 9 percent
  • Arts & entertainment: 8 percent
  • Home and local services: 7 percent
  • Entertainment: 5 percent
  • Nightlife: 4 percent

Source: KelseyGroup

UIMA and GATE Applications over Hadoop

There is a new project on Google Code called Behemoth that uses Hadoop to scale GATE and UIMA applications. Here is a more detailed post from Julien:

"Behemoth allows to deploy GATE or UIMA applications over a Hadoop cluster in
order to do very large scale document analysis. It uses a very simple
representation format which can be used as a common ground between UIMA and
GATE-generated annotations, hence achieving compatibility between both
systems. Since it is Hadoop-based it benefits from all its features
(scalability, fault-tolerance, etc...) and most notably the back up of a
thriving open source community. Quite a few Apache resources already do or
will fit into it: Nutch, Tika, Mahout, Hbase etc..."