data lake

A lake full of data that requires a tree full of money to build.

A "data lake" is a giant centralized place to hold data. The data in a data lake is generally a copy; the canonical version is stored somewhere else.

For example, a media company may create a data lake to store copies of its event tracking data, subscription data and newsletter sign-up data if they are not already in the same database. The benefit of having all the data in the same place is that you can ask questions that cross datasets. For example, "How many of our newsletter subscribers spend over 1 hour on our site each day?"
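To make the newsletter question concrete, here is a minimal sketch of what a cross-dataset query looks like once both copies sit in one place. Everything in it is hypothetical: the user IDs, the record layout, the function name and the dates are invented for illustration, and a real data lake would run this through a query engine rather than Python lists. It also checks a single day rather than every day, to keep the example short.

```python
from collections import defaultdict

# Hypothetical copy of the newsletter sign-up data (user IDs only).
newsletter_subscribers = {"ana", "ben", "cho"}

# Hypothetical copy of the event tracking data: (user_id, date, seconds_on_site).
events = [
    ("ana", "2024-01-01", 2400),
    ("ana", "2024-01-01", 1500),
    ("ben", "2024-01-01", 900),
    ("dee", "2024-01-01", 5000),  # spends a lot of time on site, but is not a subscriber
]


def subscribers_over(events, subscribers, day, min_seconds=3600):
    """Return the subscribers whose total time on site on `day` exceeds min_seconds."""
    totals = defaultdict(int)
    for user_id, date, seconds in events:
        if date == day:
            totals[user_id] += seconds
    # The "cross-dataset" step: keep only users that appear in both datasets.
    return {
        user_id
        for user_id, total in totals.items()
        if user_id in subscribers and total > min_seconds
    }


engaged = subscribers_over(events, newsletter_subscribers, "2024-01-01")
print(len(engaged))
```

The query itself is a few lines. The expensive part is getting both datasets into the same place, with user IDs that actually match, and keeping them there.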

Consultants love to suggest that companies build a data lake to solve, well, basically anything. And while data lakes are useful if implemented well, they are not a panacea.

Depending on the quantity of data, the quality of the data and the number of data sources, building a data lake can range from a fairly-technical-and-time-intensive task to an incredibly-technical-and-time-intensive task. Data lake projects tend to run over time and budget, if they are completed at all. There is no out-of-the-box solution because every company has different data sources and different requirements. It just turns out that continuously replicating large amounts of data from different sources is a hard problem that may or may not be worth the "business insights" you can glean when you're done.

