elias diab

all watched over by machines of loving grace

A Briefing on Big Data This Week (+a Video)

The 17th ACM KDD (Knowledge Discovery and Data Mining) conference took place in San Diego earlier this week. A few decades ago data mining would have seemed like something only scientists should care about, but today it plays a crucial role in practically everything, from computing to biology and the natural sciences to sales.

“Businesses and industry are increasingly interested in leveraging the data they capture through business processes,” says Chid Apte, director of analytics research at IBM and chair of the conference. In particular, he points to health care, social media, and anything that takes place on the Web.

Wherever data can be found in large amounts, data mining is essential. But data isn’t always in a nice, organised form. The web, for example, isn’t (will it ever be?), which makes things more complicated.

Today’s data, however, doesn’t take the familiar form of the database. “The information’s not coming at you in a clean tabular form,” Apte says. “It’s coming at you in a network form.” Often it arrives in a graph, he explains—such as those used by social media. These graphs often record not only the complex connections between nodes but also other types of information in a diversity of formats, such as the videos, images, and comments that people post on social networks.

[Technology Review]
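To make the "network form" in the quote above a bit more concrete, here is a minimal Python sketch of a graph whose nodes carry mixed content. The `SocialGraph` class, the user names and the content kinds are all made up for illustration; this is not how any real social network stores its data.

```python
from collections import defaultdict


class SocialGraph:
    """Toy, hypothetical social graph: connections plus mixed-format posts."""

    def __init__(self):
        self.edges = defaultdict(set)   # user -> set of connected users
        self.posts = defaultdict(list)  # user -> list of (kind, payload)

    def connect(self, a, b):
        # Connections are undirected, so record both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def post(self, user, kind, payload):
        # kind is a free-form label such as "video", "image" or "comment".
        self.posts[user].append((kind, payload))

    def feed_kinds(self, user):
        """Count the content types posted by a user's direct connections."""
        counts = defaultdict(int)
        for friend in self.edges[user]:
            for kind, _payload in self.posts[friend]:
                counts[kind] += 1
        return dict(counts)


g = SocialGraph()
g.connect("alice", "bob")
g.connect("alice", "carol")
g.post("bob", "video", "clip.mp4")
g.post("bob", "comment", "nice!")
g.post("carol", "image", "cat.jpg")
g.post("carol", "comment", "hello")

print(g.feed_kinds("alice"))
```

Even in this tiny example a simple question ("what kinds of things do my connections post?") means walking the connections first and the content second, which is exactly what a flat table doesn't give you directly.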

At the same time, IBM is building the biggest data drive ever: 120 petabytes. Can you imagine how big that is? And think about it: in five years’ time it won’t look that big at all.

120 petabytes of storage is an insane amount, eight times larger than the 15 PB arrays already out there, and they already had to deal with address space issues. In IBM’s huge array, tracking the location and calling data for its files takes up fully 2 PB of its own space. You’d need a next-generation file index just to index the index!

[TechCrunch]
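The numbers in that quote are easy to sanity-check. A quick sketch, assuming decimal petabytes (10^15 bytes); the variable names are mine, and the ratios come out the same whichever definition of a petabyte you pick:

```python
PB = 10**15  # assumption: decimal petabyte, in bytes

total = 120 * PB      # IBM's array, per the quote
previous = 15 * PB    # the existing arrays it is compared to
index = 2 * PB        # space used to track file locations and metadata

ratio = total / previous      # how many times larger the new array is
index_share = index / total   # fraction of the array spent on the index

print(f"{ratio:.0f}x larger, index takes {index_share:.1%} of the space")
```

So the "eight times larger" figure checks out, and the index alone eats a bit under 2% of the whole array.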

The growth of the internet has made exchanging information easy, fast and universal. In the days to come, data mining will be the central point (and, I hope, not a bottleneck) in any aspect of human progress where information sits on top of knowledge.
