
500+ Terabytes of New Data Every Day on Facebook

by anonymous



Facebook takes in more than 500 terabytes of new data every day, including 2.7 billion Like actions and 300 million photos, and it scans approximately 105 terabytes of data each half hour. The company also gave the first details on its new “Project Prism”.
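For a sense of scale, the stated scan rate of 105 terabytes per half hour can be extrapolated to a daily figure with simple arithmetic (this projection is ours, not a number Facebook gave):

```python
# Back-of-the-envelope extrapolation of the stated scan rate.
tb_per_half_hour = 105          # figure stated above
half_hours_per_day = 48         # 24 hours * 2

daily_scan_tb = tb_per_half_hour * half_hours_per_day
print(daily_scan_tb)            # 5040 TB scanned per day
print(daily_scan_tb / 1000)    # roughly 5 petabytes per day
```

So at the stated rate, Facebook would scan on the order of 5 petabytes per day.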



VP of Engineering Jay Parikh explained why all this data matters to Facebook:

 “Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.”

One more stat Facebook opened up: over 100 petabytes of data are stored in a single Hadoop cluster, and Parikh said:

“We think we operate the single largest Hadoop system in the world.” In a hilarious moment, when asked “Is your Hadoop cluster bigger than Yahoo’s?”, Parikh proudly nodded yes.

He added:

“No one will care you have 100 petabytes of data in your warehouse”.

The pace of intake keeps climbing, and

“the world is getting hungrier and hungrier for data.”



The data isn’t just for internal research; it also gives Facebook an advantage with advertisers. Parikh explained:



 “We’re tracking how ads are doing across different dimensions of users across our site, based on gender, age, interests [so we can say] ‘actually this ad is doing better in California so we should show more of this ad in California to make it more successful.’”
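The per-dimension breakdown Parikh describes can be pictured as a simple aggregation over impression records. The sketch below is illustrative only; the field names and data are hypothetical, not Facebook’s schema:

```python
from collections import defaultdict

# Hypothetical ad impression records; fields are illustrative only.
impressions = [
    {"ad_id": "ad1", "state": "CA", "clicked": True},
    {"ad_id": "ad1", "state": "CA", "clicked": True},
    {"ad_id": "ad1", "state": "CA", "clicked": False},
    {"ad_id": "ad1", "state": "NY", "clicked": True},
    {"ad_id": "ad1", "state": "NY", "clicked": False},
    {"ad_id": "ad1", "state": "NY", "clicked": False},
]

def ctr_by_dimension(records, dimension):
    """Click-through rate for each value of one targeting dimension."""
    clicks, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[dimension]] += 1
        clicks[r[dimension]] += r["clicked"]  # True counts as 1
    return {value: clicks[value] / totals[value] for value in totals}

rates = ctr_by_dimension(impressions, "state")
print(rates)  # CA's rate is higher, so show more of this ad in California
```

The same function works for any dimension in the record (gender, age band, interest), which is the kind of slicing the quote describes.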


Then there’s “Project Prism”. Right now, Facebook has to keep its entire live user dataset in a single data center, with the other centers used for redundancy and other data. When the main cluster grows too big for one data center, Facebook has to move the whole thing to another that has been expanded to fit it. This shuffling around is a waste of resources.

Parikh says:



“Project Prism lets us take this monolithic warehouse…and physically separate but maintain one view of the data.”



This means the live dataset can be split up and hosted across Facebook’s data centers in California, Virginia, Oregon, North Carolina, and Sweden.



Internally, Facebook has chosen not to partition data or erect barriers between different business units like ads and customer support. Product developers can look at data across departments to assess whether their latest tweak increased time-on-site, triggered complaints, or produced ad clicks.



Users might be a little uneasy at the thought of Facebook employees looking so deeply into their activity, but Facebook says there are multiple protections against abuse. All data access is logged so Facebook can track which employees are looking at what.
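Logging every access so it can be audited later is a standard pattern; a minimal sketch of the idea follows (the names and in-memory log are hypothetical, not Facebook’s system):

```python
import functools

audit_log = []  # in practice this would be durable, append-only storage

def logged_access(func):
    """Record who accessed which record before running the query."""
    @functools.wraps(func)
    def wrapper(employee, record_id, *args, **kwargs):
        audit_log.append({
            "employee": employee,
            "record": record_id,
            "action": func.__name__,
        })
        return func(employee, record_id, *args, **kwargs)
    return wrapper

@logged_access
def read_profile(employee, record_id):
    return {"id": record_id}  # placeholder lookup, not a real query

read_profile("eng_alice", "user:42")
print(audit_log)  # who looked at what, available for later review
```

Because every read goes through the wrapper, reviewers can later answer exactly the question the article raises: which employees looked at which records.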


Only those working on building products that require data access get it, and there’s a rigorous training course around acceptable use. If an employee pries where they’re not supposed to, they’re fired. Parikh stated firmly:



“We have a zero-tolerance policy.”
