Archive for July, 2007
congrats marc..hp buys opsware
Monday, July 23rd, 2007
Congrats to Marc Andreessen and the Opsware team …
ACM Most Popular books
Friday, July 20th, 2007
[via] my ACM membership subscription
1. Area of Specialization:
Programming/Applications/Web Development
Most Popular Book:
Code Complete, Second Edition
http://pd.acm.org/book_detail.cfm?isbn=0735619670
2. Area of Specialization:
Databases
Most Popular Book:
Guru’s Guide to Transact-SQL, The
http://pd.acm.org/book_detail.cfm?isbn=0201615762
3. Area of Specialization:
Systems/Networking/Security/Web Services
Most Popular Book:
TCP/IP Illustrated, Volume 1: The Protocols
http://pd.acm.org/book_detail.cfm?isbn=0201633469
4. Area of Specialization:
Graphic Design
Most Popular Book:
The Digital SLR Guide: Beyond Point-and-Shoot Digital Photography
http://pd.acm.org/book_detail.cfm?isbn=0321492196
5. Area of Specialization:
Desktop Literacy
Most Popular Book:
Windows Forms 2.0 Programming
http://pd.acm.org/book_detail.cfm?isbn=0321267966
6. Area of Specialization:
Business Skills
Most Popular Book:
ROI at Work–Best-Practice Case Studies from the Real World
http://pd.acm.org/book_detail.cfm?isbn=1562864041
More mapreduce
Wednesday, July 18th, 2007
Another Google Tech Talk on scalability, this one from Barry Brummit. Barry works for Jeff and was short and concise. His focus was on mapreduce. Since Jeff did most of the intro, Barry was able to have some fun. Main takeaways, other than the technical ones:
- It’s all graphs, and very large ones at that
- short process cycles
- peer review of code
- style guides for code
- single source tree
- anyone can work on and check in fixes for bugs anywhere in the tree
Jeff Dean’s Scalability Talk
Wednesday, July 18th, 2007
I listened to Jeff Dean on Google Tech Talks. Here is my notes dump from it.
- Key theme: focus on price/performance. Lots of OK machines are better than a few GOOD ones.
- HW Design
- low cost pcs
- custom built ones
- need for parallelism
- ultra-reliable hw makes programmers lazy
- failure is inevitable
- build your own (this was awesome!) cork boards
- cluster side
- distributed; unique requirements of large r/w bandwidth, reliable clustering, large file sizes, and client-side code to take advantage of this
- master and chunk servers - master manages all metadata; chunks are 64 MB and distributed across at least 3 chunk servers. The master manages the chunk distribution and marks servers down if busy, etc.
- 200+ clusters
- 5000+ machines
- pools of 10k+ clients
- 5+ PB fs
- need for simple and repeated computation over stored data
- [insight] a well designed library that is simple and built for a point solution STILL gets reused and extended for functions it was not originally intended for
- key/val pair. start-map-shuffle-reduce
- libraries take care of reliability
- ability to express a variety of problems in mapreduce style
- push computation close to data
- can deal with bad records, if the failure is deterministic
- batch oriented computation model
- avg. computation time 10-12 minutes
- concept of master/workers
- reduce function is parallel
- map tasks are grouped together and called SHARD
- stragglers are still an issue that the master needs to handle by redistributing the work
- compression of intermediate data
- higher level api than raw fs
- useful for structured and semi-structured data
- need to scale to large datasets
- no viable commercial db for price/perf
- tight integration with gfs
- simple api - no joins/sql functions
- multi-level map - fault tolerant and persistent
- (row,col,tstamp) - cell
- tables are broken into tablets of 100 MB size
- a serving machine has about 100 tablets
- easy to split tablets, more planning required for merging
- master/tablet server design - lock service for master election
- unlike gfs, all metadata is in the tablet servers
- client libraries
- segregation of columns on locality groups
- about 500 cells so far. The largest tablet server is managing 3 PB
- issues across clusters
- for now everything works within a single cluster, but fails across clusters
- smart folks, small team, work on ideas that matter
- right hw/sw matters
- broad range of problems to be solved
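To make the "key/val pair, start-map-shuffle-reduce" note above a bit more concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the idea, not Google's library: the word-count task and the names map_words, shuffle, reduce_counts and run_mapreduce are my own assumptions, and everything the notes say the real libraries handle (distribution across workers, reliability, stragglers) is left out.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map step (hypothetical example): emit a (word, 1) pair per word."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: collapse all values for one key into a single result."""
    return word, sum(counts)


def run_mapreduce(documents):
    """Run map, shuffle and reduce in-process over a dict of documents."""
    # Map phase: in a real cluster each document could go to a different worker.
    emitted = []
    for doc_id, text in documents.items():
        emitted.extend(map_words(doc_id, text))

    # Shuffle phase: bring together all values that share a key.
    grouped = shuffle(emitted)

    # Reduce phase: keys are independent, so this step could run in parallel.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"a": "the cat", "b": "the dog"}
    print(run_mapreduce(docs))
```

Since each key is reduced on its own, the reduce step is the part that the notes describe as parallel; here it just runs in a loop.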
Family and family
Monday, July 16th, 2007
Stef has summarized our recent happenings in her latest post. Joe, Stef’s younger brother, came out to see us for the first time in over 6 years and we had a good time, talking about everything under the sun till the wee hours of the night.
Prior to that, Krith, my sister, whom I wrote about a while back, was visiting us as well. July has already been the month for family visits.





