Archive for July, 2007

Kids Lesson plans

Sunday, July 29th, 2007

It's all Math - Pt1

Monday, July 23rd, 2007

congrats marc..hp buys opsware

Monday, July 23rd, 2007

Congrats to Marc Andreessen and the Opsware team …

Shoot for the stars…

Saturday, July 21st, 2007

Why Amazon is still a sexy place to work..reason here

Energy Conservation

Friday, July 20th, 2007

ACM Most Popular books

Friday, July 20th, 2007

[via] my ACM membership subscription
1. Area of Specialization:
Programming/Applications/Web Development
Most Popular Book:
Code Complete, Second Edition
http://pd.acm.org/book_detail.cfm?isbn=0735619670

2. Area of Specialization:
Databases
Most Popular Book:
Guru’s Guide to Transact-SQL, The
http://pd.acm.org/book_detail.cfm?isbn=0201615762

3. Area of Specialization:
Systems/Networking/Security/Web Services
Most Popular Book:
TCP/IP Illustrated, Volume 1: The Protocols
http://pd.acm.org/book_detail.cfm?isbn=0201633469

4. Area of Specialization:
Graphic Design
Most Popular Book:
The Digital SLR Guide: Beyond Point-and-Shoot Digital Photography
http://pd.acm.org/book_detail.cfm?isbn=0321492196

5. Area of Specialization:
Desktop Literacy
Most Popular Book:
Windows Forms 2.0 Programming
http://pd.acm.org/book_detail.cfm?isbn=0321267966

6. Area of Specialization:
Business Skills
Most Popular Book:
ROI at Work–Best-Practice Case Studies from the Real World
http://pd.acm.org/book_detail.cfm?isbn=1562864041

Evolution of Triclidus

Wednesday, July 18th, 2007

More mapreduce

Wednesday, July 18th, 2007

Another Google Tech Talk on scalability, this one from Barry Brummit. Barry works for Jeff, and his talk was short and concise. His focus was on mapreduce. Since Jeff did most of the intro, Barry was able to have some fun. Main takeaways, other than the technical ones:

  • It's all graphs, and very large ones at that
  • short process cycles
  • peer review of code
  • style guides for code
  • single source tree
  • anyone can work on and check in bug fixes anywhere in the tree

Jeff Dean’s Scalability Talk

Wednesday, July 18th, 2007

I listened to Jeff Dean on Google Tech Talks. Here is my notes dump from it.

  • Key theme: focus on price/performance. Lots of OK machines are better than a few GOOD ones.
  • HW Design
  • low-cost PCs
  • custom built ones
  • need for parallelism
  • ultra-reliable hw makes programmers lazy
  • failure is inevitable
  • build your own (this was awesome!) cork boards
  • cluster side
  • systems-GFS
    • distributed; unique requirements of large read/write bandwidth, reliable clustering, and large file sizes, with client-side code to take advantage of this
    • master and chunk servers - master manages all metadata; chunks are 64 MB and distributed across at least 3 chunk servers. The master manages chunk distribution and marks servers down if busy, etc.
    • 200+ clusters
    • 5000+ machines
    • pools of 10k+ clients
    • 5+ PB fs
  • System-mapreduce
    • need for simple and repeated computation over stored data
    • [insight] a well-designed library that is simple and built for a point solution STILL gets reused and extended for purposes it was not originally intended for
    • key/val pair. start-map-shuffle-reduce
    • libraries take care of reliability
    • ability to express a variety of problems in mapreduce style
    • push computation close to data
    • can deal with bad records, if the failure is deterministic
    • batch oriented computation model
    • avg. computation time 10-12 minutes
    • concept of master/workers
    • reduce function is parallel
    • map tasks are grouped together and called a SHARD
    • stragglers are still an issue that the master needs to handle by redistributing the work
    • compression of intermediate data
  • system-bigtable
    • higher level api than raw fs
    • useful for structured and semi-structured data
    • need to scale to large datasets
    • no viable commercial db for price/perf
    • tight integration with gfs
    • simple api - no joins/sql functions
    • multi-level map - fault tolerant and persistent
    • (row,col,tstamp) - cell
    • tables are broken into tablets of 100 MB each
    • a serving machine has about 100 tablets
    • easy to split tablets, more planning required for merging
    • master/tablet server design - lock service for master election
    • unlike GFS, all metadata is held in tablet servers
    • client libraries
    • segregation of columns on locality groups
    • about 500 cells so far; largest tablet server is managing 3 PB
  • still more work needs to be done
    • issues across clusters
    • right now everything works within a single cluster, but fails across clusters
  • need for a single global namespace
  • newer products will need these
  • Google won the NISTEval for language translation
  • More DATA is better
  • parting
    • smart folks, small team, work on ideas that matter
    • right hw/sw matters
    • broad range of problems to be solved
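
To make the GFS chunk notes above concrete, here is a minimal sketch of splitting a file into 64 MB chunks and giving each chunk 3 replicas. The round-robin placement and the names `chunk_count` and `place_chunks` are my own assumptions for illustration; the talk notes do not say how the master actually picks chunk servers.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, per the notes
REPLICAS = 3                   # "at least 3 chunk servers" per chunk


def chunk_count(file_size):
    """Number of chunks needed for a file of file_size bytes (integer ceiling)."""
    return -(-file_size // CHUNK_SIZE)


def place_chunks(file_size, servers, replicas=REPLICAS):
    """Assign each chunk index to `replicas` distinct servers.

    Round-robin placement is an assumption made for this sketch only.
    """
    if len(servers) < replicas:
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for index in range(chunk_count(file_size)):
        placement[index] = [
            servers[(index + offset) % len(servers)] for offset in range(replicas)
        ]
    return placement
```

A 200 MB file would need 4 chunks here, since the last partial chunk still occupies its own slot.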
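
The start-map-shuffle-reduce flow over key/val pairs from the mapreduce notes can be sketched on a single machine. This is a toy, not Google's library: the function names are hypothetical, word count is just the usual example problem, and nothing here covers the reliability, master/worker, or straggler handling that the real system provides.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every input record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle_phase(pairs):
    """Group all values by key, which is the job of the shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Reduce each key independently; that independence is what lets reduces run in parallel."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count_reduce(word, counts):
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


def run_word_count(lines):
    """Chain map, shuffle, and reduce for the word count example."""
    pairs = map_phase(lines, word_count_map)
    return reduce_phase(shuffle_phase(pairs), word_count_reduce)
```

Calling `run_word_count(["the quick fox", "The lazy dog"])` counts "the" twice and every other word once.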
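
The bigtable notes describe a multi-level map where (row, col, tstamp) identifies a cell, with a simple API and no joins. A rough in-memory sketch of that data model follows. The class name `TinyTable` and its `put`/`get` methods are made up for illustration and are not the real client library; persistence, tablets, and locality groups are left out entirely.

```python
class TinyTable:
    """Toy in-memory stand-in for a (row, column, timestamp) -> value map."""

    def __init__(self):
        # row -> column -> {timestamp: value}
        self._rows = {}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]
```

Keeping every timestamped version under the same row and column is what makes a read "as of" an earlier time a plain lookup rather than a query.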

Family and family

Monday, July 16th, 2007

Stef has summarized our recent happenings in her latest post. Joe, Stef’s younger brother, came out to see us for the first time in over 6 years, and we had a good time talking about everything under the sun till the wee hours of the night.

Prior to that, Krith, my sister, whom I wrote about a while back, was visiting us as well. July has already been the month for family visits.