NBC News reporters bring you compelling stories from across the nation. For more US news, follow us on Twitter and Facebook.

  • April 9, 2012, 7:09 a.m. EDT

    Did US taxpayers get a good deal? Census 1940 site was built for free

    National Archives

    The website at 1940census.archives.gov is operated by a private company, for free. In exchange, it can use the free public records on its for-profit site as well. Other companies paid $200,000 for the records.

    By Bill Dedman, Investigative Reporter, NBC News

    Who says there's no free lunch?

    You may have read over the past week about the release of 1940 Census records on a new U.S. government website, a site that buckled under the huge demand from people looking up details on the lives of their friends and relatives from the Great Depression.

    You may not have realized that the site was built for U.S. taxpayers for the price of — not one dime. A company from Silicon Valley built the site, and is operating it, for free. Genealogy buffs have been using the site for a week now to check millions of records. (See our earlier story for tips on searching the 1940 Census, and examples of people who have found relatives.)

    Of course, the company, Inflection LLC of Redwood City, Calif., did get something in return for its effort: a free copy of those 3.8 million images of records from the 1940 Census. While other companies paid $200,000 for a set of the public records, Inflection can use those records in its for-profit business, a genealogy site called Archives.com.

    It's a barter system for federal records: the public gets a free official U.S. website, and the company gets free data. It's been done before, as when the U.S. Patent and Trademark Office gave data to Google, which since 2006 has hosted the site for free as Google Patents.


    Inflection also was hoping to get a boost to its reputation for building websites that could withstand a storm of traffic.

    Performance standards in the contract
    Both the company and the National Archives and Records Administration (NARA) had anticipated that the site would draw a crowd, as 72-year privacy restrictions expired and the records became available. What happened next lends credence to the boast that genealogy is the country's favorite hobby.

    The contract says, "Drawing from NARA's experience in releasing the 1930 Census, and the experience of the National Archives of the United Kingdom when they released their 1901 and 1911 Censuses, NARA anticipates immense interest in the 1940 Census and a tremendous increase in traffic to its www.archives.gov web site." (Here's the contract in a PDF file.)

    But how much of a crowd?

    Here are the performance standards in the contract:

    • "When browsing from one image to another, each image should be presented to the user in 3 seconds or less."
    • "When moving from the standard rendered image to each zoom level (e.g. zoom 1x, 2x, 3x), the reformatted image should be rendered in 2 seconds or less."
    • "Support up to 10 million hits per day while providing response times of less than three seconds for keyword searches of the descriptive metadata."
    • "Support up to 25,000 concurrent users."

    There was one more element in the contract, a somewhat vague requirement that Inflection increase service if demand was greater than anticipated.

    • "Scale on demand in the event that 10 million hits and/or 25,000 concurrent users are exceeded to ensure that the performance requirements ... are still achieved."

    The crowd certainly exceeded those levels, as the most old-fashioned sounding search term possible, "1940 Census," became a top "trending topic" on Google and Twitter.

    Most people seemed to get little or nothing from the site on the first day, including Census leaders, who were prepared to show off how easy it was to look up their grandparents. When the site stuck on "loading image," as it did for many other users, the officials resorted to showing a PowerPoint presentation with the results from an earlier search.

    A 'tsunami'
    As Inflection's general manager, Joe Godfrey, told us last week, "We were expecting a flood, but we got a tsunami."

    • On Day One, Monday, an estimated 100 million hits, or requests, with 22.5 million hits in just the first three hours. Though Inflection scrambled to improve service, the site was unusable for many users on the first day. The company added more servers through Amazon Simple Storage Service, its cloud data service provider, and also restricted some features on the site (such as zooming of images), until finally it was able to get on top of the traffic.
    • On Day Two, Tuesday, the numbers haven't been totaled, but traffic is believed to have been higher than on Day One, with an estimated 40.1 million hits in the three-hour peak.
    • By Friday, the site was stable with about 60 million hits per day, and had served up more than 80 million images, or about 61 terabytes of data, the National Archives said. (That's more than the data contained in the first 20 years of astronomical observations by the Hubble Space Telescope.) The service quality was better than called for in the contract, with a load time of about 1.8 seconds per page, according to the Archives.

    In other words, this might have been a good project for a "soft launch."

    The contract called for extensive load testing before the release. We asked the National Archives for copies of those test results, but its spokeswoman said it wouldn't be able to provide them. But it said the site was tested to handle more than 70,000 simultaneous users — more than the contract called for, and fewer than the level that resulted.
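To put the traffic figures above side by side with the contract ceilings, here is a minimal sketch in Python. It uses only numbers quoted in this article. The function and key names are illustrative, and it assumes decimal terabytes (1 TB = 10^12 bytes) and treats "more than 80 million images" as exactly 80 million, so the per-image size is an approximation.

```python
# Back-of-the-envelope comparison of reported traffic with the contract ceilings.
# All figures come from the article; names are illustrative, not from the contract.

CONTRACT_HITS_PER_DAY = 10_000_000      # "up to 10 million hits per day"
CONTRACT_CONCURRENT_USERS = 25_000      # "up to 25,000 concurrent users"
SECONDS_PER_DAY = 24 * 60 * 60
THREE_HOURS = 3 * 60 * 60


def hits_per_second(hits, seconds):
    """Average request rate over a window of the given length."""
    return hits / seconds


def summarize():
    """Return the derived ratios and rates as a dictionary."""
    return {
        # The contract ceiling expressed as an average rate.
        "contract_avg_hits_per_sec": hits_per_second(
            CONTRACT_HITS_PER_DAY, SECONDS_PER_DAY
        ),
        # Day One: an estimated 100 million hits.
        "day_one_multiple_of_contract": 100_000_000 / CONTRACT_HITS_PER_DAY,
        # Day One: 22.5 million hits in the first three hours.
        "day_one_peak_hits_per_sec": hits_per_second(22_500_000, THREE_HOURS),
        # Day Two: 40.1 million hits in the three-hour peak.
        "day_two_peak_hits_per_sec": hits_per_second(40_100_000, THREE_HOURS),
        # Load testing: 70,000 simultaneous users versus 25,000 in the contract.
        "tested_multiple_of_contract_users": 70_000 / CONTRACT_CONCURRENT_USERS,
        # About 61 TB served across roughly 80 million images (assumes 1 TB = 1e12 bytes).
        "avg_mb_per_image_served": 61e12 / 80_000_000 / 1e6,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

On these assumptions, Day One ran at about 10 times the contract's daily ceiling, and the three-hour peaks averaged roughly 2,100 and 3,700 hits per second, against about 116 per second if 10 million hits were spread evenly over a day.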

    A 'no-cost contract'
    No-cost contracts are allowed under Federal Acquisition Regulation competitive procedures. This contract has a one-year base period and options to extend for four more one-year periods.

    "NARA provided a copy of the data to Inflection at no cost, copies that were sold to others for $200K," said spokeswoman Laura Diachenko of the National Archives. "Why Inflection agreed to this is a better question for them, but we are very happy to have them as a partner. They have experience with Census data, and managing access to large data sets, the capabilities we were seeking for this project."

    She added, "Even though this is called a no-cost contract, the Government did incur costs — in this case, aside from our resources, we also provided a copy of the 1940 Census to Inflection, at no cost.  In this particular case, we provided them data that they wanted in exchange for hosting access to this data.  Their interest was in getting the data (for their archives.com business), and for business development (attracting users to their site and eventually converting them to a subscriber."

    Inflection's Godfrey said, "The primary value for us was in building our brand/notoriety, leveraging and expanding our technical expertise/infrastructure and helping to getting this extremely valuable record collection into the hands of as many people as possible.  Also, our engineering team (like all great engineers) are motivated by tackling challenging technical problems, and so the team was very excited to work on this."

    Competition
    All or most of the 1940 Census is now available free from several other companies, which had to pay for the public records. As a sort of loss leader, other genealogy sites, even the commercial ones, are making the 1940 Census records available for free, to subscribers and non-subscribers alike.

    Here's how the race worked: All the commercial sites that chose to buy the data for $200,000 were handed a rack of hard drives full of 20 terabytes of images, taken from 4,745 rolls of microfilm, at 12:01 a.m. on April 2, or 72 years and a day after the Census Day in 1940.

    By Thursday, a relatively new genealogy site called MyHeritage was the first to have all the images online. Also making images available for free are Ancestry.com, a commercial site, and FamilySearch.org, owned by the Church of Jesus Christ of Latter-day Saints.

    Thousands of volunteers are working on the next step: indexing the records by name, just as previous Census releases have been indexed by volunteers. Until those indexes are finished, searching is done only by address or neighborhood.

    Your view
    Do you approve of the approach that the National Archives took, giving the data away in exchange for the free website? And what stories have you found in the 1940 Census? Add your story in the comments below or on our Open Channel page on Facebook. See our earlier story for tips on searching the 1940 Census.

  • April 2, 2012, 1:07 a.m. EDT

    A 'tsunami' swamps Archives and Silicon Valley firm serving up 1940 census

    By Bill Dedman, Investigative Reporter, NBC News

    Update, 5:40 p.m. ET: The firm at the center of today's census records meltdown says, "We were expecting a flood, but we got a tsunami."

    "We had estimates of how much traffic was going to hit the site, and we did performance testing at several levels above that, but we were surprised by the traffic," said Joe Godfrey, senior director of product and general manager for Inflection, a Silicon Valley database company.

    Inflection was hired by the National Archives and Records Administration, which provided the 1940 census records. Inflection built the search engine to serve up the records, and relied on Amazon Simple Storage Service (Amazon S3) as the cloud service provider. Inflection has been adding more of a pipeline to Amazon all day, adding the ability for more simultaneous connections, but so far searches for census records are running slowly or not running at all for many users.


    The company is trying to serve up 3.8 million images of census documents, each with multiple views at different zoom levels, with each file being 10 megabytes or larger.

    Godfrey said the situation has improved, and engineers are hoping by the end of today to have the situation squared away.

    Earlier:

    Embarrassed by a computer system that crumbled under public demand, the National Archives and Records Administration said Monday that it's working to add more servers for the release of 1940 Census records. For many users, the wait to see records on family members from the Great Depression era will go on for a while longer.

    The Archives had hired Inflection, a Silicon Valley database company, to run the computers, but frustrated users lit up Facebook and Twitter with complaints about images that were said to be "loading" but never arrived.

    "Our testing indicated NARA and Inflection could handle the load, but 1.9 mil visitors caused issues we're working to resolve," the Archives said via Twitter. Later it added, "We'll let you know as soon as we have another update - thank you for your patience, we know it's incredibly frustrating."

    Even agency officials, during the webcast to kick off the day, couldn't get images to load when they tried to look up their own relatives.

    In Springfield, Ohio, Facebook user Val Lough commented on our page: "It's very sweet of them to put all of these records on line. It would be even nicer of them to make the records VISIBLE. None of them will download, I have a browser window opening that's 'loading' the documents and has been for about 20 minutes. You might want to find out what their issues are. It would be faster to mail a public records request to the National Archives." Many others are tweeting about delays.

    The National Archives says it is putting more servers online to handle the crush.  At one point, the Archives said, its computers were receiving 100,000 hits per second.

    Hey, you've waited 72 years to see these records, so what's another day or two?

    Earlier:

    A time capsule from 1940 was opened on Monday at 9 a.m. ET, and we invite readers to share what they find. If you use the new records to find information about the loved or lost in your family, please post a note in the comments below or on our Open Channel page on Facebook.

    U.S. Census records for individuals from April 1, 1940, protected until now by a 72-year privacy law, are now public for the first time, revealing details about millions of Americans from that day, as the country lingered in a Great Depression, still a year away from entry into war in Europe and the Pacific.

    "I'm so excited!" Gary Robert Del Carlo of Martinsburg, W.Va., posted on Facebook. "Maybe for the first time ever, I'll be able to find out something about my father. All I have is my birth certificate with his name, date of birth, state born in, and that he was in the Army stationed in Washington State. His military records burned up in St. Louis in a fire in 1973. They would have told me a lot. Wrote for his birth certificate, and there was no records of his birth. I have done nothing but hit brick walls every which way I turn. I'm praying I find something useful tomorrow, anything."

    NPR describes the release as the "Super Bowl for Genealogists." Librarians around the country are ready to provide assistance. At the Family History Library in Salt Lake City, the staff will be serving cake and providing help.

     

    When the 120,000 census takers counted 132,164,569 people living in the country on that day, the information collected included the address, whether the house was owned or rented, the value of the home or the monthly rent, whether it was considered a farm, names of adults and children, family relationships, sex, race, age, place of birth, citizenship, residence five years earlier, and education. A small subset of people, about 5 percent, were also asked about place of birth of mother and father, language spoken in the home as a child, veteran status, wars served in, Social Security status, occupation, employment status, number of weeks worked in 1939, income and, for women, whether they had been married more than once, age at first marriage, and number of children ever born.

    There is a catch. As the records go online, they can't be searched by name. For a city it's helpful to know an exact address, but often you can work with a neighborhood (for example, near the corner of Canal and Varick streets in New York City). Your public library may have old city directories or telephone directories from that period, allowing you to look up people by name to find an address. For a rural area, you need to know at least the county and the name of the town or township.

    Genealogists, librarians and volunteers will begin the work of indexing the records, which eventually will allow searches by name. Two sites, the commercial Ancestry.com and the Mormon Church's FamilySearch.org, have announced plans to provide indexes to their customers as quickly as possible, with some images going online on Monday. FamilySearch and Ancestry.com started putting images from the Census files online early on Monday, but for now without a name index. 

    For now, you must know at least an approximate address to get started. You use that address to find an "enumeration district," which in a big city might be only a few blocks, and would be a larger area in a small town.

    Another approach, for those interested in a specific place, is to look at all the records for your block or street. If your area was settled in 1940, who lived there then, and what were their lives like?

    Your goal: With that district number, you can look on the Census website at the online copy of the form filled out by the census taker in 1940. In 70 years, it has gone from paper to microfilm to computer.

    Here are resources to help you with the search (links open in a new window), though as with most things in life, the key is: Ask a librarian.

    • Most important page No. 1: Step-by-step help from private researchers with free aids to help you find the enumeration district map for a particular address
    • Most important page No. 2: A Census explainer on starting your search.
    • The home of the 1940 Census
    • A Census page with general information on the 1940 release
    • A copy of the 1940 Census form (PDF file) that you can fill in when you find information
    • Census aids to finding information
    • Ancestry.com, a commercial service for genealogists
    • FamilySearch.org from the Church of Jesus Christ of Latter-day Saints
    • Tell us what you find: Post your story on Open Channel's Facebook page



Bill Dedman

Investigative reporter Bill Dedman of NBC News is always looking for good investigative story ideas and documents. Bill received the 1989 Pulitzer Prize for investigative reporting, and has written full time for NBCNews.com since 2006.

© 2013 NBCNews.com