The MyTrees.com Genealogy and Family History Center Explore the MyTrees.com Forum for your ancestors!

 

Genealogy HowTo
Issue: Mar 11, 2016

The "Hidden Genealogy Web" Tips and Tricks

by Cindy Carman

You are missing some exceptional resources if you have not been using the "hidden genealogy web". So what is the "hidden genealogy web", and how can you access it?

The term "hidden web" mainly refers to information that search engines cannot index, such as the contents of databases. It has been estimated that the "hidden web" as a whole holds 500 times more data than the publicly indexed web. The "hidden web" has often been pictured as a giant iceberg with 10% above the water line and 90% below. Accessing the hidden 90% can mean a big boost to you as a genealogist if some of those pages refer to your ancestors.

So the big question becomes: Why haven't Google, Bing, or Yahoo indexed this "hidden genealogy web" so that genealogists can search this data? A better question would be: How can genealogists find data from the "hidden genealogy web" without using a search engine service?

There are several reasons a genealogy database does not get indexed by one of the Internet search-bots.

  1. Access to the database is denied to the search-bot or at least limited.
  2. The database is in a format that the search-bot can't read.
  3. The search service doesn't know about the site in order to index it.
  4. The database is password protected.

Let's look at some examples of these indexing restrictions.

The data is in a database that is denied to the search-bot or at least limited.
Some sites restrict the indexing of their databases because they find that the search-bot uses too much bandwidth and slows their website down to a crawl. They may restrict the rate at which a search-bot can gather data, so that it is limited to a few accesses a minute rather than the hundreds or even thousands that would be required to fully index a site. Sometimes the website manager denies the bot altogether because it is gathering data from areas that the manager wants to restrict.
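As a rough illustration of how such limits can be expressed, here is a minimal sketch that assumes the site uses a robots.txt file, which is one common way to deny or slow down a search-bot. The article does not say which mechanism any particular site uses, and the robots.txt text, the bot names, and the records.example.org address below are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one bot is denied altogether, every other bot
# is asked to wait 20 seconds between requests and to stay out of /members/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Crawl-delay: 20
Disallow: /members/
"""


def crawl_policy(agent, url, robots_txt=ROBOTS_TXT):
    """Return (may_fetch, crawl_delay_seconds) for a bot name and a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for path in ("/search", "/members/tree"):
            url = "https://records.example.org" + path
            print(agent, path, crawl_policy(agent, url))
```

A well-behaved search-bot reads rules like these before it visits a site, which is why pages behind them may never show up in a search engine even though you can open them in your browser.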

The database is in a format that the search-bot can't read.
A good example of the first, second, and third reasons a database has not been indexed is FamilySearch.org.

FamilySearch.org contains a vast amount of genealogical data and is free to search, featuring a pretty sophisticated search engine to boot. There are thousands of volunteers working practically round the clock to extract the data from the images that The Church of Jesus Christ of Latter-day Saints has collected and placed online. Only certain parts of the FamilySearch.org website have been opened to indexing by the search-bots. For the most part, a researcher has to go to the website itself to run a search and obtain the family record data for an ancestor.

You may not know that FamilySearch.org has posted on its site images for hundreds more databases that have not yet been indexed. They can be found at the very bottom of the "search/collection/location/" pages for each country or location. Scroll to the bottom of this page and you will see an example of the hundreds of thousands of US database images just waiting to be indexed. You can freely browse these images. Some of the databases are not a high priority for indexing, because the data on the images is already in alphabetical order or because the image set includes its own index. This imaged data is part of the "hidden web".

The database is password protected.
Examples of password-protected sites would, of course, be Ancestry.com and MyTrees.com. The search-bot is denied access because the website is configured to require a password. Even though MyTrees.com provides a free search of the index to its data, special indexes have to be created for the search-bots so they can make the data available to the users of their search services. Not all of the free links at MyTrees.com have been indexed by the search-bots, because so much data is added every day and the special indexes need to be rebuilt regularly to include the new data.

Usually a password-protected genealogy site wants to attract people to its service through a popular search engine like Google or Bing. To do this, it often has to resort to creating a satellite site: a site with a different name that provides a sampling of the data from its collections. The links that you will find at a satellite site direct you back to the password-protected site. Examples would be familyhistory.com, which is part of the Ancestry.com network of websites, and geni.com, which is part of the myHeritage.com network.

So can genealogists find data in the "hidden genealogy web" without using a search engine service? The answer is YES!
There are thousands of genealogy websites that are stuffed with information that has not been indexed by any search engine. Most of these websites represent a specific locality, nationality, or genre of data.

Location, Location, Location
The most successful way to find information in the hidden web is to first identify the location where your ancestor lived. Then, using one of the methods below, you will be able to locate websites in the hidden web that have databases worth searching.

Methods for finding family history websites in the "hidden web"

  • Find the website of the main library for the locality where your ancestor lived. This might be a genealogy library or just a public library. Search for genealogy help on the site or locate their favorite links page.
  • Find and join a genealogical society for the locality where your ancestor lived. Search the databases they have available using your login, and ask for genealogical help. Use their favorite links page to find websites of interest.
  • Use a link aggregator site to find links to websites with searchable databases of genealogy data. My favorite is FamilySearch.org/learn/wiki/. When you have accessed the site, navigate to the locality of your ancestor to find links to websites of interest. (Here is an example of the Indiana Searchable List from FamilySearch.org and the Quebec Searchable List of Records.)
  • Use Google.com to find other sites with searchable databases and other link aggregator sites. Use the search terms: online searchable records. Don't put this phrase in quotes. Then add to the search field the locality of your ancestor and one of the terms birth, death, marriage, land, military, etc.
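
The Google search described in the last method can be put together by hand, but a small sketch may make the pattern clearer. The function names here are my own for illustration, and the only thing assumed about Google is its ordinary search address with a q= parameter.

```python
from urllib.parse import urlencode

# Record types suggested in the method above.
RECORD_TYPES = ["birth", "death", "marriage", "land", "military"]


def build_record_query(locality, record_type):
    """Combine the unquoted phrase with a locality and one record type."""
    return f"online searchable records {locality} {record_type}"


def google_search_url(locality, record_type):
    """Turn the query into a link that can be pasted into a browser."""
    query = build_record_query(locality, record_type)
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # One search link per record type for a sample locality.
    for record_type in RECORD_TYPES:
        print(google_search_url("Indiana", record_type))
```

Running one search per record type, rather than piling every term into a single search, keeps each results page focused on one kind of record.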

Be aware that it is a monumental task to keep all the links functioning on an aggregate site. If you find a bad link and you really want to see the resource, copy the link from the address bar of your browser and paste it into the Wayback Machine at archive.org. You can then see what the page behind the bad link used to look like. Often the resource is still there, but its link has changed. If you go to the root of the link (the site name at the start of the address), it will normally take you to the homepage of the website. Using the homepage menus, you will usually be able to find the resource you originally wanted.
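
The two steps in the paragraph above can be sketched as follows. This is only an illustration: it assumes the usual web.archive.org/web/ address form, which normally forwards to a saved copy of the page when one exists, and the dead link shown is a made-up example.

```python
from urllib.parse import urlsplit


def wayback_url(dead_link):
    """Address that asks the Wayback Machine for a saved copy of the page."""
    return "https://web.archive.org/web/" + dead_link


def site_root(dead_link):
    """Strip the link down to its root, which is normally the homepage."""
    parts = urlsplit(dead_link)
    return f"{parts.scheme}://{parts.netloc}/"


if __name__ == "__main__":
    dead = "http://records.example.org/old/marriages.html"  # hypothetical link
    print(wayback_url(dead))
    print(site_root(dead))
```

Try the archived copy first to learn what the resource was called, then look for that name in the menus on the site's current homepage.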

In the past, researchers have made attempts to create a search engine specific to genealogy that searched the "hidden web". As far as I have been able to discover, none of these attempts were successful. If you know of some other means to find databases in the "hidden web", please drop me a line with the information. I am always eager to learn new ways to find genealogy resources on the web, and I will include the instructions in an upcoming news email. Happy Ancestor Hunting!

Copyright © 2016 Cindy Carman. All rights reserved.
No printed reproduction of this article may be used without the express written permission of the author.
Links to this article are encouraged.

Copyright © 2017-2019 Fficiency Software, Inc. All rights reserved.