This downloads the text of the article on the GRM3 gene from the mobile version of Wikipedia. Wikipedia pages are extremely complex, with many links to other pages on the site and elsewhere. The depth is set to 1 so that these links are not followed. However, because of inline and formatting instructions, the download must be restricted further. The inclusion string MAIN means that only the main text page is downloaded. The exclusion strings .php and android were used in earlier attempts to prevent extraneous links from being followed. Unfortunately, these specifications do not retrieve the images, but the text at least is obtained.
http://en.m.wikipedia.org/wiki/GRM3 GRM3 article /GRM3/S/article/index.html /GRM3/S/article/index.html/shortened34/wiki/GRM3/index.html 1 0 100 1 1 MAIN 2 .php android
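
The way the depth, inclusion and exclusion settings combine can be illustrated with a small sketch. This is not the program's actual code. The function name should_download and its parameters are hypothetical, and the rules are assumptions drawn from the description above: depth 1 means the start page only, an exclusion string rejects any URL that contains it, and MAIN is treated as a keyword that admits only the start page.

```python
def should_download(url, start_url, link_depth, max_depth, include, exclude):
    """Return True if url passes the depth, inclusion and exclusion checks.

    Hypothetical helper: the rules below are assumptions, not the tool's code.
    """
    # Assumption: the start page is at depth 1, links found on it at depth 2.
    if link_depth > max_depth:
        return False
    # Assumption: any exclusion string appearing in the URL rejects it.
    if any(s in url for s in exclude):
        return False
    # Assumption: MAIN is a keyword meaning "only the starting page".
    if "MAIN" in include:
        return url == start_url
    # Otherwise at least one inclusion string must appear (empty list allows all).
    return not include or any(s in url for s in include)


start = "http://en.m.wikipedia.org/wiki/GRM3"
# The article page itself passes all three checks.
print(should_download(start, start, 1, 1, ["MAIN"], [".php", "android"]))
# A linked page one level down is rejected by the depth limit.
print(should_download("http://en.m.wikipedia.org/wiki/Gene", start, 2, 1,
                      ["MAIN"], [".php", "android"]))
```

Under these assumptions the depth limit and MAIN are partly redundant for this definition, which is consistent with the exclusion strings being left over from earlier attempts.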