on Google Blogsearch
Thursday, September 15th, 2005 06:32 pm
All of this compiled from various panics, Q&A and tech talk elsewhere. I don't like the idea of my content being searchable without my specific say-so, either.
0. Blogsearch is still in beta, does not work as the final version will, and only contains data from May 2005 onwards.
1. Via a friend with good contacts. "Google knows about the problem with indexing blogs that are marked as "noindex" and they're really sorry. They were leftover from the original test phase and not removed when it went into beta."
2. It's unclear (to me, anyway) whether the blogsearch thing is indexing the actual journal or the 'hypothetical' RSS feed that exists, by default, for every LiveJournal journal or community. (EDIT: Google implies the latter: "The goal of Blog Search is to include every blog that publishes a site feed (either RSS or Atom)." Please note that this does not mean they are lying when they say they don't index LJs with the 'noindex' option ticked. Different source, different destination.)
2a. If the former, they are ignoring the "Block Robots/Spiders from indexing your journal" option on the user info page. (This is what stops your LJ from appearing in standard Google search results, unless of course you don't have it ticked.)
2b. If the latter, you can change your syndication options via the console.
- To set it for your own journal: set synlevel level, where level is one of title | summary | full.
- To set it for a community: set for communityName synlevel level, where communityName is the name of a community for which you are a maintainer, and level is as above.
gacked from here, where there is a lot of useful info.
3. People are now talking about locked posts being indexed. Haven't seen this and can't replicate. Any examples? Are these posts that have always been locked, or might they have been scraped while unlocked?
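For anyone who wants the console commands from 2b spelled out, here is a minimal Python sketch that just builds the command text. The helper name synlevel_command and the community name examplecomm are made up for illustration; the only things taken from above are the two command forms and the three levels. It does not talk to LiveJournal at all, you would still paste the result into the console yourself.

```python
# Hypothetical helper: builds the console command strings described in 2b.
# It only formats text; nothing here contacts LiveJournal.

VALID_LEVELS = ("title", "summary", "full")


def synlevel_command(level, community=None):
    """Return the console command for the given syndication level.

    level      -- one of "title", "summary", "full"
    community  -- optional community name (you must be a maintainer)
    """
    if level not in VALID_LEVELS:
        raise ValueError(
            "level must be one of: " + ", ".join(VALID_LEVELS)
        )
    if community is None:
        # Form for your own journal
        return f"set synlevel {level}"
    # Form for a community you maintain
    return f"set for {community} synlevel {level}"


if __name__ == "__main__":
    print(synlevel_command("title"))
    print(synlevel_command("summary", community="examplecomm"))
```

Presumably title exposes the least of each entry in the feed and full the most, so title would be the cautious choice if the feed is what is being indexed.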