[personal profile] tamaranth
All of this compiled from various panics, Q&A and tech talk elsewhere. I don't like the idea of my content being searchable without my specific say-so, either.

0. Blogsearch is still in beta, does not work as the final version will work, and only contains data from May 2005 onwards.

1. Via a friend with good contacts: "Google knows about the problem with indexing blogs that are marked as 'noindex' and they're really sorry. They were leftover from the original test phase and not removed when it went into beta."

2. It's unclear (to me, anyway) whether the blogsearch thing is indexing the actual journal or the 'hypothetical' RSS feed that exists, by default, for every LiveJournal journal or community. (EDIT: Google implies the latter: "The goal of Blog Search is to include every blog that publishes a site feed (either RSS or Atom)." Please note that this does not mean they are lying when they say they don't index LJs with the 'noindex' option ticked. Different source, different destination.)

2a. If the former, they are ignoring the "Block Robots/Spiders from indexing your journal" option on the user info page. (This is what stops your LJ from appearing in standard Google search results, unless of course you don't have it ticked.)

2b. If the latter, you can change your syndication options via the console.
- To set for your own journal, set synlevel level, where level is title | summary | full
- To set for a community, set for communityName synlevel level, where communityName is the name of a community for which you are a maintainer, and level is as above.
gacked from here, where there is a lot of useful info.

3. People are now talking about locked posts being indexed. Haven't seen this and can't replicate. Any examples? Are these posts that have always been locked, or might they have been scraped while unlocked?
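
For anyone unsure what to type for 2b, here is a minimal Python sketch that only builds the console text described there. It does not talk to LiveJournal at all; the function name `synlevel_command` is made up for illustration, and the three levels are the ones listed above.

```python
# Levels listed in 2b: title | summary | full
VALID_LEVELS = ("title", "summary", "full")


def synlevel_command(level, community=None):
    """Build the console command text from 2b (hypothetical helper).

    With no community, the command applies to the journal you are logged in as.
    With a community name, it uses the "set for ..." form for maintainers.
    """
    level = level.lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    if community is None:
        return f"set synlevel {level}"
    return f"set for {community} synlevel {level}"


if __name__ == "__main__":
    print(synlevel_command("title"))
    print(synlevel_command("summary", community="communityName"))
```

Note that the journal name never appears in the first form: the console applies it to whichever journal you are logged in as.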

Date: Thursday, September 15th, 2005 05:56 pm (UTC)
From: [identity profile] dougs.livejournal.com
From some minimal research:

0. Blogsearch [currently] includes data from 39 minutes ago, but not data from 12 minutes ago.

Date: Thursday, September 15th, 2005 06:19 pm (UTC)
From: [identity profile] marypcb.livejournal.com
A five minute index interval is what Technorati et al aim for...

Date: Thursday, September 15th, 2005 06:19 pm (UTC)
From: [identity profile] purpletigron.livejournal.com
OK, I'm not quite keeping with the programme here...

the 'hypothetical' RSS feed that exists, by default, for every LiveJournal journal or community.

Huh?

the latter: "The goal of Blog Search is to include every blog that publishes a site feed (either RSS or Atom)."
...
If the latter, you can change your syndication options via the console.
- To set for your own journal, set synlevel level, where level is title | summary | full


So would I be advised to 'set my syndication options' even though I don't (want to) syndicate my journal?

Date: Thursday, September 15th, 2005 06:34 pm (UTC)
From: [identity profile] tamaranth.livejournal.com
Your LJ has an RSS feed by default. (http://livejournal.com/users/purpletigron/data/rss). If you want to control the amount of information that is shown on that feed (which should only ever contain public posts) -- then yes, you should set your syndication options. I'd suggest Title only.

And now I see Avi's said just the same below ... :)
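
To see what a feed actually exposes, something like the following Python sketch may help. It assumes the feed is ordinary RSS 2.0 with `item`, `title` and `description` elements, and it parses a made-up sample document rather than fetching anything; `feed_url` and `exposed_fields` are hypothetical helper names, and only the URL pattern comes from the link above.

```python
import xml.etree.ElementTree as ET


def feed_url(username):
    """URL pattern as quoted in this thread (not checked against LJ docs)."""
    return f"http://livejournal.com/users/{username}/data/rss"


# Made-up sample feed: one item with body text, one with a title only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>example journal</title>
    <item><title>First post</title><description>Full text here</description></item>
    <item><title>Second post</title></item>
  </channel>
</rss>"""


def exposed_fields(rss_text):
    """Return a (title, has_body) pair for each item in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        results.append((title, bool(body.strip())))
    return results


if __name__ == "__main__":
    print(feed_url("purpletigron"))
    for title, has_body in exposed_fields(SAMPLE_FEED):
        print(title, "-> body exposed" if has_body else "-> title only")
```

If every item came back as "title only" after setting the syndication level to Title, that would suggest the setting had taken effect, though that is an inference and not something the console confirms.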

Date: Thursday, September 15th, 2005 07:02 pm (UTC)
From: [identity profile] purpletigron.livejournal.com
Why have I only just found out about this? I thought I read through the small print!! Would there be any way of finding out if the syndication feed is being used (I guess not)?

I've gone for Title.

Date: Thursday, September 15th, 2005 07:46 pm (UTC)
From: [identity profile] del-c.livejournal.com
It seems to me to be Livejournal's goof rather than Google's, and Livejournal could make amends by allowing a fourth synlevel: none.

Date: Thursday, September 15th, 2005 07:51 pm (UTC)
From: [identity profile] tamaranth.livejournal.com
Why have I only just found out about this?
At a guess, the same reason as I only found out about it recently :) It's something that's been introduced since you joined -- i.e. it wasn't in the small print then -- and you missed the 'notification' (probably in lj_news or somewhere), or didn't realise its ramifications.

Date: Thursday, September 15th, 2005 07:52 pm (UTC)
From: [identity profile] tamaranth.livejournal.com
I don't know of any way of finding whether it's being used. (I wonder if there's an RSS search engine that isn't blogsearch?)

Date: Thursday, September 15th, 2005 09:37 pm (UTC)
From: [identity profile] marypcb.livejournal.com
Nearly a dozen. Search for my name and "blog search" on the Guardian and you'll get links and explanations of what they do. I wanted to cover Google but they didn't want to talk.

Date: Thursday, September 15th, 2005 10:30 pm (UTC)
From: [identity profile] tamaranth.livejournal.com
I may have missed something -- it's late! -- but do any of them enable you to find out if Real People are looking at an RSS feed, as opposed to it simply existing and being indexed? I don't mean anything as precise as a hit-count, but something that would indicate whether someone has my LJ RSS on some sort of aggregator.
[Assume I am completely ignorant of RSS stuff, and you won't go far wrong. Splendid in principle, time-consuming and haven't-got-round-to-it in practice!]

Date: Thursday, September 15th, 2005 11:21 pm (UTC)
From: [identity profile] marypcb.livejournal.com
This is how you tell, if you have access to the server that hosts the RSS feed:
http://ask-leo.com/is_there_a_way_to_track_unique_subscribers_to_my_rss_feed.html

Something like http://www.feedburner.com gives you more info. But as LJ doesn't seem to expose the logs, we couldn't tell from that. I'm going to ask the blogsearch folks I know if they get those details for LJ the way they do for individual blog sites. How many people read a feed is one of the values they use to decide whether a blog goes higher in a list of search results.
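
For the case where you do have the server logs (which, as noted, isn't the case for LJ), a rough Python sketch of the idea is below. It assumes the logs are in the common "combined" access log format and simply counts feed requests per IP address and user agent; the log lines, the `/data/rss` path filter and the function name `feed_readers` are all illustrative assumptions, not anything LJ or the linked article provides.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip ident user [time] "METHOD path protocol" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def feed_readers(lines, feed_path="/data/rss"):
    """Count requests for the feed per (ip, user agent) pair."""
    readers = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").endswith(feed_path):
            readers[(match.group("ip"), match.group("agent"))] += 1
    return readers


# Made-up log lines using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [15/Sep/2005:18:00:00 +0000] "GET /users/example/data/rss HTTP/1.1" 200 5120 "-" "ExampleAggregator/1.0"',
    '192.0.2.1 - - [15/Sep/2005:19:00:00 +0000] "GET /users/example/data/rss HTTP/1.1" 304 0 "-" "ExampleAggregator/1.0"',
    '192.0.2.7 - - [15/Sep/2005:19:05:00 +0000] "GET /users/example/data/rss HTTP/1.1" 200 5120 "-" "ExampleBot/2.0"',
    '192.0.2.9 - - [15/Sep/2005:19:06:00 +0000] "GET /users/example/ HTTP/1.1" 200 20480 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (ip, agent), hits in feed_readers(SAMPLE_LOG).items():
        print(ip, agent, hits)
```

A distinct IP and user agent pair is only a crude stand-in for "a reader": one aggregator service can fetch on behalf of many people, and one person can show up under several addresses.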

Date: Saturday, September 17th, 2005 07:30 pm (UTC)
From: [identity profile] d-floorlandmine.livejournal.com
I wanted to cover Google but they didn't want to talk.
Poke them with sticks!

Date: Thursday, September 15th, 2005 09:35 pm (UTC)
From: [identity profile] flickgc.livejournal.com
Why have I only just found out about this?

It's been in place for a couple of years, but probably wasn't when you signed up.

OTOH, I don't *think* that it is explicitly stated when you sign up, now.

There is no way of finding out whether anyone is signed up to it.

Date: Thursday, September 15th, 2005 06:24 pm (UTC)
From: [identity profile] avirr.livejournal.com
LJ just sees the syndication feed as another way to format your journal. They figured your friends would sign up, instead of refreshing their pages. http://livejournal.com/users/avirr/data/rss is an automatic link to my journal formatted as RSS (Really Simple Syndication). If Google hadn't mucked up testing the robots tags, it wouldn't have been a problem.
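
On the robots tags: a minimal Python sketch of what a well-behaved crawler's check might look like is below. It assumes the "Block Robots/Spiders" option shows up as a standard `<meta name="robots">` element containing `noindex` in the journal page, which is an assumption here and not something confirmed in this thread; the class and function names are made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def blocks_indexing(html_text):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(blocks_indexing(page))
```

A check like this only covers the HTML page. An RSS feed has no such element by default, which fits the "different source, different destination" point in the post.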

Date: Thursday, September 15th, 2005 06:44 pm (UTC)
From: [identity profile] d-floorlandmine.livejournal.com
Cheers for the summary!
So, if I've got this right, LJ creates a full RSS feed as default, even if that hasn't been actually selected by the user. Right, time to fettle that "synlevel" variable.

Date: Thursday, September 15th, 2005 07:47 pm (UTC)
From: [identity profile] tvillingar.livejournal.com
I'm sorry but I'm dumb enough not to understand how 2b works: I write set synlevel and my journal name and it tells me nope, that's not it. Help?

Date: Thursday, September 15th, 2005 07:49 pm (UTC)
From: [identity profile] tamaranth.livejournal.com
It will automatically do it for the journal you're logged in as. The third bit is the level you want to set. Try SET SYNLEVEL TITLE and your RSS feed will only have the titles of your (public) posts, none of the content.

Date: Thursday, September 15th, 2005 07:58 pm (UTC)
From: [identity profile] tvillingar.livejournal.com
Got it, thank you! I thought I was supposed to write my journal title...

Date: Monday, September 19th, 2005 07:19 am (UTC)
