BASICS OF SPOTLIGHT

by Andy Ruff MacBU Program Management Entourage

Written for Entourage 2004

For more information on using Spotlight with Entourage 2008, see the article "Entourage 2008 & Spotlight".

Spotlight is a blazing-fast search engine integrated directly into your Mac. It's a great tool, and the ability to search all your Entourage items within Spotlight makes it, for me, absolutely "mission critical" to each day's activities.

The recent release of Entourage includes ample documentation on how to use Spotlight with Entourage, but it says very little about how the whole process works. The following will be rather detailed, but I hope it will provide a good starting point for understanding why we designed the feature the way we did and how you might go about troubleshooting problems that may arise.

BASICS OF SPOTLIGHT

Understanding how Spotlight generally works is a prerequisite for grasping how Entourage hooks into the Spotlight system. For our purposes, Spotlight has three primary roles: watching for changes to a file, importing metadata for a file, and allowing users to search all this metadata. Metadata is essentially the key descriptive information about a file. For songs, this usually includes the title, the album, and the genre (all those fields you see in iTunes). For e-mail messages, this includes the subject, the date sent, the Entourage categories, and many other properties of your e-mail message.

When a file is modified on your computer, Spotlight is notified that the change occurred. Spotlight keeps track of all these changes and will, at some point, begin extracting metadata from the modified files. Exactly when it begins to extract metadata may vary. Apple cleverly designed Spotlight to extract metadata only at times when it wouldn't interfere with your ongoing use of the computer.
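The deferred behavior described above can be sketched as a small toy model. This is only an illustration, not Spotlight's actual scheduler: the `ChangeTracker` class, its method names, and the simple `user_is_busy` flag are all hypothetical, and the real policy for deciding when to extract metadata is not documented here.

```python
class ChangeTracker:
    """Toy model: remember which files changed, process them only when idle."""

    def __init__(self):
        self.pending = []   # paths waiting for metadata extraction, oldest first
        self.indexed = []   # paths whose metadata has been extracted

    def file_changed(self, path):
        # A notification only records the path; no extraction happens yet.
        if path not in self.pending:
            self.pending.append(path)

    def tick(self, user_is_busy):
        # Assumption: extraction is skipped entirely while the user is busy.
        if user_is_busy:
            return 0
        processed = len(self.pending)
        self.indexed.extend(self.pending)
        self.pending.clear()
        return processed


tracker = ChangeTracker()
tracker.file_changed("song.mp3")
tracker.file_changed("song.mp3")    # repeated notifications collapse into one
tracker.file_changed("notes.txt")
busy_result = tracker.tick(user_is_busy=True)    # nothing is extracted yet
idle_result = tracker.tick(user_is_busy=False)   # both pending files are handled
```

The point of the sketch is that a change notification and the metadata extraction are separate events, which is why a modified file may not be searchable right away.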

Once the Mac OS does kick off the extraction of metadata from a file, it does so through a Spotlight Importer. Spotlight Importers are plug-ins for the Mac OS that a developer provides specifically to make files created by their applications searchable within Spotlight. Spotlight crawls through its list of changed files, handing each one to the appropriate importer. The importers read the files, compile a list of metadata, and hand the metadata back to Spotlight. At this point, the changed file is available for searching within Spotlight.

For example, there's a Spotlight Importer for music files such as MP3s (System/Library/Spotlight/Audio.mdimporter). When Spotlight is ready to extract metadata from an MP3, Spotlight asks the music importer to read the MP3 and provide any information about the MP3 that should be searchable within Spotlight. The music importer quickly reads the music file and responds with info like "Title = Ain't Life Grand" and "Author = Widespread Panic." After all this is done, you can search for "Widespread" in Spotlight and the corresponding MP3 file will show up in the results.
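The hand-off between Spotlight and an importer can be sketched roughly as follows. This is a simplified model under stated assumptions, not the real plug-in interface: `audio_importer`, `IMPORTERS`, `index_changed_files`, and `search` are hypothetical names, the importer returns canned values instead of reading tags from a file, and choosing an importer by file extension is a simplification made for brevity.

```python
import os


def audio_importer(path):
    # Hypothetical stand-in: a real importer would read the tags from the file.
    return {"Title": "Ain't Life Grand", "Author": "Widespread Panic"}


# Assumption for this sketch: importers are looked up by file extension.
IMPORTERS = {".mp3": audio_importer}


def index_changed_files(changed_paths, index):
    """Hand each changed file to its importer and store the returned metadata."""
    for path in changed_paths:
        extension = os.path.splitext(path)[1].lower()
        importer = IMPORTERS.get(extension)
        if importer is None:
            continue  # no importer registered, so the file stays unindexed
        index[path] = importer(path)


def search(index, term):
    """Return paths whose metadata values contain the term (case-insensitive)."""
    term = term.lower()
    return [
        path
        for path, metadata in index.items()
        if any(term in str(value).lower() for value in metadata.values())
    ]


index = {}
index_changed_files(["Music/track01.mp3", "Docs/readme.xyz"], index)
results = search(index, "Widespread")
```

In this sketch, `results` contains only the MP3 path, and the file with no matching importer never enters the index.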

WHAT TOOK SO LONG?

When we announced our Spotlight feature at MacWorld this year, I noticed two types of reactions among attendees: a giant smile followed by an "awesome," or a grumbling stare preceded by "about time" or "what took so long?"

The challenge of supporting Spotlight for Entourage primarily centered on the way Spotlight is fundamentally designed. Spotlight is designed around changes in files, not databases. I'm not knocking Apple's design; as Entourage now demonstrates, there are very viable, simple solutions for Mac applications facing this obstacle.

If you recall, Spotlight uses the change of a file to determine when it needs to get new metadata for the search index. Entourage, however, stores all information within a single database file (you can find yours inside Documents/Microsoft User Data/Office 2004 Identities). Each time you get a new message, create a contact, or delete an event, that single database file is changed.

As with all files, Spotlight receives a notification that the Entourage database file has changed. We could have written an importer that opened up the database file, searched your database for the changes, and then provided all the new metadata to Spotlight. In fact, we tried this experiment, and it was incredibly taxing on your machine's performance and heavily prone to causing problems.

Another option would have been to move away from the single file database. This is, in fact, what Apple chose to do with Address Book, Mail, and iCal in Mac OS X 10.4. This has been a long-running, sometimes heated discussion among members of the Entourage team. There are merits and challenges to having a single file database or splitting into a series of files for each item within the database. We made the decision to stay with our current database structure primarily because such a change, at this time, would prove highly disruptive and risky. Remember, the goal here was to provide Spotlight searching to users, not to re-architect the way Entourage stores data.

HOW DOES ENTOURAGE WORK WITH SPOTLIGHT?

When you enable Spotlight indexing within Entourage, a "cache" file is created for each item within your Entourage database. If you have 100,000 e-mail messages in your Entourage database, 100,000 cache files will be created. If you want to see the cache files, you can find them within your Library/Caches/Metadata/Microsoft folder.

Each cache file contains all the metadata that will be needed for indexing by Spotlight. All changes within Entourage are reflected in the cache files. Create a new item and a new cache file will be created. Update an item and its cache file will be updated. Delete an item and its cache file will be deleted. With all these changes, Spotlight receives file change notifications and will eventually send the modified cache files through the import process using the Entourage Spotlight Importer.
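The create, update, and delete mirroring described above might look something like the sketch below. It is illustrative only: the `CacheMirror` class, the JSON format, and the `<id>.json` file names are assumptions made for the example, since the actual format of Entourage's cache files is not described here.

```python
import json
import os
import tempfile


class CacheMirror:
    """Keep one small metadata file per item, in step with the database."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def _path(self, item_id):
        # Hypothetical naming scheme: one JSON file per item ID.
        return os.path.join(self.cache_dir, f"{item_id}.json")

    def save(self, item_id, metadata):
        # Creating and updating are the same operation: (re)write the file.
        with open(self._path(item_id), "w", encoding="utf-8") as handle:
            json.dump(metadata, handle)

    def delete(self, item_id):
        path = self._path(item_id)
        if os.path.exists(path):
            os.remove(path)

    def read(self, item_id):
        with open(self._path(item_id), encoding="utf-8") as handle:
            return json.load(handle)


cache_dir = tempfile.mkdtemp()
mirror = CacheMirror(cache_dir)
mirror.save(1, {"subject": "Lunch?", "category": "Personal"})       # create
mirror.save(2, {"subject": "Status report", "category": "Work"})    # create
mirror.save(1, {"subject": "Lunch?", "category": "Friends"})        # update
mirror.delete(2)                                                     # delete
remaining = sorted(os.listdir(cache_dir))
```

Because every change touches only one small file, the file change notification itself tells Spotlight exactly which item needs to be re-imported.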

When you first turn on Spotlight in Entourage, this may take some time. Entourage has to crawl through your entire database, reading each item and creating the corresponding cache file. For a moderately sized database (50,000 items), the process typically takes 20 minutes, though many factors cause this time to vary. Once the first set of cache files is created, Entourage will update cache files almost instantaneously. Typically, delays in Entourage items showing up in Spotlight results are due to Spotlight waiting for idle time to index the modified cache files. If you do a "Rebuild" within the Entourage Spotlight preferences, Entourage will simply delete the previous cache files and kick off the crawling process that regenerates all the Entourage cache files.
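As a rough sketch of what a "Rebuild" amounts to, the two steps (delete the old cache files, then crawl the database and regenerate them) could be modeled like this. The `rebuild_cache` function, the dictionary standing in for the Entourage database, and the JSON cache files are all hypothetical stand-ins chosen for the example.

```python
import json
import os
import tempfile


def rebuild_cache(database, cache_dir):
    """Delete every existing cache file, then write one file per database item."""
    # Step 1: remove the previous cache files.
    # Assumption: cache_dir contains only plain files, no subfolders.
    for name in os.listdir(cache_dir):
        os.remove(os.path.join(cache_dir, name))

    # Step 2: crawl the database and regenerate a cache file for each item.
    for item_id, metadata in database.items():
        path = os.path.join(cache_dir, f"{item_id}.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(metadata, handle)
    return len(database)


database = {
    1: {"subject": "Lunch?", "category": "Friends"},
    2: {"subject": "Status report", "category": "Work"},
}

cache_dir = tempfile.mkdtemp()
# Simulate a stale cache file left over from an item that no longer exists.
with open(os.path.join(cache_dir, "99.json"), "w", encoding="utf-8") as handle:
    handle.write("{}")

count = rebuild_cache(database, cache_dir)
files = sorted(os.listdir(cache_dir))
```

After the rebuild, the stale file is gone and the cache holds exactly one file per item, which is why a rebuild is a reasonable troubleshooting step if the cache and the database ever disagree.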

As Spotlight begins the import process, each cache file is handed one by one to the Entourage Spotlight Importer. The importer reads the cache file and, just as the music importer did with MP3s, provides all the relevant metadata to Spotlight for searching. Once the importer provides this data, the item is searchable via Spotlight. Again, keep in mind that Spotlight determines when this indexing happens. Delays in Entourage items showing up in Spotlight results most likely mean Spotlight has not yet indexed the item.

We chose to be very liberal in the amount of metadata produced and provided by Entourage's cache files. Not only did we try to align our metadata with that of the equivalent Apple applications (e.g., where Apple Mail defined a property as an e-mail subject, we used the same property), but we also pumped out a lot more information, such as categories and projects. Essentially, our goal was to provide all properties of items accessible via our AppleScript Dictionary as metadata properties for Spotlight searching. What this means is that you can use Spotlight to do very powerful, fast queries against nearly all data within your Entourage database. Try this: type the name of one of your categories in the Mac OS Spotlight search field, and you'll see that all Entourage items within that category show up!

This approach ends up being a very simple way to provide Spotlight searching for Entourage. Its downside is that it consumes a bit more disk space, as it essentially mirrors some contents of your Entourage database: you've now got both the Entourage database file and the associated cache files. We looked at this for some time, investigating how much additional disk space an average user would need and how often the two sources of Entourage data would fall out of sync. We found that for a large number of users, the additional disk consumption was relatively small (typically less than 15% of the original database size) and, with performance efforts on our part, the risk of getting the two out of sync was rather minimal.

SUMMARY

Spotlight's a very cool addition to the Mac OS: it's amazingly fast, very powerful, and incredibly handy. The Entourage team is excited to finally get to share with you our efforts on supporting Spotlight for Entourage. The project was an interesting technical challenge for us, and we hope that we've provided a solution that makes searching Entourage items with Spotlight a part of your daily routine.

-- Andy Ruff MacBU Program Management

Entourage Weblog: http://blogs.msdn.com/entourage/