Oh man, I have been struggling with this very question for a while now, and I still don't have much of an answer. Figured I would share some of my thoughts in hopes of some good dialog / idea exchange.
Currently the "system" we use on Crankfire is pretty much based on what the single user who entered the trail entry thinks is appropriate, which is more or less ski area classifications (plus a "pay to play" and an "evil" rating). I am not a super fan of this approach. Granted, people can comment away and rate each place 1 to 10 as a whole, but it is all still way too subjective and vague for me.
So I have 2.5, maybe almost 3 plans so far:
1) I recently came upon IMBA's trail difficulty rating guide:
http://www.imba.com/resources/trail_building/itn_17_4_trail_difficulty.html
Which I think is an excellent starting point. I was thinking to allow every user to put in a vote using these criteria and then tally these up on the trail page - something like 75% of people voted this trail has "black diamond" type stuff within (or whatever).
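For idea 1, the tally itself is the easy part. Here is a rough sketch, assuming each vote is stored as a plain label string. The function name and the labels in the example are made up for illustration and are not how Crankfire stores anything.

```python
from collections import Counter

def tally_votes(votes):
    """Return {rating label: percent of all votes} for one trail.

    `votes` is a list of label strings, one per user (hypothetical format).
    """
    if not votes:
        return {}
    counts = Counter(votes)
    total = len(votes)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Four made-up votes for a single trail.
votes = ["black diamond", "black diamond", "blue square", "black diamond"]
print(tally_votes(votes))
```

With only a handful of votes the percentages jump around a lot, so showing the vote count next to them would probably be wise.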
2) Similar to the above approach, use multiple basic 1 to 10 ratings on several factors, for example: Technicality, Hilliness, Stuntiness, Distance-ness, etc.
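For idea 2, averaging per factor could look roughly like the sketch below. It assumes each user submits a dict of factor name to a 1 to 10 score and may skip factors; the names are hypothetical.

```python
from statistics import mean

def average_factors(ratings):
    """Average each factor across users.

    `ratings` is a list of dicts like {"technicality": 8, "hilliness": 6}
    (a hypothetical format). Users may leave factors out.
    Returns {factor: (average score, number of votes)}.
    """
    by_factor = {}
    for user_rating in ratings:
        for factor, score in user_rating.items():
            by_factor.setdefault(factor, []).append(score)
    return {f: (round(mean(s), 1), len(s)) for f, s in by_factor.items()}

ratings = [
    {"technicality": 8, "hilliness": 6},
    {"technicality": 6, "hilliness": 7, "stuntiness": 3},
]
print(average_factors(ratings))
```

Returning the vote count alongside each average makes it obvious when a number rests on one person's opinion.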
2.5) A combination of 1 or 2, plus jamming in some crunched data from uploaded GPS tracks. In theory, I could calculate average grades and elevation changes, and that should count for something? Probably not though, my math is poor.
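The math for idea 2.5 is not too bad. Below is a minimal sketch, assuming a track has already been parsed into (latitude, longitude, elevation in meters) tuples. It uses the haversine formula for horizontal distance and sums climbs and descents separately. All names are my own, and real GPS elevation is noisy enough that it would likely need smoothing first.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, good enough for trail-scale distances

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def track_stats(points):
    """Distance, total climb, total descent, and average absolute grade.

    `points` is a list of (lat, lon, elevation_m) tuples in ride order.
    """
    distance = gain = loss = 0.0
    for (lat1, lon1, ele1), (lat2, lon2, ele2) in zip(points, points[1:]):
        distance += haversine_m(lat1, lon1, lat2, lon2)
        delta = ele2 - ele1
        if delta > 0:
            gain += delta
        else:
            loss -= delta
    # Average steepness regardless of direction, as a percentage.
    grade = 100 * (gain + loss) / distance if distance else 0.0
    return {"distance_m": distance, "gain_m": gain, "loss_m": loss, "avg_abs_grade_pct": grade}

# Three made-up points heading due north.
print(track_stats([(0.0, 0.0, 100), (0.001, 0.0, 110), (0.002, 0.0, 105)]))
```

Grade and climb say nothing about roots, rocks, or stunts, so this would only ever back up the "Hilliness" side of things.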
3) "Tags" or a "Tag Cloud" approach. Give users a bunch of options to choose from and calculate what's going on from that. For example: Roots, Rocky, Hills, Singletrack, Epic, Doubletrack, Scenic, Facilities, etc. If enough people "voted", a tag cloud could be generated? We kinda wrote the tag cloud thing off though, I forget why....
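For what it's worth, the counting side of idea 3 is small. This is a sketch only, assuming each user submits a list of tags for a trail. The minimum vote threshold and the font size range are arbitrary numbers I picked for illustration.

```python
from collections import Counter

def tag_cloud(tag_votes, min_votes=2, min_size=10, max_size=30):
    """Map each sufficiently popular tag to a font size.

    `tag_votes` is a list of per-user tag lists (hypothetical format).
    A tag counts once per user, and tags with fewer than `min_votes`
    users behind them are dropped.
    """
    counts = Counter(tag for user_tags in tag_votes for tag in set(user_tags))
    kept = {tag: n for tag, n in counts.items() if n >= min_votes}
    if not kept:
        return {}
    low, high = min(kept.values()), max(kept.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: min_size + (max_size - min_size) * (n - low) // span
        for tag, n in kept.items()
    }

tag_votes = [["Roots", "Rocky"], ["Rocky", "Hills"], ["Rocky", "Roots", "Scenic"]]
print(tag_cloud(tag_votes))
```

The threshold is what keeps one person's oddball tag out of the cloud, which is also why this needs a decent number of voters per trail.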
4) The craziest idea was to try to implement a Netflix type deal, where one could calculate something along the lines of "Users like you rated this trail..." I have no friggin clue how to do that, and I imagine you would need a pretty sizable dataset to pull that one off.
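A very crude version of idea 4 is a similarity-weighted average: find users whose ratings on shared trails look like yours, then weight their rating of the new trail accordingly. The sketch below is a toy under my own assumptions (ratings held in nested dicts, similarity defined as 1 / (1 + mean absolute difference)), not whatever Netflix actually does.

```python
def similarity(mine, theirs):
    """Score from 0 to 1 for how closely two users agree on shared trails.

    Each argument is a {trail: rating} dict. Returns 0.0 with no overlap.
    """
    shared = set(mine) & set(theirs)
    if not shared:
        return 0.0
    mean_diff = sum(abs(mine[t] - theirs[t]) for t in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)

def predict(ratings, user, trail):
    """Guess `user`'s rating of `trail` from similar users' ratings.

    `ratings` is {user: {trail: rating}} (hypothetical format).
    Returns None when nobody comparable has rated the trail.
    """
    weighted_sum = weight_total = 0.0
    for other, theirs in ratings.items():
        if other == user or trail not in theirs:
            continue
        weight = similarity(ratings[user], theirs)
        weighted_sum += weight * theirs[trail]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

ratings = {
    "ann": {"A": 9, "B": 2},
    "bob": {"A": 9, "B": 2, "C": 8},
    "cat": {"A": 2, "B": 9, "C": 3},
}
print(predict(ratings, "ann", "C"))  # leans toward bob's 8, since ann and bob agree
```

With three users this is a party trick; it only starts to mean anything once lots of people have rated lots of the same trails.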
Issues: 1 and 2 are still really subjective, and 3 and 4 would both require large datasets to extract anything statistically meaningful.
I think that's pretty much everything that has been shaking around in my head.