
July DAB


ChasR

Senior Member
Joined
Apr 12, 2004
Location
Atlanta
There have been no posts in the DAB since April except a few about the overvalued p8023, which quickly got corrected.

I posted a question for PG today, basically asking which does the most science: 2 x 2600K, 15 x C2Q (3.0 GHz and up), and 17 GPUs (G92 and up), or a single 4P 6174 machine. Maybe I'll get some news to report as a result.
 

There you go stirring the pot! I approve :thup:


Something seriously needs to be done, and I don't think anyone would mind a complete point revamp when v7 is released out of beta. Timing it with v7 being marked official would make a nice "fresh start" for the program. I don't think anyone's existing points should be removed, but if we could go back to the way things were when SMP first came on the scene, everyone would be better off. The whole problem really started with the SMP and GPU clients. SMP was given a bonus for being a beta program. Alright, that's fine. Next, GPU was given an even larger bonus, and that's where the problem happened. Beta projects should get a bonus, and QRB is a good idea. Everything just needs to be linear, not exponential, as you've pointed out many times. QRB gives a bonus to those of us who run 24/7, but it shouldn't be the drastic bonus we see today.

Really, points need to be based on FLOPS. Each unit should be valued not against a benchmark machine but on the FLOPs it is expected to take. A system with higher FLOP/s obviously gets more ppd, but it's strictly linear. I'm sure there are holes in this plan, but it makes the most sense. Maybe small bonuses for QRB and a "priority bonus", the priority bonus being a small % increase for a given platform during times when it's needed. For example, smp, uniprocessor, gpu, and bigadv all simulate different types and lengths of protein folding by their nature, so if PG needed more small GPU units crunched for a month or so, they could activate this predetermined bonus to shift people short term.
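Something like this rough sketch is what I'm picturing; the scale factor, the priority percentages, and the QRB cap are all made-up numbers for illustration, not anything PG has published:

```python
# Rough sketch of a strictly linear, FLOPS-based point scheme.
# All constants are illustrative; nothing here comes from PG.

FLOPS_PER_POINT = 1e12          # 1 point per trillion FP operations (made up)

# Optional short-term "priority bonus" PG could toggle per platform
PRIORITY_BONUS = {
    "uniprocessor": 1.00,
    "smp": 1.00,
    "gpu": 1.10,                # e.g. +10% while small GPU WUs are needed
    "bigadv": 1.00,
}

def base_points(wu_flops):
    """Value a WU purely by the floating-point work it contains."""
    return wu_flops / FLOPS_PER_POINT

def award(wu_flops, platform, qrb_multiplier=1.0):
    """Linear valuation plus small, capped bonuses."""
    qrb_multiplier = min(qrb_multiplier, 1.25)   # cap the QRB at +25% (arbitrary)
    return base_points(wu_flops) * PRIORITY_BONUS[platform] * qrb_multiplier

# A WU costing 5e15 FLOPs is worth 5000 base points on *any* machine;
# a faster machine simply finishes more of them per day.
print(award(5e15, "gpu", qrb_multiplier=1.1))    # ~6050
```

The key property is that a given WU is worth the same base points no matter what runs it; speed only changes how many WUs you finish.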

The problem, as you always point out, is that no one likes change, and it won't be easy, after people drop 2k on a 4P, for their work to get revalued. It's a vicious cycle though, because people are always jumping on the new bandwagon.
 
One day I should write a history of FAH points inflation. I didn't start at the beginning though, so I'd be a poor historian, but I can go back 8 years and still have tons of collected data.

Beta projects aren't supposed to get a bonus. It generally happens when a mistake gets made in benchmarking, or in shortcutting the benchmark (equal pay for equal work when the work isn't really equal), or, in the case of BA and some other older WUs, benchmarking on a machine with a cache bottleneck that isn't present in machines more modern or faster than the benchmark machine. It makes it appear there is a beta bonus, but it's really an accident or carelessness.

I've never liked the QRB and never will. I've learned to live with it, but every chance I get, I propose that it be made linear and capped at a much smaller percentage than it is presently.
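For reference, the QRB as I understand it multiplies base credit by roughly the square root of (k x deadline / time taken), which is why points grow faster than speed does. A rough comparison against the linear, capped version I keep proposing (the k value, the slope, and the cap below are only examples, not figures PG uses):

```python
import math

def qrb_current(base, k, deadline_days, days_taken):
    """Current QRB as I understand it: credit scales with the square root
    of how far ahead of the deadline the WU is returned."""
    return base * max(1.0, math.sqrt(k * deadline_days / days_taken))

def qrb_linear_capped(base, deadline_days, days_taken, cap=1.5):
    """What I'd prefer: a gentle linear early-return bonus with a hard cap."""
    bonus = (deadline_days / days_taken) / 10.0     # slope is arbitrary
    return base * min(1.0 + bonus, cap)

# A 1000-point WU with a 6-day deadline, returned in 1 day vs. 3 days
for days in (1, 3):
    print(days, round(qrb_current(1000, 26.4, 6, days)),
          round(qrb_linear_capped(1000, 6, days)))
# current QRB: ~12586 and ~7266 points; linear capped: 1500 and 1200 points
```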

At one time the relationship of the client benchmarks was as follows:

uniprocessor: 110 ppd on a P4C at 2.8 GHz
PS3: 900 ppd
GPU2: 1500 ppd on an HD3850
SMP: 1760 ppd on a 2 x Woodcrest 5140 @ 2.33 GHz (slightly slower than a Q6600)

The uniprocessor and PS3 benches haven't changed. The GPU bench went by the wayside when nVidia GPUs started folding and were 5x faster than the ATi benchmark machine. And look where SMP has gone. I see my Q6600s that made 3600 ppd on SMP making up to 12,000 ppd on SMP2.

Bed time.
 
Lol... love the smiley cuda! :D

Although I am among those enjoying the 4P ppd, I certainly support this effort to even the playing field. Uniproc work is way undervalued in comparison to SMP2 and GPU2+, and BA work is way overvalued due to the non-linear nature of the bonus calculations. I think the bonus formula is where the problem lies. I have no issue with the BA16 class of WUs; I do believe those of us who have invested in 16+ thread machines should be rewarded for the investment. The latest WUs have gotten things much closer. I'm now making 300k ppd instead of almost 500k, which I already call quite a nice adjustment. PG screwed up with the initial valuation of ALL BA work... and taking most BA away from machines with fewer than 16 threads and decreasing the ppd of current BA16 work is a step in the right direction. These things have to move slowly... otherwise, if PG just cuts BA16 ppd to 100k or less, people will probably just stop running the work, and they certainly don't want that.
 
There are a lot of things to consider. If you can make 300,000 ppd on a single 4P machine using 600 W, why would somebody like me continue to fold on C2Qs making 170,000 ppd on 3750 W, while having the computing power of at least two 4P 6174 machines? As I see it, there needs to be a closer relationship of BA to normal smp work. However, the genie is out of the bottle and probably can't be put back in.
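To put numbers on that, using the figures above:

```python
# ppd per watt, using the numbers from the post above
ppd_4p,  watts_4p  = 300_000,  600    # one 4P 6174 box
ppd_c2q, watts_c2q = 170_000, 3750    # the C2Q farm

print(ppd_4p / watts_4p)       # 500.0 ppd/W
print(ppd_c2q / watts_c2q)     # ~45 ppd/W, less than a tenth of the 4P's points per watt
```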

Normal -smp 48 should make, say, 20% more ppd than 48 uniprocessor cores, and BA at -smp 48 should make 20% more than normal smp at -smp 48. You can pick the % and pick the number of cores that % applies to. Pick fewer than 48 cores and the QRB will skew the relationship, but perhaps it should.
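In other words, something like this; the 20% steps and the per-core figure are just placeholders, pick whatever values you like:

```python
# Illustrative tiering at 48 cores; every number here is a placeholder
uni_48 = 48 * 1000          # 48 uniprocessor clients at, say, 1000 ppd each
smp_48 = uni_48 * 1.20      # normal -smp 48 earns 20% over 48 uniprocessor cores
ba_48  = smp_48 * 1.20      # BA at -smp 48 earns 20% over normal smp

print(uni_48, smp_48, ba_48)    # 48000 / 57600 / 69120
```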

As I've said before in this subforum, a new MP benchmark machine is needed, along with a complete reset of the point values and QRB bonus. Without it, we're all going to be dropping three digits and adding a K when we talk about ppd. Ahh, I see we already are doing that, without doing anywhere near 100, much less 1000, times the science. (One 6174 core is approximately equal to a P4E (640) @ 3.6 GHz.)

All the cpu WUs need to be benched on the new machine and the relationship set. PG can pick the performance point on the scale they want to match, though I doubt my suggestion that they go back to 110 ppd on a P4C @ 2.8 GHz will fly. I never quit trying to protect the value of yesterday's points.

Everybody that spent bucks to get on the BA bandwagon needs to enjoy the big points while they can. In the not-too-distant future, the QRB will make your machines almost as obsolete as mine.
 
There are a lot of things to consider. If you can make 300,000 ppd on a single 4P machine using 600 W, why would somebody like me continue to fold on C2Qs making 170,000 ppd on 3750 W, while having the computing power of at least two 4P 6174 machines? As I see it, there needs to be a closer relationship of BA to normal smp work. However, the genie is out of the bottle and probably can't be put back in. <snip>

This is why I think they need to get rid of the benching machine altogether and just do a benchmark on each machine (possibly for each core), determine your system's FLOP/s, and assign points based on the estimated FLOPs a WU needs. The GROMACS core already records and reports FLOP counts. It wouldn't be hard for Stanford to determine a FLOP rating for each WU in place of the benchmark system currently in use. Then, more or less, your total base ppd would be equal to the ppd of the two 4P systems. You draw about 2-3 times the wattage, so there is still much to be gained by using 4P for cost reasons. Stanford can still implement a QRB or a bonus scheme for beta projects and such. However, in terms of base ppd, all machines would be equal.

I know FLOP/s don't actually mean anything, according to most, but I'm sure there is something else we could use that most would consider meaningful. GROMACS spits out quite a bit of info after a WU finishes. The problem is that currently things are benched on a set machine, and the points are determined based on how long it takes that machine to complete the WU. When a processor of a different architecture works the WU, it's rewarded for how much more efficient it can be. The problem, as you point out, is that the benchmark machine quickly becomes old. By changing from time-based to FLOP-based (or any fixed measure), we eliminate the problem of the benchmark machine falling behind. Instead, the benchmark machine just determines the FLOPs required to process the WU, and that can then be used to give a point value. This allows the points to be adjusted as new generations of hardware come out, without having to upgrade the benchmark machine.

This also provides a better way to determine a linear QRB. They can take the FLOP/s reported from machines in the community to determine the preferred deadline. Doing that eliminates the issue of old hardware in the benchmark machine, and it also solves the generational problem from the community side: the deadlines will get shorter and shorter, but slowly, over time, as the majority move to newer hardware and report higher FLOP/s.
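For the deadline side, here's a rough sketch of what I mean; the percentile and the headroom factor are just my picks, and neither GROMACS nor Stanford exposes data in exactly this form:

```python
import statistics

def preferred_deadline_days(wu_flops, community_flops_per_sec, headroom=2.0):
    """Derive the preferred deadline from FLOP/s actually reported by clients.
    Pegging it to a low percentile keeps slower machines inside the deadline,
    and as the fleet upgrades, the deadline tightens on its own."""
    slow_but_ok = statistics.quantiles(community_flops_per_sec, n=10)[1]  # ~20th percentile
    seconds_needed = wu_flops / slow_but_ok
    return seconds_needed / 86400 * headroom

# Hypothetical numbers: a 5e15-FLOP WU and FLOP/s reports from five clients
reported = [4e10, 6e10, 9e10, 1.2e11, 2.5e11]
print(round(preferred_deadline_days(5e15, reported), 1))   # ~2.6 days
```

Pegging the deadline to the slower end of the reported distribution is what makes it self-adjusting as hardware improves.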
 
As I understand things, Boinc does something similar to what you describe. Also as I understand it, optimized Boinc clients don't actually do more work; they just optimize the benchmark, in effect lying about the speed of the machine. Elimination of that possibility would be a must.

I believe PG is moving in the direction of an individual machine benchmark using a snippet of a WU run on client startup. However, the schemes discussed use this bench solely as an assignment tool to replace core count for BA assignment.

The current benchmark system's biggest failing is that PG has always employed a mid-level machine that is almost obsolete the day it is deployed. If a top-end machine is used, the benchmark results will stay valid much longer. Per-core performance hasn't changed nearly as much as cores per cpu.
 
Thinking about your proposal, Shel, I think it has merit. However, it still has the same benchmarking problems as the current system. If you use an i5 to determine the FLOPs required to run BA work, it will yield a very large number of FLOPs, but the reality is most of them did nothing because the cpu is cache-bound on BA work. You still have to have a very powerful benchmark machine that isn't bottlenecked by anything on any WU.

We've had a lively discussion about the relationship of uniprocessor, vanilla smp, and BA work. The question arose: what should the relationship be? Using p8101, the worst-performing BA WU, as a comparison on my 2P, I found that vanilla smp provides an 1800% increase over uniprocessor work, while BA provides a 70% bonus over vanilla smp. This relationship is maintained on a 4P (M-C chips do better on uniprocessor work than Intels w/HT or IL chips). So what should it be?
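For anyone following along, those percentages are just ratios of observed ppd; with made-up numbers, the arithmetic looks like this:

```python
# Made-up ppd figures, only to show how the percentages are calculated
ppd_uni = 10_000          # 48 uniprocessor clients on the box (hypothetical)
ppd_smp = 190_000         # vanilla smp on the same box (hypothetical)
ppd_ba  = 323_000         # p8101 BA on the same box (hypothetical)

print((ppd_smp / ppd_uni - 1) * 100)   # 1800.0 -> an 1800% increase over uniprocessor
print((ppd_ba / ppd_smp - 1) * 100)    # 70.0   -> a 70% bonus over vanilla smp
```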
 