ABX Tests Prove Hi-Res Audio Is Legit!

"Champagne for my real friends, real pain for my sham friends" ― Francis Bacon

This is really great news for anyone following the hi-res audio debate. The debate is simple: some people claim there is an audible difference between high resolution files and lower resolutions, and some people claim there isn't. There have been a bunch of technical articles on the subject but up until now(ish), there's never been proof that everyone agrees upon. No solid, statistically significant scientific proof as to whether or not anyone can discern hi-res audio from lower resolutions in an ABX test. The good news is we now have statistically significant test results that prove without question there is an audible difference between hi-res and lower resolutions. The not so good news is this happened over a year ago.

I'm going to let you read the specifics of the test yourself over on the What's Best Forum: "Conclusive 'Proof' that higher resolution audio sounds different". Here's the gist of it. Arny Krueger, one of the co-creators of the ABX Comparator and a staunch supporter of ABX testing, created a number of files from a 24-bit/96kHz recording he made of keys jingling (I know, but he's the ABX audio expert). Arny downsampled the original recording to 44.1kHz and 32kHz, then resampled those files back up to 24/96. Arny's contention was that no one would be able to pick out the downsampled files from the original 24/96 file.
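For readers who want a concrete picture of that processing chain, here's a minimal sketch of a down/up-sample round trip in Python using scipy. Arny's actual tools and filter settings aren't documented here, so treat this as illustrative of the technique only, not a reconstruction of his files:

```python
# Hypothetical down/up-sample round trip: 96 kHz -> 44.1 kHz -> 96 kHz.
import numpy as np
from fractions import Fraction
from scipy.signal import resample_poly

def round_trip(x, orig_rate=96000, low_rate=44100):
    """Downsample x to low_rate, then resample back up to orig_rate."""
    r = Fraction(low_rate, orig_rate)                        # 44100/96000 = 147/320
    down = resample_poly(x, r.numerator, r.denominator)      # 96k -> 44.1k
    return resample_poly(down, r.denominator, r.numerator)   # back to 96k

# One second of a 1 kHz tone at 96 kHz; any content above low_rate/2
# (the Nyquist frequency of the intermediate format) cannot survive the trip.
t = np.arange(96000) / 96000.0
tone = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
processed = round_trip(tone)   # the kind of "44.1k" test file described above
```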

He was wrong. The founder of the What's Best Forum, Amir, loaded the files into the Foobar ABX Comparator software, which is the software recommended by Krueger and others for this purpose, and took the ABX test. In the 32kHz vs 96kHz test, Amir scored 13 out of 13 correct. In the 44.1kHz vs 96kHz test, Amir scored 7 for 7. Amir also took part in a similar blind ABX comparison offered by the AVS Forum, where three 24/96 original tracks were downsampled to 16/44.1 then upsampled back to 24/96. The ABX test again involved telling which is which using the Foobar ABX Comparator. Amir scored three positive detections (10/11, 12/12, and 25/30).
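For context, it's easy to check how unlikely those scores are under pure guessing. Here's a quick sketch (mine, not part of either forum's analysis) using the standard one-sided binomial test:

```python
# Probability of scoring at least `correct` out of `trials` ABX trials
# by flipping a fair coin on each one.
from math import comb

def p_value(correct, trials):
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

for correct, trials in [(13, 13), (7, 7), (10, 11), (12, 12), (25, 30)]:
    print(f"{correct}/{trials}: p = {p_value(correct, trials):.5f}")
# 13/13 -> p = 0.00012; even the "worst" score, 25/30, comes in at p = 0.00016,
# far below the conventional 0.05 threshold.
```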

Here's Amir on the Arny files, "I would say the 32Khz sample then is the proverbial 'night and day' to my ears and the other ones difference is in very low level detail." So we're done with this debate, right? Of course not.

If you go to the What's Best Forum, you'll see the initial post by Amir with the Arny test results dates from July 13, 2014. You'll also see that 159 pages of posts later, there's a comment from July 31, 2015. I have not read every page and I don't intend to. Suffice it to say that some people, people who don't believe there is an audible difference between hi-res files and lower resolutions, have all kinds of objections to Amir's ABX test results. Even people who recognize that there is a difference want to quibble over how important the differences are, over the fact that Amir is a trained listener so his results may not apply to most untrained listeners, and on and on and on. Remember, this was a simple test for audibility and Amir passed with flying colors.

If you make it to page 156 (I skipped ahead), you'll also find reference to the 2014 Audio Engineering Society award-winning paper of the year titled "The audibility of typical digital audio filters in a high fidelity playback system" by Bob Stuart, et al. (we've talked about this paper before). Here's the abstract (my emphasis):

This paper describes listening tests investigating the audibility of various filters applied in high-resolution wideband digital playback systems. Discrimination between filtered and unfiltered signals was compared directly in the same subjects using a double-blind psychophysical test. Filter responses tested were representative of anti-alias filters used in A/D (analog-to-digital) converters or mastering processes. Further tests probed the audibility of 16-bit quantization with or without a rectangular dither. Results suggest that listeners are sensitive to the small signal alterations introduced by these filters and quantization. Two main conclusions are offered: first, there exist audible signals that cannot be encoded transparently by a standard CD; and second, an audio chain used for such experiments must be capable of high-fidelity reproduction.

So we're really done now, right? I mean that's a veritable slam dunk of valid test results, all showing there is an audible difference between hi-res files and lower resolutions. Pop open the sparkly and let's have a hi-res party! Not so fast.

Of course there are people who take issue with the Stuart paper, including Robert Adams of Analog Devices, as reported by the Boston Audio Society (BAS). Of course, one of the most famous, if not the most famous, of BAS-sponsored papers was by Meyer and Moran, titled "Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback" from 2007, wherein they claimed to prove people could not tell the difference between CD-quality and higher resolutions. So BAS has skin in this game. Of course, there has been plenty of criticism of the BAS paper and the methods employed, mostly by people who know, and we can say we know this, that there is an audible difference.

What the hell is the point of talking about all of this now? Two reasons. First, if you look at the comments on the recent Ars Technica/JREF botched Ethernet test, you'll see that even though the test results are meaningless, people agree with the meaningless results, i.e. only one person out of seven heard a difference between Ethernet cables, so no one can hear a difference between Ethernet cables.

The second reason is that even when presented with valid ABX test results and peer-reviewed, award-winning papers, people who don't want to believe whatever the tests and papers prove will keep believing what they want to believe. The irony here is that many of the people calling for these tests are typically the most vocal when it comes to hurling insults at "believers", even when what people say they hear is backed up by science.

COMMENTS
rwjr44's picture

I would be satisfied if more people were to take this particular test and post their results. How many? I don't know! But a sample of one is too small IMHO.

Tentatively, this is good news.

Michael Lavorgna's picture
There was certainly more than one sample in the Stuart paper.

Besides, in any test for audibility, all it takes is one statistically significant result, i.e. not guessing, to prove audibility.

Bromo33333's picture

I will agree that if the comment is "There is nobody that can hear the difference" it is well and truly debunked at that point.

But I would say the reason statisticians are employed, but never quoted, in high-publicity ABX tests is that they would plainly state the above contention was absurd to begin with (Hmmm ... an "extraordinary claim" perhaps?), and that all you can really do is say, with a mathematical degree of certainty, that a certain percentage of people cannot hear a difference.

The shameless thing is that with a sufficiently sloppy experimental design, and a lot of hand waving, you might be able to fool people into thinking that somehow HiRez is bunk. Given that certain parties' reputations are on the line if they were to prove a phenomenon that appears to be fake/bunk is indeed the real deal ... well, what happens to their shining star? And given the nature of their most ardent supporters, they'd be torn to shreds. They are incentivized to "debunk" and not to prove, and therefore to deceive their followers into thinking they are doing science and making scientifically based pronouncements.

I think the main issue with ABX testing is that it can never definitively prove anything. All it can do is describe a degree of certainty for a certain percentage of the population.

Michael Lavorgna's picture
...was no one can hear a difference between these files (the Arny files). Someone did. End of story ;-)

ABX testing in audio has become a tool for bullying, nothing more. Ars/Randi have been kind enough to prove this point.

DH's picture

AFAIR, AES papers aren't peer reviewed; that was one of the problems with the Meyer/Moran article - no one pointed out the flaws to them before they published their conclusions. BTW, one of the flaws is that they used "hi-res" (SACD) that had been produced from upsampled Redbook - so there wasn't a reason it should sound better than CD.

As far as Amir's results and the other results: what they show, at a minimum, is that at least some individuals can hear the difference. That's very different than claiming that the difference is inaudible.

Michael Lavorgna's picture
I've edited the text to reflect this.
"As far as Amir's results and the other results: what they show, at a minimum, is that at least some individuals can hear the difference. That's very different than claiming that the difference is inaudible."
Could you expand on this? Thanks.
Bromo33333's picture

"AFAIR, AES papers aren't peer reviewed"

Every 'call for papers' from the AES I have seen indicates the submissions will be peer reviewed. It would be extraordinary if they weren't considering they have academic weight.

But ... I don't follow them too closely, so who knows if they do it for any of their publications.

(And Mike, I agree, the claim is well and truly debunked at that point, but I guess I was saying when you assert something that is mathematically unprovable, you get what you deserve, especially when you try to beat people about the head with it)

Michael Lavorgna's picture
Thanks.
DH's picture

That the papers are peer reviewed. But then the "peers" didn't see some obvious flaws in the Meyer/Moran article that a lot of audiophiles picked out immediately.

Michael Lavorgna's picture
The Stuart paper I linked to is a "Convention Paper". I do not believe it was published in the JAES, which is a peer-reviewed journal. That said, even Convention Papers are reviewed by "peers", and the fact that the Stuart paper was selected as the paper of the year for 2014 means that it was reviewed positively ;-)

But my understanding is it is incorrect to call it a peer-reviewed paper.

Bromo33333's picture

Back in the olden days when I was a grad student, I had to give papers at IEEE conferences, but my professor made most of the arrangements - I think the subject and abstract were approved, but your presentation wasn't. I wonder if that's the same? I bet you are right and a conference paper isn't quite the same as a "peer reviewed" journal article or letter.

But a paper of the year is a pretty big honor. Even if not technically peer reviewed, it means the academics present thought it important enough to vote for it! Cool!

Perhaps I'm just getting lost in details. I see my newly arrived copy of "Amused to Death" is here. Maybe I'll have a listen! :o)

Really wanting to lay hands on a DSD DAC these days ...

T-Sporer's picture

Your statement was correct in the past, but some years ago the AES changed the policy for Convention Papers:
There are now three classes:
- peer reviewed
At least two (often more) reviewers are involved. Reviewers are nominated by the papers co-chairs. There are quite a number of papers _not_ accepted; some of these are "down-graded" to "abstract/precis".
- abstract/precis reviewed
This used to be the "normal procedure" for AES Conventions.
At least two reviewers are involved, reviewers are nominated by the papers co-chairs, and there are quite a number of papers _not_ accepted; some of them are "down-graded" to "engineering brief".
- engineering brief
Reviewed by the papers co-chair only. Usually work in progress.

The Stuart paper is among the peer reviewed papers. The decision of the award is a joint effort of papers co-chairs and the editors of the Journal.

BTW: At AES conferences, papers are usually peer reviewed. JAES papers are always peer reviewed (and often go through several revisions before being accepted).

Michael Lavorgna's picture
It is very much appreciated.
DH's picture

Michael -

What I was trying to say is that we are always being told that it is "impossible" for there to be an audible difference between hi-res and Redbook. Then they do "testing" on a group of individuals and find that none of them can hear the difference: Conclusion is then that "no one" can hear the difference. But this is from a group of people who probably haven't actually learned what to listen for.

My position is this: if they took a group of known trained listeners, they would find that at least some of them can consistently pick out the difference.

So the more accurate thing to say might be that the "average" listener can't hear the difference, but certain individuals can. That's very different than saying there is no audible difference at all.

Michael Lavorgna's picture
I wasn't sure what you were saying but now I am. And I agree.

What I always come back to, and I know I've beaten this to a pulp, is that the best way to determine if you (I don't mean you, I mean anyone) can hear a difference is to listen for yourself, over time. Some people do not believe this to be a rigorous enough "test" and require further proof. Which is fine by me. Here, I would suggest that unless the person requiring the proof takes the test themselves, they'll never know for certain if they in fact can hear a difference or not. Depending on what the test is testing, a further requirement may very well be that the listening test take place in the tester's room with their own system.

I'll also add that the most important question that needs to be answered, do you prefer A over B and if there is a price difference between them is it worth it, can only be answered by you.

Bromo33333's picture

The main difficulty with this kind of testing is you are trying to prove a negative that in the final analysis you can only build confidence in.

You can say "With 95% certainty, less than 1% of the people can hear X" when you have 298 subjects that fail to hear X.

If you want to be more stringent still, you would have to test around 4600 subjects who fail to hear X to say that with a 99% certainty, less than 0.1% of the people can hear X.

If you have, say, 20 subjects, and all fail to hear X, you can only say one of the following:
1. You are 99% certain that less than 20% of the people can hear X
2. You are 88-89% certain that less than 10% of the people can hear X

Which won't be good enough for most who want a "black/white" answer.
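A quick sketch of the arithmetic behind those figures, assuming independent subjects and the simplest possible model: if no one in a random sample of n people hears X, the confidence that fewer than a fraction p of the population can hear X is 1 - (1 - p)^n.

```python
# Confidence that fewer than fraction p of the population can hear X,
# given n randomly chosen subjects who all failed to hear it.
def confidence(n, p):
    return 1.0 - (1.0 - p) ** n

print(confidence(298, 0.01))    # ~0.95: "95% certain, less than 1% can hear X"
print(confidence(4600, 0.001))  # ~0.99: "99% certain, less than 0.1%"
print(confidence(20, 0.20))     # ~0.99 with only 20 subjects
print(confidence(20, 0.10))     # ~0.88, matching the 88-89% figure above
```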

You guys are correct that one result of hearing X demolishes the claim "I am 100% certain that 0% of the people can hear X" - but they were never on a trajectory to achieve that kind of certainty anyway.

The people making all the fuss and thunder trying to make a point to the general public only muddy the waters, because they are terrified of admitting that they are only playing the odds, and are incapable of backing up their claim with the statistics itself.

cliffjumper68's picture

I DJ events for friends and family (weddings, etc.) and I slip in low-res versions of popular songs among my regular high-res files. I use good equipment but not even remotely as good as my home system. It never fails: people will ask me, "Is your equipment OK? Why did the song sound flat? Can you fix it?" I replay the high-res version and everyone smiles and assumes it was a technical error. So, every time I read people saying there is no audible difference, I laugh and think of these pranks. Never get frustrated with ignorant people writing ignorant things, it is all they have.

Michael Lavorgna's picture
...isn't fair ;-)

Cheers.

DavePeerless's picture

That’s GREAT!
So using a touch of logic, it should also prove that ABX tests are a VALID way to compare audio components. For example: The $5.00 interconnect vs. the $500 interconnect vs. the $5,000 interconnect.
OR,
The $300 preamp, vs the $3,000 preamp, vs the $30,000 preamp.
Just a little food for thought.

Michael Lavorgna's picture
Here's what I said there (and here): ABX tests for audio are a great way to confirm what you already know. Other than that, they're useless (wink emoticon).

Cheers.

Hubris2's picture

There are so many holes in what has been said here. Firstly, multiple people with "good" ears should go through the tests that Amir did. Then a few questions rather than one should be asked, such as which file (A or B) has more detail, which one sounds better... Much of preference is just that, preference. That's absolutely cool, but fidelity and preference are frequently conflated.

prerich45's picture

One person....scoring perfectly?!!! Nope - that's enough to say that differences are audible to a portion of the population. That's enough for Hi-Res to live IMHO. The object in the test was to discern that the files are different - that's one of the major debates - the claim that there is no audible difference. That has been clearly debunked by Amir alone...which also suggests there are others who can tell that there's a difference.

Hubris2's picture

Depending on which thread you read and on which site, you will notice a few different aspects of the argument at hand. That is to say, can someone hear the difference between audio files, which resolution provides a better listening experience, and how rigorously were the tests conducted.

Think of it this way. If you have two tomatoes, one engineered to withstand the g force involved in lifting a spacecraft from the planetary surface, and the other a nice red heirloom which bruises easily, but offers immense flavor, would you ask only if one is different from the other, or would you ask the person to describe the difference?

As far as proof, the discussion above is intriguing and part of me hopes that it proves correct (particularly because it will further justify the high definition discs in my collection). However, as much as I want to accept the experience of one man in this area, there are two problems. 1) Testing one individual is not proof of a phenomenon in the general population. This relates to the discussion of sampling above. Experiments must be reproducible and done repeatedly (or with a large sample). 2) Amir may be a real exception, or he may be hearing artifacts of processing, rather than differences in the fidelity possible at different resolutions. If the test was done well and with high integrity, it seems likely that Amir has exceptional ears.

Another problem is that people continue to confuse mere difference between audio files with varying levels of fidelity. Preference for the way one audio file, or format, or component sounds in no way guarantees that the preferred option provides better fidelity. Fidelity is only about accurate representation of the original source, whether live or synthesized, or from some intermediary stage of mixing. Most arguments suffer without this acknowledgement.

cliffjumper68's picture

In the scientific method you have confidences, not absolutes. So it is easy to tell when supposed "scientists" go off the reservation when they say things like all, always, 100%, etc. Think of past "scientific findings" that came close to absolutes: all cholesterol is bad, eggs are bad, salt is bad, carbon dioxide is a pollutant, biological organisms can only survive in a narrow temperature range, and more. Each of these was spoken with such authority by some big-to-do working at some big-to-do organization, yet within only a few years the qualifications begin to creep in as the obvious errors surface. At least some types of cholesterol are needed for healthy function, egg cholesterol and protein are beneficial, a diet too low in sodium causes cognitive disorders, and carbon dioxide is needed for photosynthesis and plankton growth. So something that was an absolute becomes a qualified confidence. My point is why would you trust some big-to-do over simply listening for yourself? It is accessible and low risk, so why not? Always be a skeptic, especially of statements that use absolutes in their pronouncements.

prerich45's picture

Since someone can tell the difference - HI-Res is worth producing as there will be others that can quantify this difference as well! Bravo Amir!!!! For people to say that it is inaudible is tantamount to bullying indeed!!!!

Clever Dean's picture

to validation of differences in audio reproduction?

Michael Lavorgna's picture
...it would be great if you could point to some references. Thanks!
JorgeJesus's picture

Please someone tell Amir that he can win 1 million Dollars with JREF cables test..lol

Clever Dean's picture

if given the choice people wouldn't load up on higher resolution files where they are from the same master.

Storage is cheap.

CG's picture

Well, that's a great question.

From the comments I've read at various places, the main reason I can see is that people prefer quantity over quality. They'd much rather carry around 10000 tracks of lower quality in their listening device than 1000 higher quality tracks.

I suspect part of that is that they never really focus on the music, either by choice or circumstance. To each their own.

I also suspect that in the present age, people are highly conscious of "value" - getting the best bang for their buck. Based on comments I've read, again, there is a great sensitivity to being "ripped off" by artists and recording companies. But, hey - everybody needs to make their choices on this sort of thing.

Along those lines, MY big question is : If you like something, why do you (generic "you") care what other people's opinion is? They aren't listening for you! Purchase decisions should be between you and people who are directly affected, like people who live with you and people who share your financial resources. Otherwise, who else should matter? If you prefer high resolution recordings, go for it! If you can't tell the difference or it just plain doesn't matter to you, be happy with that. I don't think your recording preference is likely to be indecent, immoral, or illegal.

ChipotleCoyote's picture

Price can end up being kind of a big deal. It's not too unusual to find the same album available for $17.98 at HDTracks, $10.99-$12.99 at iTunes, and $8.99-10.99 for the actual CD at Amazon. (If you're a Prime customer, the latter is still virtually instant gratification in most cases, since they'll give you a free MP3 download to hold you over until the CD arrives.) If we're comparing the latter two, I might well be paying twice as much for the HDTracks' version. The 96/24 albums that I've bought sound really good, to be sure, but it's enough of a price difference to make me pause for a while.

Having said that, this is often a theoretical dilemma, unfortunately -- I just don't come across albums I love in hi-res format that often. The practical answer to your question might be "we're too rarely given that choice."

cliffjumper68's picture

Only one finding is all that is needed to destroy an absolute statement. You can say in all but one, or even 99.999999999% of humans, but never all or 100% again. Amir, you're the man! You just singlehandedly destroyed the statements of hundreds of audiophile-hating bloggers/journos with your one finding!

mtymous1's picture

...subjects should undergo frequency testing to determine their highest audible frequency. Once you have enough data for a valid sample size, THEN you can formulate a hypothesis and conduct ABX on sample files.

(Why isn't there any data about the subjects and their audible frequency ranges??)

Either way, I've already concluded that it's *OKAY* if you don't notice a difference when playing hi-rez. It suggests one of three scenarios:

A. Your equipment sucks.
B. Your hearing sucks.
C. Both your equipment and hearing suck.

Simple as that. ;-)

It's just like that hidden dinosaurs picture: you either recognize them, or you don't. Either way, it's an initial indicator of how sharp your senses are/aren't.

Archimago's picture

Hello Michael,
I'd be cautious with the interpretation of Amir's results.

Remember, what he's using to listen for those tests:
- HP ZBook 14 laptop
- internal ZBook DAC and headphone jack!
- default Windows audio stack / mixer with Windows dithering! (not even ASIO/WASAPI so it's not "bit perfect")
- Etymotic ER4P - decent, but only single driver balanced armature design headphones (I have these and know the limitations of the sound)

Unless he tested differently elsewhere, I think most of us would appreciate just how compromised of a system this is. Whether he performed the testing properly or not, I do not know, but I really do question the ability of the hardware to reproduce 24-bits and whether they are capable of accurate high-frequency response.

For all we know, the ABX'ed difference could be distortions introduced by the laptop/DAC/Windows (highly probable since he acknowledged his hearing "is shot above 12kHz")... Who knows if the HP laptop DAC is even capable of >16-bit audio output, whether the internal DAC handles 96kHz natively, or whether he had the "DTS Studio Sound" feature activated or turned off.

As for the Meridian paper. I highly suggest getting a copy of the full paper and seeing for yourself if the text actually supports what is claimed in the abstract.

Michael Lavorgna's picture
Thanks for your comments. Re. Amir's results, I have read about the possibility of IM distortion being an explanation for Amir's ability to easily pass these tests. This would certainly be an interesting avenue to explore seeing as it is device specific, at least that's my understanding.

Other people using different gear have also passed these same two tests with perfect scores. I had an exchange with someone on FB over this exact issue and he was one of the others to pass these tests. I believe his results are on the AVS forum, but a quick search returned too many results.

Re. the Bob Stuart paper, I've read a lot about it, including comments from Bob Stuart, but I have not read the entire paper. Are you saying the results do not support the claims made in the abstract?

Here's a post from Amir on the Stuart paper which he references here:

"We see that five (5) out of six (6) listening tests comfortably cleared the 95% interval both in their mean and standard deviation. Oddly, the 24-bit, 48 Khz sampling did not do so fully when you account for the error range. A further look at the results focusing on critical/more revealing music segments dealt with that by bringing that score above 95%.

So what do we learn? All processing was distinguished from the original to statistics confidence! Filtering with or without conversion to 16 was audibly different to listeners (to statistical confidence)."

One of the main points of my post was to show that each of these tests, including Meyer & Moran's, is criticized and refuted for various reasons. I am unaware of any test results in audio that aren't.

My personal take on this subject also takes into account the hundreds/thousands? of people who design hi-res capable equipment, who record in hi-res, who offer hi-res downloads, and finally the people who listen and enjoy hi-res recordings. Coupled with the above test results, as well as additional research by people like Bob Stuart (and MQA) for one easy example, the evidence seems overwhelmingly in favor of, yea there's a difference.

Archimago's picture

Hello again Michael.

Yes, I would certainly be curious to see what equipment the other person ABXed with and passed!

Regarding the Stuart/Meridian paper, indeed it is not peer reviewed, and in fact, at the top of the reprint sits this disclaimer: "The complete manuscript was not peer reviewed. This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents."

If you look through the full text, what we witness is essentially a test of a steep low-pass filter using Adobe Audition 6; it is linear phase. Furthermore, for 4 of the 6 trials, the authors subject the filtered 24/192 signal to 16-bit quantization (undithered!) or 16-bit quantization with RECTANGULAR dither, which the authors even admit is not "best practice"!

It is with this 16-bit quantization that the majority of the "significance" was found! In fact, if you look at the 2 trials of JUST the un-quantized signal (ie. JUST the filtering), the result was equivocal - the 22kHz filter (to emulate CD-quality 44kHz brick wall) was significant with p<0.05, but the 24kHz filter (to emulate DVD-quality 48kHz) was NOT significant.

Remember, this is with a steep linear filter with transition band of ~500Hz... I'm really not sure how this is representative of "typical" DAC performance. I would say that certainly the use of rectangular dither is NOT typical.
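[To make the dither distinction concrete, here is a textbook-style sketch of 16-bit requantization with rectangular (RPDF) versus triangular (TPDF) dither. It is not a reproduction of the paper's actual processing chain, just the standard form of the two dither types being discussed:

```python
import numpy as np

def quantize_16bit(x, dither="tpdf", seed=0):
    """Requantize a float signal in [-1, 1) to 16-bit steps.

    'rpdf' = rectangular dither (one uniform, +/- 0.5 LSB)
    'tpdf' = triangular dither (sum of two uniforms, +/- 1 LSB)
    None   = plain undithered truncation
    """
    rng = np.random.default_rng(seed)
    lsb = 1.0 / 32768.0                      # one 16-bit quantization step
    if dither == "rpdf":
        d = rng.uniform(-0.5, 0.5, x.shape) * lsb
    elif dither == "tpdf":
        d = (rng.uniform(-0.5, 0.5, x.shape) +
             rng.uniform(-0.5, 0.5, x.shape)) * lsb
    else:
        d = np.zeros_like(x)
    return np.round((x + d) / lsb) * lsb
```

TPDF dither renders the quantization error statistically independent of the signal; RPDF leaves residual noise modulation behind, which is one reason it is considered poor practice. -Ed.]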

The bottom line is that I think this paper's title is a misnomer. There really is only 1 filter used (not "filters"; essentially the same filter with different cut-offs). They confound the effect of the low-pass filter with 16-bit quantization, further worsening things with a suboptimal dithering algorithm (rectangular). As such, I also wonder why they describe this as "typical" in the title!

One final note. Just because people can ABX or find a difference does not of course mean "better". As you noted, intermodulation distortion can be heard if severe enough with the hardware (hence my issue with Amir's gear!). And intermodulation is worse with high-res/high-samplerate audio. Neither Amir nor the Stuart/Meridian paper address whether they thought the high-resolution sample actually sounded better or more "real". Ultimately, isn't that what we're after? Remember, in blind testing it is possible that people might actually end up preferring the sound of downsampled material with less risk of intermodulation; assuming a reasonable filter setting is used.

For the record, I have stated on my blog that I like 24/96; so I'm not speaking as someone who is rigidly in the 44/48kHz camp.

Michael Lavorgna's picture
The difficulty is there are so many pages of comments and any search I tried returned too many results to sift through.

Robert Adams of Analog Devices also brought up the filter issue as not being representative of typical DAC performance which certainly seems to limit the applicability of the study. I'd like to hear Bob Stuart's thoughts on this...

"Just because people can ABX or find a difference does not of course mean 'better'."
Yes, of course.
"Neither Amir nor the Stuart/Meridian paper address whether they thought the high-resolution sample actually sounded better or more 'real'. Ultimately, isn't that what we're after?"
Yes, imo, that is the entire point. In my experience, what sounds more "real" is listener dependent.

Years ago, I wrote about visiting people's homes to listen to their hi-fi's with them. I was not at all interested in my reaction to their hi-fi in a judgmental way, rather I was trying to understand how they relate to the hobby and how listening to music 'fits' into their life. To say the systems I experienced varied wildly would be a fair statement.

"For the record, I have stated on my blog that I like 24/96; so I'm not speaking as someone who is rigidly in the 44/48kHz camp."
Are there any tests related to 24/96 v 44/48kHz that you consider of value?
Archimago's picture

Actually, all the tests are of value within reason! As much as I have concerns about the Stuart/Meridian paper, it is not completely inconsistent with the literature that I have seen on the matter...

The bottom line is that whether you're looking at a paper like the Stuart/Meridian one, or Pras (2010, comparison of 44 vs. 88kHz), or Meyer & Moran, or even my own 24-bit vs. 16-bit test over the internet, what we see is that the *effect size* appears small. Sometimes the data reaches significance like with the Stuart/Meridian, other times clearly it does not.

For example, if you look at the Meridian test, overall accuracy was 56.25% with 160 trials. And this is with the suboptimal dithering and 16-bit quantization thrown in on top of the steep low-pass filter! Sure, we can say statistically this is "significant", but does it really matter? All we can say is that obviously the original 24/192 signal didn't sound *that* different from all the other DSP'ed samples, since the testers were only accurate by 6.25% more than chance!
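[For what it's worth, a naive pooled check, which is mine and emphatically not the paper's own per-condition analysis (that is what actually reached significance), illustrates how small a 6.25-point advantage over chance is at this trial count:

```python
# 56.25% of 160 trials = 90 correct; one-sided binomial tail vs. pure guessing.
from math import comb

n, correct = 160, 90
p = sum(comb(n, k) for k in range(correct, n + 1)) / 2**n
print(f"p = {p:.3f}")   # ~0.066 under this crude pooled model
```

A small effect size can still be real, but it takes many trials to separate it from chance. -Ed.]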

My thinking around wanting 24/96 is that this bit-depth and sampling rate would be more than what the research suggests would be the temporal and dynamic resolution of the ear/brain complex. As a "perfectionist audiophile" it essentially assures me that there's really nothing else missing based on research into psychoacoustics. I'd be happy to pay a *little* more for 24/96 for my favourite albums (concomitant with the small potential for audible effect) for this sense of reassurance if I felt the recording was of adequate quality. [The poor quality of recording for most pop/rock disqualifies most of this genre - acoustic jazz and classical are about the only genres worthy of high-res IMO.]

Practically, with a Nyquist frequency at 48kHz, it doesn't matter about the filter settings any more... Any ringing and temporal smearing will be beyond audible range. Furthermore, with today's DACs, 24/96 is easily achieved accurately in objective testing.

That's my take in a nutshell...

Michael Lavorgna's picture
I understand why this is obviously an important metric. I do wonder, however, in those instances where someone involved in a given test reliably differentiates between A and B, while the majority cannot, if from a listener's perspective it is of greater importance that there is in fact a difference.

For example, with your "Linear vs. Minimum Phase Upsampling Filters Test", I listened to the files and had a clear preference that was easily detected for two of the three samples but the overall test results showed the differences were not significant (I hope that's a fair summation).

So, from one perspective, the overall results have value. From my perspective as a listener, they do not. The same applies to any listening test where there are mixed results. At least that's the way I look at it.

Our hobby represents a very clear tiny minority as compared to the number of people who listen to music. Any listening test that does not include experienced listeners or fails to exclude people who cannot reliably tell A from B when there is a proven difference, has little relevance outside of the subjects being tested.

In terms of the question to hi-res or not, I prefer the option of buying an album in its original format. I see no valid reason to take a 24/96 original, for example, and only offer a 16/44.1 (or lossy) version. For analog recordings, it also makes no sense to me to only offer a 16/44.1 version. As you point out, research supports the use of higher resolutions than CD-quality.

Of course the quality of the original recording is the second most important factor following behind the quality of the music itself.

DH's picture

Actually, if you read Amir's comments about the music files he listened to in the test, he clearly preferred the hi-res files and could hear that they sounded better.

Clever Dean's picture

I need to get this straight. The million dollar cock up was done on a laptop and because of that it's a problem.

But

Another AB/X test shows that there is a difference in High Resolution files is also done on a laptop?

Going to send my Irony meter back in for calibration. It seems to be wildly sweeping back and forth at a rather quick pace.

myrantz's picture

It's more an anti-climax. All the hype and in the end only 2 actually did the AB-X. And all we get is a "one got it correct and the other not".

The one who got it correct deserves the 1 million dollars. He may well win that amount at the tables if he's really that lucky.

Michael Lavorgna's picture
With any test, we need to look at its premise. With the "Arny test" the premise was no one can hear a difference between these files. Someone did.

With the Ars/Randi test, the premise was: I heard a difference between Ethernet cables; can other people hear a difference in an ABX test? The cock-ups were numerous, as we all know, and included using gear that clearly was not up to the task of refuting what I heard in my system.

Reed's picture

Get people interested in " high-resolution wideband digital playback systems" so they can hear the difference.

Clever Dean's picture

is that you only need a laptop with its built-in HP out for high res audio but you need something else entirely for distinguishing Ethernet cables :-)

AllanMarcus's picture

How often do people listen to ringing keys through their systems? Maybe there's something about ringing keys that accentuates hi-rez vs regular rez. Seems an odd choice to test with. Why not, you know, music?

Michael Lavorgna's picture
...used music :-)
drblank's picture

Here's why.

When I buy a digital file, I can buy the CD in 16/44.1 or a 24-bit file that might be anywhere between 44.1 and 192kHz (for the most part). For the sake of argument, I'm limiting this to PCM vs PCM instead of dragging DSD files into the discussion.

Now, I have no idea where they originally got the source for the 16/44.1 vs the 24 bit version. Sometimes the 24 Bit version is from a DSD archive, DVD-A, sometimes it's from a new conversion from analog that's different than the 16 Bit version, etc. etc. etc. No one tells us anything and I think we are entitled to know as much information as possible so we actually know what we are buying.

This test takes a 24/96 file, downsamples it and then upsamples it. That's not the real world with the files we have to choose from. When I buy 16/44.1, it's not been downsampled and upsampled. It's either converted from analog, from the original 16-bit master, or maybe downsampled from 24-bit original masters. I have read that downsampling from 24 to 16 bit is better than taking 16 bit and upsampling to 24 bit. In this test, they do both and THEN they have people compare. But I can't buy a single 16-bit CD that's been downsampled and then upsampled from and to 24 bit, unless I upsample on my own system and probably have to use a filter to get rid of pre-ringing artifacts.

Why did they not do an ABX test of a 24/96 and a 16/44.1 that were both converted from the analog version, leave the downsampling and upsampling out of the equation, and just figure out a way to not reveal whether the file being played is 16 or 24 bit?

The other thing that's frustrating is we have no idea what else is being done with these different "remastered" versions.

Example. I bought a remastered version of an early-'70s analog recording in CD format; this remastered version was released a few years back. It does sound noticeably different than the original CD, and I can definitely tell they changed some EQ; what else they did during the remaster, I have no idea. Now, I just noticed HD Tracks has a 24/96 version and they said it was released in 2012, but it just hit the market within the last day or two in 24/96. HUH? So how was this released in 2012, as per the liner notes on HD Tracks? I have no idea if it's an upsampled version of the 16/44.1 version or if it was a conversion from a DSD archive or DVD-A with potentially a different EQ, etc. They don't give us any mastering information as to what they did during each mastering; they don't tell us whether it was upsampled, downsampled, converted from DSD archives or DVD-A, etc. etc.

So, the ONLY ABX test that's relevant is between two different versions of recordings that we actually have to choose from, not some specific test one person did, which may prove to be a case where the downsampling and upsampling changes the files enough to notice a difference. I already know that upsampling has issues and that using an apodizing filter or something similar might make it sound better because of pre-ringing during the upsampling.

So, I think this isn't a real-world comparison and that we need to know what these different versions really are; they should tell us whether they are upsampled, downsampled, DSD-to-PCM conversions, digital conversions from analog, whether there are EQ settings that differ from the previous release, added or deleted audio compression, limiting, etc.

Anyone care to add more clarity if I'm wrong on any of this???

T-Sporer's picture

In the real world, the recording is often done in 96/24 and remastered to 44.1/16 for CD. You are right: in the real world, upsampling would not be an explicit step.
However (1) with any Sigma-Delta converter involved, there is always upsampling happening.
However (2) doing the comparison test with different reproduction chains (different DACs, LP filters, ...) adds a lot of uncertainty. Using downsampling and upsampling together with the same (analog) equipment on the reproduction side reduces the factors causing differences to:
1 sampling rate
2 requantisation (this happens even if staying with floating point) and correct dithering
3 shape of anti-aliasing filters
4 non-linear distortions in the analog chain

Stuart did a good job setting up the experiment. He evaluated (1) and (2), but did not use proper dithering (triangular highpass dither might do a much better job - see the Lipshitz/Vanderkooy AES papers from the '90s). He did not consider optimizing the shape of the anti-aliasing filter (3). Inadequate filters cause clearly audible effects in the time domain - see for instance van Maanen's AES paper "Temporal Decay: A Useful Tool for the Characterization of Resolution of Audio Systems?" (1993). We have seen several times that frequency components above 20kHz are folded down by non-linear components (4) in the amplifier or loudspeaker and cause audible effects which make the sound "more rich" (so people prefer this!). However, this preference is similar to the preference for (analog) distortion caused by turntables - the distortions are there, but they do not belong to the recording....

Another issue concerning best practice:
AB-X is a great tool if you want to compare different non-perfect systems, but it is not the right way to judge whether a transmission is "not transparent": A transmission is not transparent if there is at least one person who can distinguish reliably. To test this this one person must be able to repeat correct decisions "sufficiently often". A test which can be used there is a repeated paired-comparison test: Pairs of stimuli are presented to the test subject and the subject has to answer the question "same or different?". For each audio sample the pair can be RR,RT,TR or TT (R=hiqh res reference, T=modified version). With 10 repeats a person with at least 8 correct is a candidate for "difference is audible for this person". Due to the fact that 9 correct answers while guessing for one person is already 1% (and 30% if you used 20 person!) this test should be repeated with the candidate for another 10 trial.
BTW: Kimio Hamasaki did such a test some years ago and found two candidates (and one could repeat this test shortly after, but not one year later - an age of 19 seemed to be the limit ...)
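[The guessing odds behind that protocol are easy to verify. A small sketch, assuming each same/different answer is an independent 50/50 guess:

```python
from math import comb

def tail(k, n=10):
    """P(at least k of n correct answers by pure guessing)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

print(f"P(>=9/10 by guessing) = {tail(9):.4f}")   # ~0.0107, the ~1% cited
print(f"P(>=8/10 by guessing) = {tail(8):.4f}")   # ~0.0547
# Requiring a candidate to clear >=8/10 in a second, independent block
# shrinks the guessing probability to roughly 0.0547**2 ~= 0.003, which is
# why the retest is demanded.
print(f"Two blocks of >=8/10  = {tail(8)**2:.4f}")
```
-Ed.]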

arnyk@wowway.com's picture

Nice job of totally one-sided reporting, Mike. If you want to make an attempt at what most people call journalism, you now know where to find me.

Michael Lavorgna's picture
It seems I found you right here ;-)

It makes sense to me for you to tell your side right here. This way people reading this article can also read your story.

Cheers.

arnyk@wowway.com's picture

It makes money for you and increases your influence, for me to tell my story here, Mike. Read your story again if you can't figure out why I don't want to do much of that.

The short answer is that Amir's handling of the tests on his WBF forum suggests to many that he doesn't get science, and if he doesn't get science then there is not a lot to do to help him. In short, one or three test results if uncontested would still prove nothing. Amir's results are not uncontested.

If Amir wanted to do Science, he would have provided additional evidence when requested over a year ago.

Where are the others who have duplicated his results, given that all of the resources for duplicating the test are out on public forums?

Ditto for the Meridian AES paper. It's got a lot of problems, which your link exposes very nicely, thank you. I hope that it will be followed up on.

Meanwhile, the posts to this thread convince me that a lot if not the majority of people see through this little charade, and that's good enough for me.

Michael Lavorgna's picture
"It makes money for you..."

While I can see how you could make the argument that your participation here may draw additional readers, thus increasing traffic to the site, you would have to take into account the actual numbers before you converted that influence into cash. Based on reality, what you are suggesting is a fairy tale.

"...little charade..." Nice!

DH's picture

If Arny starts showing up here regularly, you may lose readers....

CG's picture

Best comment this year!

Reading arguments and debates takes too much useful time out of life. Thanks for reminding me of this, since I have been complicit as well.

I'm out. Enjoy your listening time, whether low res, high res, or just singing in the shower.

DH's picture

Arny, I've read your forum extensively, and one-sided would be a perfect phrase to describe it, if I were limited to one phrase.

You regularly engage in casting aspersions on anyone who doesn't agree with your position. And you regularly assume others are motivated only by greed, and post unsupported negative comments about them and their motivations/practices/products - without any real knowledge of them or their products - all based on assumption.

Michael is much more fair minded and closer to a "journalist" than you will ever be. And BTW, he's employed to state his opinion. Yes, we want some fairness and even-handedness - but this isn't an academic journal page.

arnyk@wowway.com's picture

Please name my forum and give its URL, because I never knew I had one!

I am guilty as charged because that forum exists only in your mind, and the whole post is about things that are just as imaginary as my alleged forum.

LOL!

mytek's picture

Q: "The main difficulty with this kind of testing is you are trying to prove a negative that in the final analysis you can only build confidence in.

()

If you want to be more stringent still, you would have to test around 4600 subjects who fail to hear X to say that with a 99% certainty, less than 0.1% of the people can hear X."

So, wait, there are two scenarios: one, that 4599 people heard the difference but one person didn't, in which case it's easy, that person is deaf; or two, that 4599 did not hear but one person did.

I do listening tests almost daily and appreciate very much doing it with people who can hear; a new person always brings a new perspective. I often "learn to hear" something I have missed earlier.

I have come across some extraordinary ears in multiple dimensions, some people have more musical or more technical approaches. I don't believe we all hear exactly the same. No way.

Best Regards

Michal at Mytek New York

cundare's picture

I've done a # of ad hoc tests myself over the years, using a far less sophisticated method of double-blinding: I simply create duplicate files at various resolutions, name them cryptically, then play them with a digital player that displays only filenames. After I record my test results, I look at the same files on a PC, where I can easily tell the resolution of each one by its file size. Works for me.

But my point is this: There have been times when I could not hear any difference among files that ranged from 44/16 through 192/24. But there have been times when I did hear clearly perceptible differences (as in 90%+ correct identifications), but only as a function of bit depth, not sample rate. That is, I heard a greater difference between 44/16 and 44/24 than I did between 96/24 and 192/24.

So my question is: Why are we concentrating on discerning differences between sample rates (or between formats in which both sample rate & bit depth vary), but almost never isolate bit depth as a parameter that, when varied, yields audible differences?

On its face, higher bit depth might indeed make a digital file sound more "vinyl-like." The enormous increase in dynamic range going from 16-bit to 24-bit surely brings a digital file more in line with the continuous amplitude variations of a record. (And yes, these comments are informed by the fact that dynamic range does not merely represent a format's maximum "loudness" -- it is the difference between maximum amplitude and the noise floor.)
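[The arithmetic behind the bit-depth point: each bit buys roughly 6 dB of dynamic range, so 16-bit gives about 96 dB and 24-bit about 144 dB. A one-liner to check, ignoring dither and noise shaping:

```python
from math import log10

for bits in (16, 24):
    print(f"{bits}-bit: {20 * log10(2 ** bits):.1f} dB")
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```
-Ed.]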

ActorCam's picture

Forgive me if this has already been addressed, but what is the point of upsampling back to 24/96 in this test? Why not downsample and compare those files against the original? What's the purpose of performing yet another conversion? Doesn't that risk introducing more distortion?

arnyk@wowway.com's picture

The files were upsampled to make them more likely to be compared properly. Some DACs pause, and others click and pop when you change their sample rate. In some cases people can ABX files based on glitches like that. In the world of science, upsampling from one sonically transparent format to another can be sonically transparent, that is, be undetectable.

Michael Lavorgna's picture
Normally "ABX" is not allowed to be mentioned here, I think it's Rule #666, but in your case I'll make an exception ;-)

Cheers.

arnyk@wowway.com's picture

Amir made deceptive comments about my involvement in this fiasco. What really happened is that I started asking leading questions about the hardware platform he did his tests on. This is critical because in this kind of test there is a long track record of positive results based on nonlinearities in the playback system. His test platform was, as I recall, the built-in audio interface of a laptop. These can go either way, and you should test them to ensure that the audio subsystem of the laptop is linear enough. I've done these tests, and a fair number of laptops and portable DACs and headphone amps have fallen on their face. Amir's response was to make some misleading comments about me, so I think he also knows why he obtained such overwhelmingly positive results.
