Value of Brulosophy exbeeriments, others' experience, myths and beliefs

Homebrew Talk - Beer, Wine, Mead, & Cider Brewing Discussion Forum

Good, now we're getting somewhere. Do you think it's possible that by not making sure that readers know how to correctly interpret the P-values (which most do not), the result is that many of those readers take the words "... indicating participants in this xBmt were unable to reliably distinguish..." as an indication that there's likely no difference? If not, why not?

They often validate tests themselves and, in general, agree with the panel's findings. I'd trust those perceptions, knowing there should be a difference, more than someone on a forum suggesting you'll make "paint thinner" if you don't ferment lager at 35 degrees. It's happened.

Each experiment is different, so you have to weigh them independently. How the results are used is up to the individual. I can only go back to my own brewing results and weigh them against other people's comments and personal results.

At the very least, they do the experiments and publish results in an open manner. Nobody else does it.
 
@Nubiwan, do you think it's possible that by not making sure that readers know how to correctly interpret the P-values (which most do not), the result is that many of those readers take the words "... indicating participants in this xBmt were unable to reliably distinguish..." as an indication that there's likely no difference? If not, why not?

Yes or No would be good for starters.
 
@VikeMan do you think the folks over at Brulosophy are intentionally trying to fleece people with p-values, or help them in their brewing process? I believe it's the latter, and it's interesting.
 
@VikeMan do you think the folks over at Brulosophy are intentionally trying to fleece people with p-values, or help them in their brewing process?

No, I think the P-values are probably honest. I think they are misleading people (intentionally or unintentionally) by not explaining, in the experimental writeup, what the P-value means (and doesn't mean). Instead, they use these words " ... indicating participants in this xBmt were unable to reliably distinguish..." even when the P-values (if they were to be understood by the reader) clearly show that it was more likely than not a difference was detected.

Consider, theoretically, 100 experiments where the P-Value = 0.10. Every one of those would get the words " ... indicating participants in this xBmt were unable to reliably distinguish..."
But statistically, there is an overwhelming probability that approximately 90 out of those 100 experimental results were the result of a difference being detected.
Let that sink in.
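As a neutral check on what a p-value of 0.10 does and doesn't say, here is a minimal simulation sketch (Python, standard library only; the 24-taster panel size, the seed, and the function name are illustrative assumptions, not anything from Brulosophy's write-ups). It shows how often a panel of pure guessers — i.e., a case where no difference is detectable at all — still lands at p ≤ 0.10 by luck:

```python
import random
from math import comb

def triangle_p_value(correct, n):
    """Exact one-sided binomial tail P(X >= correct) when every taster
    guesses at random (chance of a right pick in a triangle test = 1/3)."""
    return sum(comb(n, k) * (1/3)**k * (2/3)**(n - k)
               for k in range(correct, n + 1))

# Simulate many panels of pure guessers and count how often the
# p <= 0.10 threshold is reached by luck alone.
random.seed(42)
n_tasters, trials = 24, 10_000
hits = sum(
    triangle_p_value(sum(random.random() < 1/3 for _ in range(n_tasters)),
                     n_tasters) <= 0.10
    for _ in range(trials)
)
print(f"Guessing panels reaching p <= 0.10: {hits / trials:.1%}")
```

Because the count of correct guesses is discrete, the rate comes out a bit under 10% (around 7% for a 24-taster panel), but the point stands in both directions: p = 0.10 is not proof of a difference, and failing to reach p = 0.05 is not proof of no difference.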

Yet a lot of readers are left with the impression that the experiments showed that there was no difference. We know that they do, because of the many forum threads where folks cite the results as proof of something not making any difference. And it's perfectly reasonable for them to think that, given the language used, and the language not used.

Do I think Brulosophy is trying to help people in their brewing process? Possibly, but they could be a lot more helpful by making the meaning of the results clear.

Given all of the above, do you think it's possible that readers tend to keep coming back for "validation" that lots of stuff makes no apparent difference?
 
There are plenty of skeptics who will tell you why their experiments are wrong/garbage/useless, but I've never seen the skeptics offer a better-designed experiment that they ran themselves.
Maybe those skeptics are aware that you can't really devise meaningful, effective experiments using simple homebrew equipment but unlike this guy they don't have a vested (possibly commercial) interest in pretending that you actually can...
 
@VikeMan did you wake up on wrong side of Monday?

I'm sure Marshal and company have discussed this language with numerous experts in the field (and a lot of armchair experts, AKA homebrewers, though there are likely some rare cases of overlap between those two communities) and are satisfied that "reliably distinguish" is reasonably descriptive. I have listened to the discussion on more than one podcast. But I still found your underlined thought experiment interesting.

So I went to the current front page of the exbeeriments, where the 50 most recent are listed. Of these, 30 had p-values greater than 0.05. Of those 30 exbeeriments, which I assume carry the "not able to reliably distinguish" language, only 3 had p-values that calculated between 0.10 and 0.05. In the two of these I checked (one was done by a club member, not a regular contributor), the brewer took the test 10 times and needed to guess right 7 times out of 10 to hit a 0.05 p-value. In both cases he was right 6 out of 10 times, for a p-value of 0.08. Reading the further comments, neither time did the author say "well, I thought I had them figured out but just missed on that last sample." Nope, in both cases the author admitted the beers were virtually indistinguishable.
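The 7-of-10 threshold and the 0.08 figure quoted above check out against the exact binomial arithmetic for a triangle test, where a pure guess is right 1/3 of the time. A quick sketch in Python (standard library only):

```python
from math import comb

def triangle_p_value(correct, n):
    """P(at least `correct` right out of `n` tries) if the taster is
    guessing; the chance of a correct pick in a triangle test is 1/3."""
    return sum(comb(n, k) * (1/3)**k * (2/3)**(n - k)
               for k in range(correct, n + 1))

print(round(triangle_p_value(6, 10), 2))  # 0.08 -- just misses 0.05
print(round(triangle_p_value(7, 10), 2))  # 0.02 -- crosses the 0.05 line
```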

What I get out of Brulosophy is that taste and smell are blunt instruments. Lots of times the beers are visually different (today's exbeeriment, e.g.) or measurably different in final gravity or attenuation, but still not easily distinguished in opaque cups. It's probably a good thing that our brains work this way, or else we might all starve to death rejecting food that doesn't exactly match our expectations. Makes sense that restaurants put so much effort into plating the food.
 
My perspective is that they are clearly beginning to struggle and grope for relevant topics, and they are hitting upon topics with less and less relevance and more and more outright desperation-based irrelevance as time marches forward. As a for-profit organization, that is no way to run a business.
 
My perspective is that they are clearly beginning to struggle and grope for relevant topics, and they are hitting upon topics with less and less relevance and more and more outright desperation-based irrelevance as time marches forward.
Well, probably because brewing is a fairly simple process that we are all hell-bent on making complicated.

Just like burgers...there are only so many different topping combos...but in the end it's just a freaking burger
 
Given all of the above, do you think it's possible that readers tend to keep coming back for "validation" that lots of stuff makes no apparent difference?
Well, when I read the experiment reviews, I pay no mind to p-values; it's foreign gibberish to me. I simply look at the number that are testing, and how many can tell a difference. Then I look at, of those who can tell a difference, which beer they prefer. Ultimately, the numbers narrow down to tell me that, even when there is a detectable difference, some still prefer the non-traditional approach, which starts me wondering, what's the big fuss? When the Brulosophy guys have a difficult time telling the difference, then I am sorry, but I think that's pretty telling.

In the end, who is making the big fuss over Brulosophy here?
 
I was a black belt in engineering management and did a stint as a Six Sigma black belt (think statistical modeling) before I retired. That may give me an advantage.

I use Brulosophy a lot. Like Homebrew Talk, it's helped me improve my beers. Yes, it isn't perfect, but who else is actually doing experiments with some controls and statistical significance? He needs our support for that.

I may have some practical advantage over some at looking at the data, due to understanding P-values, lurking variables, process control, how to set up experiments, and the like, but as long as one looks at them as testing over a narrow range of conditions, one can learn a lot. Having brewed since 1981, I have been surprised by how many things in books and literature we have taken as fact and reposted that have been debunked over the years. The folks over at Brulosophy have contributed to some of those debunkings.
 
I don't love the site, I don't hate the site. I nothing the site.

I don't follow the referenced site because, too often, I looked under the hood of some controversial xBs and found the designs fatally flawed, and for way too obvious reasons. At the end of an almost 50-year engineering career, I have no patience for ersatz science, and won't contribute the page clicks.

Cheers!
 
I'm sure Marshal and company have discussed this language with numerous experts in the field (and a lot of armchair experts, AKA homebrewers, though there are likely some rare cases of overlap between those two communities) and are satisfied that "reliably distinguish" is reasonably descriptive. Have listened to the discussion on more than one podcast.

I have no doubt they have discussed it. But, honest answer, if you would: If you knew nothing of statistics, and you read the words "... indicating participants in this xBmt were unable to reliably distinguish...", what would that mean to you?

Well, when I read the experiment reviews, I pay no mind to p-values; it's foreign gibberish to me.

You're making my argument for me. When readers don't understand P-values, but read the words that I won't C&P yet again, many of them will take those words to mean "Brulosophy found that there's no difference." That is the issue and it's the only issue I've raised in this thread. It's terrific if you can personally ignore those standard words. But as the "Brulosophy debunked X" claims from many threads attest, many people don't. It would be very easy for Brulosophy to fix that by adding a single sentence to each write-up.

In the end, who is making the big fuss over Brulosophy here?

Perhaps the guy who resurrected a zombie thread that had died in 2018?
 
Meanwhile, the only actual science to be found is on the TMB site; everything else is pseudoscience, yet praised. Sounds about right.

One can certainly go "all in" on the science. Please don't be disappointed when others take different approaches.

Others may decide that
https://brulosophy.com/2015/09/24/be-a-homebrewer-an-open-letter-from-denny-conn/ said:
The best beer possible with the least effort possible while having the most fun possible.
is appropriate for them.

And others may pick only two of those three ("best beer with least effort", ...).

Or maybe home brewing is "just" a pleasant way to spend a couple of hours of 'free' time 'cooking' something enjoyable.

Or maybe it's "just" an activity where one has complete control over the entire process (including growing the grains, growing hops, collecting wild yeast, mining the ore to make the kettle, ...)

Or maybe it's ...
 
Perhaps the guy who resurrected a zombie thread that had died in 2018?
I didn't realize we had forum policing. You clearly have an agenda. You have provided nothing but negativity in this entire thread. It is noted how you never questioned other comments around P-values earlier, just my own posts. Same on other threads. Weird!
 
You clearly have an agenda.

I do. The truth, to the best of my ability to determine it and share it. And learn it from others, of course.

You have provided nothing but negativity in this entire thread. It is noted how you never questioned other comments around P-values earlier, just my own posts.

You mean the messages in this thread from 2018 and prior? No, I didn't comment on the very old posts in this thread, which honestly, I did not take the time to read. And you are not the only one I responded to in this thread.

Same on other threads.

Untrue. There have been many threads where Brulosophy was brought up, and wherever I see that methods or statistics are misunderstood or misrepresented, I comment. Just like I would with any topic and any poster.

Of course, none of this changes any of the basic facts about the topic. Perhaps we could limit our debate to facts, rather than ad hominem attacks.
 
I have no doubt they have discussed it. But, honest answer, if you would: If you knew nothing of statistics, and you read the words "... indicating participants in this xBmt were unable to reliably distinguish...", what would that mean to you?

Hard for me to answer that question honestly. I do know a bit about statistics — not a lot, but a few classes in college many years ago now. I also work in an industry (pharma) where the p-value gets used quite a bit. To me, the language offered is reasonably consistent with how I see the term used in my industry. Failing to reach statistical significance doesn't mean a studied drug, for example, had no effect, but it does suggest the effect was not significant enough to be seen in the study as designed. In pharma there can be good money made showing a small effect is real, and so studies are often designed to be huge, expensive ordeals.

But this isn't pharma; it's not even commercial macro beer, this is home-brewed beer. If neither I nor my friends and family can tell that beer made with or without the test variable is different, I think it's OK to question the relative importance of the test variable. And I'm the most unreliable tester available, due to confirmation bias. I thought long and hard about this modification to my process. I might have spent quite a bit to make it happen. Of course I'm gonna see the impact. Any tester blinded to the variable is going to be a more valid test than I can ever be.
 
My perspective is that they are clearly beginning to struggle and grope for relevant topics, and they are hitting upon topics with less and less relevance and more and more outright desperation-based irrelevance as time marches forward. As a for-profit organization, that is no way to run a business.

I would note that during the pandemic, Brulosophy has just been redoing prior experiments.

A few years ago I reached out to them on their contact page and Marshal got back to me in an hour or two. We exchanged a few emails and he was very helpful.

Personally, I look at Brulosophy as a starting place for ideas to try out in my own brewery.
 
I'll be honest here: I don't have a clue what they're talking about when they get to the p-value and statistical significance part. I just look to see whether people could detect a difference or not. It's also a fun read, and their recipes have gotten me thinking before, which is good.

I like them. I think they're a good thing on the internet where some people still recommend boiling and fermenting beer on the grain like frigging neanderthals.
 
I'll be honest here: I don't have a clue what they're talking about when they get to the p-value and statistical significance part. I just look to see whether people could detect a difference or not.

If you don't know (and don't want to know, which is fine) about P-values, at least keep in mind that in a triangle test, if there's no detectable difference, the expected number of correct selections is 1/3 (not 1/2) of the total. If more than a third "got it right," turn on your critical thinking and try not to be unduly swayed by the standard blurb.
 
If you don't know (and don't want to know, which is fine) about P-values, at least keep in mind that in a triangle test, if there's no detectable difference, the expected number of correct selections is 1/3 (not 1/2) of the total. If more than a third "got it right," turn on your critical thinking and try not to be unduly swayed by the standard blurb.
Also, that is 1, ONE, UNO, data point and should be taken as such (basically nothing, since science needs to be repeatable to have any significance). Hanging your hat on the results of one test (that is FAR from rigorous and would be thrown out in any real setting) is confirmation bias at its best (worst?).

I for one would like to see a BJCP score from a national judge for every beer in the tests. If it's not a 35+ point beer for the category, all results should be thrown out, since faults could not be reliably detected (meaning the beer itself is flawed). Until then, literally nothing they say holds any merit. Seeing some of their tasters guess the beer style and pick sour styles for a pilsner is pretty telling that the beer is suspect.
 
Not taking sides on this discussion but very much enjoy statistics. While I have two graduate level business stats classes under my belt, it has been 25 years so if what is written below is totally wrong - please be gentle ;).

One observation about the Brulosophy approach is that they tend to take a positive versus negative hypothesis testing approach to the exbeeriments which is how drugs are tested as they must be proven to make a difference before approval. To clarify, Brulosophy typically seeks to validate that a practice makes a difference rather than does not make a difference.

For example, if Brulosophy wanted to test whether sanitation (selected as an extreme example) makes a difference, the exbeeriment would be set up to assume that it does not make a difference unless a statistically significant number of participants can taste a difference between a batch brewed using sanitation and one without. Alternatively, they could take a negative approach, which would assume that it does make a difference unless a statistically significant number of participants can't taste a difference.

Following a negative hypothesis testing approach for many of the experiments may result in conclusions that are more palatable (pun intended) to many as the practice is always assumed to be worthwhile unless the data shows otherwise.
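A power calculation makes the same point concrete: in a small panel, a non-significant result is weak evidence that the variable doesn't matter. The sketch below (Python, standard library only) is my own illustration, not Brulosophy's published method; it assumes a 24-taster panel and Abbott's adjustment, under which some fraction of tasters truly perceives the difference and the rest guess at 1/3:

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def triangle_power(n, distinguishers, alpha=0.05):
    """Chance a triangle test with n tasters reaches p <= alpha when a
    fraction `distinguishers` truly perceives the difference and the
    rest guess at random (Abbott's adjustment)."""
    p_correct = distinguishers + (1 - distinguishers) / 3
    k_crit = next(k for k in range(n + 1) if binom_tail(k, n, 1/3) <= alpha)
    return binom_tail(k_crit, n, p_correct)

for frac in (0.1, 0.2, 0.3):
    print(f"{frac:.0%} true distinguishers: power = {triangle_power(24, frac):.0%}")
```

Even when 30% of tasters genuinely perceive the difference, a 24-person triangle test reaches p ≤ 0.05 only about half the time, so a "failed to reliably distinguish" outcome is entirely compatible with a real difference.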
 
I for one would like to see a BJCP score from a national judge for every beer in the tests. If it's not a 35+ point beer for the category, all results should be thrown out, since faults could not be reliably detected (meaning the beer itself is flawed). Until then, literally nothing they say holds any merit. Seeing some of their tasters guess the beer style and pick sour styles for a pilsner is pretty telling that the beer is suspect.

I like the idea but suspect it would be easy enough for skeptics to disregard the BJCP scores. Competition judges seem to get love only when they agree the beer is as good as the homebrewer thinks it is. Would you have them send the beers to actual competitions, or are you thinking each contributor should get a few local judges to do the scoring outside of a competition? If sending to a competition, there is an issue of handling (mishandling or poor storage), plus added cost and complexity. They would pretty much have to run the full tasting panel in parallel to the competition and then wait for the judges' scores to come in before publishing. If going to a local friend, those who don't believe them now will just say the judge is rubber-stamping the score sheets to keep the cash flowing into the Brulosophy coffers.

I'm obviously not a skeptic and enjoy the site and content very much. Probably my second favorite home brewing resource, after this place. I enjoy the weekly exbeeriments and the podcast a lot. Not a huge fan of short and shoddy; not my thing. I'm willing to trust that the brewers at Brulosophy are telling the truth when they describe their impressions of the beer. This is part of every write-up, coming in near the end. Part of that comes from listening to the podcasts. They are not novice brewers; some are BJCP certified judges. They seem knowledgeable and competent. I'm also thinking the tests where they ask the tasters to guess the beer style don't come from the exbeeriments side of the operation but from the short and shoddy side.
 
Am I the only one who listens purely for the beer review (and the random Q&A episodes) and occasionally ends up listening to only the first segment, before the actual experiment?
 
Any individual study or experiment has to be taken with a pinch of salt; taste can vary a lot due to a lot of factors. Maybe Americans prefer one thing and Europeans the opposite; educated people will have their palates trained in a certain direction, while a random person who knows nothing about beer may have a totally different palate, and so on.

So for an experiment to be taken as "gospel," it needs to be replicated a lot, by and on people of all kinds and in different locations. If you see an experiment saying there's no difference between a pilsner made with pils malt and one made with pale malt, the best thing is to brew them both and see if it holds for you. If you don't agree with the experiment, that doesn't mean it's bollocks; it just means you aren't like the panel in terms of palate.
 
I like brulosophy for even attempting the experiments in the first place.
I mean, someone out there had the audacity to figure out that secondary transfers are basically worthless by trying it out (an experiment).
I read all of the Brulosophy stuff, but I don't take it as established science; it's more of a question that they post one answer to.
The "short and shoddy" stuff is interesting, but I don't brew that way. That said, I also don't brew 3-vessel anymore and view it as a waste.
I think everyone that brews is smart enough to interpret brulosophy for what it is.

I do think that some things that are "accepted practice" in homebrewing are kinda an emperor's new clothes situation and enjoy seeing these things challenged.
 
That site, and this site for that matter, are all for-profit entities. If you think they are honest and unbiased, great. But the ads, sponsorships, and censoring don't exactly line up with that.

I'm trying to figure out where you would go to get brewing information if you eliminated all sources that took revenue from ads or sponsorship and chose to moderate their content. I'm stumped.
 
I take Brulosophy tests as interesting and not as scientifically accurate; in general, nothing in brewing is scientifically accurate, because brewing is not an exact science.

I have always disregarded the p-stuff as totally irrelevant to me. I also have been quite surprised, on more than one occasion, to read that the number suggested relevance or irrelevance, because considering the number of participants and the number of people picking the different beer, "relevance" is something that I calculate by instinct, so to speak, and it did not coincide with the p-verbiage.

Although in general I find the tests interesting, sometimes I find they could have been better designed. For instance a test on practices regarding oxidation would be more interesting if the beers were compared three months after production, and not two weeks. Or, again, a test on oxidation would be more interesting for a heavily hopped beer than a normally hopped beer.

Overall, it's interesting and gives good anecdotal information. Anecdotal information IS good. The site contains "food for thought" and can invite trying to change one's practice.
 
If I might spill the beans as to a suggestion for a Brulosophy exbeeriment, here is my top one:

Brew a SMaSH lager of moderate IBUs using a LOX-free Pilsner base malt and standard/good pre-LoDO practices, but with none of the extra-mile LoDO practices and/or process modifications, such as adding metabisulfite, BTB, ascorbic acid, etc. Then brew an identical (except for the LoDO side) SMaSH beer, applying all of the LoDO stuff and practices, while using a standard Pilsner base malt (preferably sourced from the same company as the LOX-free malt) with a pre-confirmed, nominally typical quantity of LOX (lipoxygenase) present. Leave them in kegs for 3 or 4 months, and then triangle test them before a standard triangle test participant group.
 
A suggestion I would give is to serve 5 or 6 samples to participants, of which 1 is different, rather than the 3 of a triangle test. That would make the test results more meaningful with little additional effort, both for the tasters and for Brulosophy.
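The statistical payoff of that suggestion can be sketched quickly: dropping the guessing baseline from 1/3 (triangle test) to 1/5 (one odd sample among five) makes the same number of correct picks much harder to attribute to luck. A rough illustration in Python (standard library only; the 12-of-24 numbers are invented for the example):

```python
from math import comb

def tail_p(correct, n, chance):
    """P(at least `correct` of `n` tasters pick the odd sample when each
    guesses right with probability `chance`)."""
    return sum(comb(n, k) * chance**k * (1 - chance)**(n - k)
               for k in range(correct, n + 1))

# Same panel, same 12-of-24 correct picks -- only the baseline changes.
print(f"triangle (1 odd of 3): p = {tail_p(12, 24, 1/3):.3f}")
print(f"1 odd of 5 samples:    p = {tail_p(12, 24, 1/5):.5f}")
```

The trade-off is taster fatigue: judging five samples per flight is harder than three, which is presumably one reason the triangle test is the industry standard.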
 
I'm sure the guys have thought of all this stuff too as they are longtime brewers and regularly discuss this stuff with pro brewers as well.
 
I've read a lot of the exbeeriments on the site and have fun thinking about the variables. I'm a lazy brewer, and anything that can produce good beer with less effort I'm in favor of. I remember, when I first started brewing, reading and hearing that 147°F will provide a much more fermentable wort than 153°F; it didn't seem to make a lot of sense, such a small difference, and I was not surprised when that test came back as not significant (confirmation bias, I know).

What I don't understand about this thread so far is: who cares whether someone takes the BL conclusions as gospel or not? Like all brewers, we get to make that decision. BL doesn't claim their results are gospel. Also, if Marshall and the crew have been able to monetize BL, good for them; that does not change anything except being able to do more exbeeriments on better equipment (I'm jealous).

Bottom line, it's a hobby "science" site, and I'm guessing the BL guys/gals are having a good time doing and posting their exbeeriments; I enjoy seeing what they are up to next. And I get to decide if I'm going to use the information to change my process.

Brew on :rock: :ban::mug:
 
@Silver_Is_Money I would not expect them to take that on. Seems they had a difficult time with a LoDO exbeeriment in the past, and the experience left a mark. If they do the experiment and see no difference, everyone from the LoDO camp jumps up and down and says they didn't do it quite right.
LoDO is more a cult than an approach to brewing, IMO.
 
@Silver_Is_Money I would not expect them to take that on. Seems they had a difficult time with a LoDO exbeeriment in the past, and the experience left a mark. If they do the experiment and see no difference, everyone from the LoDO camp jumps up and down and says they didn't do it quite right.

It would still have appreciable relevance.
 