Higgs Days at Santander

Santander is a Spanish port on the Bay of Biscay coast that next week will host its fourth annual workshop on the Higgs boson. This meeting will be very different in character from the huge summer conferences where exciting new results on searches for the Higgs boson were recently presented to thousands of physicists. The Santander meeting involves just 30 participants, a mix of theorists and experimenters involved in the analysis of data from Fermilab and CERN. Half their time will be spent on slide presentations and the other half on discussions covering searches for the standard model Higgs and for other models, including the charged Higgs sector of SUSY. They will talk about the procedures for combining Higgs searches across experiments and the implications of any findings. The aim is to promote a dialog between theorists and experimenters about what data needs to be shared and how.

Santander beaches, photo by yeyo

There is no indication that the discussions will be webcast or recorded for public viewing, and it is not certain that all the slides will appear online, so those of us on the outside may get very little indication of what they decide. It is unlikely that new data will be made public, but there is some chance that we may finally get to see a combination of ATLAS and CMS search data. Originally we were promised a combination of the searches shown at EPS in July using the first 1/fb of data from the LHC. Instead we got a new helping of plots from the individual experiments using 1.6/fb in the most important channels and even 2.3/fb for the ZZ channel in ATLAS. These were shown at the Lepton-Photon conference in August. Theorists would now very much like to see the combinations of these data sets, and it is not clear why they have been held back.

One question has become very topical and has already surfaced at some of the larger Higgs workshops: is it right to do quick approximate combinations of Higgs search data, or do we need to wait for the lengthy process of producing the official combinations? This summer I have become quite notorious for doing these quick combinations and showing them on viXra log. They have variously been described by experts as “nonsense” (Bill Murray), “garbage” (John Ellis) and “wrong” (Eilam Gross), but just how bad are they? Here is a plot of my handcrafted combination of the D0 and CDF exclusion plots compared with the official combination. The thick black line is my version of the observed exclusion limit, to be compared with the dotted line of the official result, while the solid blue line is my calculated expected limit, to be compared with the official dashed line. Click on the image for a better view.

My result is not perfect, but I hope you will agree that it provides similar information and would not mislead you into any conclusion that the official plot does not support. Any discrepancy is certainly much smaller than the statistical variations indicated by the green and yellow bands for one and two sigma.

A more ambitious project is to combine the exclusion plots for individual channels to reproduce the official result for each experiment. Here is my best attempt for the latest ATLAS results, where I have combined all eight channels for the primary decay products of the Higgs boson.

The result here is not as good and could only serve as a rough estimate of the proper combination. Why is that? There are several sources of error. Firstly, the data for the individual channels had to be digitised from the plots. This was not necessary for the Tevatron combination above, where the numbers behind the plots were published in tabular form. ATLAS and CMS have only published such numerical data for a few channels, and in some cases the quality of the plots shown is extremely poor. For example, this is the best plot that ATLAS has shown for the important H → ZZ → 4l channel:

As you can see, it is very hard to follow the lines on this plot, especially the dashed expected limit line. I don’t want to be over-critical but, seriously guys, can’t you do better than this?

Another source of error comes from neglecting correlations between the individual channels, where background estimates may share the same or related systematic errors. The Higgs combination group at CERN plays on this as one of the reasons why these quick combinations can’t be right, but I doubt that these effects are significant at all. If they were, I would not be getting such good results for the Tevatron combination.

In fact the main source of error is the approximation used in my combination algorithm. It assumes that the statistical distribution of each underlying signal can be modeled by a flat normal distribution with mean \mu_i and standard deviation \sigma_i. Combining normal distributions is standard stuff in particle physics: the combined mean \mu and standard deviation \sigma are given by these formulae

\frac{1}{\sigma^2} = \sum_i{\frac{1}{\sigma_i^2}}

\frac{\mu}{\sigma^2} = \sum_i{\frac{\mu_i}{\sigma_i^2}}

For example, if one experiment tells me that the mass of the proton is 938.41 ± 0.21 MeV and another tells me it is 938.22 ± 0.09 MeV, and I know that the errors are independent, then I can combine them with the above formulae to get a value of 938.25 ± 0.08 MeV. The Particle Data Group does this kind of thing all the time.
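The two formulae above amount to an inverse-variance weighted average. A minimal Python sketch, assuming the measurements are independent and normally distributed (the function name `combine` is just an illustrative choice), reproduces the proton mass example:

```python
import math


def combine(measurements):
    """Inverse-variance weighted combination of independent normal measurements.

    measurements: iterable of (mu_i, sigma_i) pairs.
    Returns (mu, sigma) satisfying 1/sigma^2 = sum(1/sigma_i^2)
    and mu/sigma^2 = sum(mu_i/sigma_i^2).
    """
    weight_sum = 0.0          # accumulates 1/sigma_i^2
    weighted_mean_sum = 0.0   # accumulates mu_i/sigma_i^2
    for mu_i, sigma_i in measurements:
        w = 1.0 / sigma_i**2
        weight_sum += w
        weighted_mean_sum += w * mu_i
    sigma = 1.0 / math.sqrt(weight_sum)
    mu = weighted_mean_sum / weight_sum
    return mu, sigma


mu, sigma = combine([(938.41, 0.21), (938.22, 0.09)])
print(f"{mu:.2f} ± {sigma:.2f} MeV")  # 938.25 ± 0.08 MeV
```

The more precise measurement dominates because its weight 1/0.09² is more than five times larger than 1/0.21².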

A plot of the signal for the Higgs boson given by the ATLAS results would look like this:

The black line (value of \mu) is the observed combined signal for the Higgs boson, normalised to a scale where no Higgs boson gives zero and a standard model Higgs boson gives one. The blue and cyan bands show the one and two sigma statistical uncertainty (\mu \pm \sigma and \mu \pm 2\sigma). Don’t think about where the Higgs boson is for now. Just look at the upper two sigma curve and compare it with the ATLAS Higgs exclusion plot above (i.e. the dotted line; click to enlarge for a better view). These are of course the same lines, because a mass is excluded at the 95% level when the upper two sigma edge \mu + 2\sigma lies below the SM Higgs signal of one. The expected line on the exclusion plot is just where the observed line would be if the signal were everywhere zero, i.e. it is a plot of 2\sigma. In summary, the observed CL_s limit in the exclusion plot is just \mu + 2\sigma and the expected limit is just 2\sigma. We can derive one plot from the other using this simple transformation.
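Under the same normal approximation, the transformation between the two kinds of plot can be written down directly. This is a minimal sketch, assuming observed = \mu + 2\sigma and expected = 2\sigma at each mass point, in units where the SM Higgs signal is one. The function names are hypothetical, and the numbers in the usage lines are made up for illustration, not read from any ATLAS plot.

```python
def exclusion_to_signal(observed, expected):
    """Map one point of an exclusion plot to a normal signal estimate (mu, sigma).

    Assumes observed = mu + 2*sigma and expected = 2*sigma, in units of the
    SM signal strength (0 = no Higgs, 1 = SM Higgs).
    """
    sigma = expected / 2.0
    mu = observed - expected
    return mu, sigma


def signal_to_exclusion(mu, sigma):
    """Inverse map: returns (observed limit, expected limit)."""
    return mu + 2.0 * sigma, 2.0 * sigma


def is_excluded(mu, sigma):
    """Approximate 95% exclusion: the upper two sigma edge lies below the SM value 1."""
    return mu + 2.0 * sigma < 1.0


# Hypothetical values at a single mass point (not real data).
mu, sigma = exclusion_to_signal(observed=0.9, expected=0.6)
print(mu, sigma, is_excluded(mu, sigma))
print(signal_to_exclusion(mu, sigma))
```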

From this it should be clear how to combine the exclusion plots. We first transform them all to signal plots; then they can be combined as normal distributions. Finally the combined signal plot is transformed back to give the combined exclusion plot. This is what I did for the viXra combinations above.
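Putting the three steps together for a single mass point gives something like the sketch below. It assumes the channels are independent (correlations are ignored, as discussed above) and that each channel's (observed, expected) pair has already been digitised in units of the SM cross-section. The function name and the input values are hypothetical; a full curve would be obtained by repeating the calculation at every mass point.

```python
import math


def combine_exclusions(channels):
    """Combine per-channel (observed, expected) exclusion limits at one mass point.

    1. transform each pair to a normal signal estimate (mu_i, sigma_i),
    2. combine with inverse-variance weights,
    3. transform the result back to (observed, expected).
    Correlations between channels are ignored.
    """
    weight_sum = 0.0
    weighted_mean_sum = 0.0
    for observed, expected in channels:
        sigma_i = expected / 2.0       # expected = 2 sigma
        mu_i = observed - expected     # observed = mu + 2 sigma
        w = 1.0 / sigma_i**2
        weight_sum += w
        weighted_mean_sum += w * mu_i
    sigma = 1.0 / math.sqrt(weight_sum)
    mu = weighted_mean_sum / weight_sum
    return mu + 2.0 * sigma, 2.0 * sigma


# Hypothetical digitised values: mass in GeV -> list of (observed, expected).
scan = {
    130: [(2.0, 1.0), (1.5, 1.2)],
    140: [(1.1, 0.8), (2.4, 1.6)],
}
for mass, channels in sorted(scan.items()):
    obs, exp = combine_exclusions(channels)
    print(mass, round(obs, 2), round(exp, 2))
```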

Ignoring the digitisation errors and the unknown correlations, the largest source of error is the assumption that the distribution is normal. In reality a log-normal or Poisson distribution would be better, but these require more information. Fortunately the central limit theorem tells us that, when enough events are available, the distribution approaches a normal one, so the combination method gets better as more events accumulate. That is why the viXra combination of the overall exclusion plots for each experiment is more successful than the combination of individual channels. The number of events seen in some of these channels is very low, and the flat normal distribution is not a great approximation there. As more data is collected the result will improve. Of course we cannot expect a reliable signal to emerge from individual channels until the statistics are good, so it could be argued that the approximation error is covered by the statistical fluctuations anyway.
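To see why low event counts are a problem, one can compare the chance of an upward fluctuation of more than two standard deviations for a Poisson count with the normal value of about 2.3%. This sketch treats a channel as a simple counting experiment with mean number of events `mean` and standard deviation √mean, which is a simplification (real channels also carry background uncertainties). It illustrates the central limit argument only and is not part of the combination method itself.

```python
import math


def poisson_tail(mean, k_min):
    """P(N >= k_min) for N ~ Poisson(mean), with terms computed in log space."""
    cdf = 0.0
    for k in range(k_min):
        cdf += math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    return 1.0 - cdf


# One-sided normal tail beyond +2 sigma (about 0.0228).
normal_tail = 0.5 * math.erfc(2.0 / math.sqrt(2.0))

for mean in (4, 25, 400):
    k_min = math.ceil(mean + 2.0 * math.sqrt(mean))  # first count at or above mean + 2 sigma
    print(mean, k_min, round(poisson_tail(mean, k_min), 4), round(normal_tail, 4))
```

With a mean of 4 events the Poisson tail comes out at roughly 5%, more than twice the normal value, while with 400 events it is much closer to it.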

I don’t know if a full LHC combination will emerge next week at the Santander workshop but in case it does, here is my best prediction from the most recent data for comparison with anything they might show.

Some people say that there is no point producing these plots because the official versions will be ready soon enough, but they are missing the point. The LHC will produce vast amounts of data over its lifespan and these Higgs plots are just the beginning. The experimenters are pretty good at doing the statistics and comparing with some basic models provided by the theorists, but this is just a tiny part of what theorists want to do. The LHC demands a much more sophisticated relationship between experimenters and theorists than any previous experiment, and it will be necessary to provide data in numerical form that theorists can use to investigate a much wider range of possible models.

As a crude example of what I mean, just look at the plot above. It provides conflicting evidence for a Higgs boson signal. At 140 GeV there is an interesting excess, but the observed limit there is still below the SM level, so the SM Higgs is excluded at that mass. Is this a hint of a Higgs signal or not? To answer this I might look at different channels combined over the experiments. Here is the ZZ channel combined over ATLAS and CMS.

The Higgs hint at 140 GeV is now nice and clear, though not significant enough yet for a reliable conclusion. Here is the diphoton channel combination.

Again the 140 GeV signal is looking good. What about the WW channel?

Here is where the problem lies. The WW channel has a broad excess from 120 GeV to 170 GeV at 2 sigma significance, but it excludes the SM Higgs from about 150 GeV. In fact the mass resolution in the WW channel is not very good, because the neutrinos escape detection and their contribution to the mass estimate has to be reconstructed from missing energy. Perhaps it would be better to combine just the diphoton and ZZ channels, which have better resolution. I can show the result in the form of a signal plot.

It’s still inconclusive, but at least it is not contradictory.

This is just an example of why it will be useful for theorists to be able to explore the data themselves. The signal for the Higgs will eventually be studied in detail by the experiments, but what about other models? There is a limit to how many plots the experiments can show. To really explore the data that the LHC will produce, theorists will need to be able to plug data into their own programs and compare it with their own models. The precise combinations produced by the Higgs combination groups take hundreds of thousands of CPU hours to build and are fraught with convergence issues. My combinations are done in milliseconds and give a result that is just as useful.

There is no reason why the experiments can’t provide cross-section data in numerical form for a wide range of channels, with better approximations than flat normal distributions if necessary. This would allow accurate combinations to be generated for an unlimited range of models with varying particle spectra and branching ratios. It will be essential that any physicist has the possibility to do this. I hope that this is what the theorists will be telling the experiments at Santander next week, and that the experiments will be listening.
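As a sketch of the kind of reinterpretation such numbers would allow, suppose each channel is published as a signal strength \mu_i \pm \sigma_i in SM units, and a theorist's model predicts a cross-section times branching ratio equal to r_i times the SM value in channel i. In the normal approximation the data expressed in units of the model's own prediction are \mu_i/r_i \pm \sigma_i/r_i, and the model is excluded at roughly 95% when the combined \mu + 2\sigma falls below one. The function name and all numbers are hypothetical, and the rescaling assumes the uncertainty does not depend on the size of the signal.

```python
import math


def model_excluded(channels, ratios):
    """Approximate 95% exclusion test for a non-SM Higgs model at one mass point.

    channels: list of (mu_i, sigma_i) signal strengths in SM units.
    ratios:   list of r_i = (cross-section x BR)_model / (cross-section x BR)_SM.
    """
    weight_sum = 0.0
    weighted_mean_sum = 0.0
    for (mu_i, sigma_i), r in zip(channels, ratios):
        if r <= 0:
            continue  # the model predicts no signal here, so this channel cannot exclude it
        m, s = mu_i / r, sigma_i / r  # rescale to units of the model's prediction
        w = 1.0 / s**2
        weight_sum += w
        weighted_mean_sum += w * m
    if weight_sum == 0.0:
        return False
    sigma = 1.0 / math.sqrt(weight_sum)
    mu = weighted_mean_sum / weight_sum
    return mu + 2.0 * sigma < 1.0


# Hypothetical channel data and model ratios (not real measurements).
channels = [(0.2, 0.3), (0.5, 0.4)]
print(model_excluded(channels, [1.0, 1.0]))  # SM-like rates in both channels
print(model_excluded(channels, [0.5, 0.5]))  # a model with rates halved
```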

Update 26 Sept 2011: I found a better version of the ATLAS ZZ -> 4l plot that I was moaning about. It has not appeared in the conference notes for some reason, but it is the same data from LP11, so I think it must be OK to show.

The latest expectation from the combination group is that a Lepton-Photon based combo will be ready for Hadron Collider Physics 2011, which is in Paris starting 14th November.

Update 1-Oct-2011: Most of the slides from the Santander meeting have now been uploaded.

26 Responses to Higgs Days at Santander

  1. Luboš Motl says:

    In your total CMS+ATLAS, 119 GeV clearly rules while non-SUSY values above 134 GeV suck.

    • Philip Gibbs says:

      True, but when I remove the lo-res WW channel the signal at 140 GeV is better. I would prefer to see it at 119 GeV because it implies more interesting new physics. Two bosons would be even better. We need more data soon to settle this.

    • Luboš Motl says:

      Dear Phil, low-res or high-res, I just believe your overall graph that excludes the Standard Model Higgs at 140 GeV. There can be another particle of the same mass but not the simple SM Higgs.

      I think that the low-res channels can’t lead to a “spurious exclusion” – they may only lead to a “spurious positive excess” at wrong masses.

    • Philip Gibbs says:

      If they model the uncertainty in mass resolution correctly then I agree that they should not exclude at the real Higgs mass. If it goes down any further we have to take 140 GeV as ruled out for SM Higgs.

    • Luboš Motl says:

      Dear Phil, I am not sure whether we’re talking about the same exclusions. I am talking about your exclusion

      which eliminates everything above 134 GeV or so. My understanding is that it is just a combination of properly done ATLAS exclusions; and CMS exclusions, and you get below 140 GeV in your deletion because both detectors were pretty close to exclusion over there, anyway.

      So in this sense, I don’t think that a mistake in your calculation could invalidate your result which is that 140 GeV is excluded at 95% c.l. in the combined ATLAS plus CMS data.

      • Philip Gibbs says:

        Dear Lusbo, I agree, but at 140 GeV the plot excludes at 95% both the hypothesis that there is a SM Higgs and the null hypothesis that there is no SM Higgs. Possible explanations are
        (1) There is a Higgs at 140 GeV but a 5% fluke ruled it out
        (2) There is no Higgs at 140 GeV but a 5% fluke ruled this out.
        (3) There is something else BSM that accounts for the result
        (4) There are systematic inaccuracies in the analysis at some point(s), e.g. background calculation, measurement of missing energy in WW channel, combination error etc.
        At this stage I am giving equal weight to each of these possibilities even if theoretical bias makes me prefer (2) or (3).

      • Luboš Motl says:

        Dear Plhi (note that I carefully reproduced your permutation of the letters), I appreciate it. There is 95% evidence for beyond-the-standard-model physics in those charts – or something unlikely (or wrong) has occurred.

    • Dilaton says:

      I feel sorry for the poor SM higgs; Lumo is so determined to get rid of it … LOL 🙂

      • Philip Gibbs says:

        I’ve already declared it dead a couple of times myself but it keeps coming back with new signs of life

      • Luboš Motl says:

        Nope, Dilaton! I am not that cruel. 😉

        A 119 GeV Standard Model Higgs is perfectly fine as of today, OK with me, too. Of course, the low mass would be an indirect hint of new physics but this new physics clearly can be out of the LHC reach.

  2. chris says:

    what you suggest is the horror scenario for any collaboration and this is ultimately why they don’t do it. publicly offering raw data on a hot topic means that a lot of people will jump in and do their analysis. while yours is (in my view) a valiant effort, others will recognize the face of jesus in the data or something and the collaborations have to use half their manpower just to debunk misguided analyses.

    • Philip Gibbs says:

      To be clear, I am not suggesting that raw data be published. The raw data requires knowledge of the detectors and triggers to process and this can only be done by the collaborations. However the data can be reduced to numbers suitable for public consumption such as those provided on the exclusion plots for individual decay modes. Just a little more detail on the data would be sufficient. Given these numbers any theorist can take their personal Higgs sector model and use their computed branching ratios to derive the combined exclusion plot for that model. From what I have shown here I think this process can be made sufficiently accurate to be reliable.

      If you are worried about people deriving crazy results from the data then I think you are crazy. There is a scientific process that can be used to verify any claims. There will always be more silly claims without accurate data than with it.

      In any case there is really no choice. With the amount of raw information that can be produced from the LHC and the number of models that need to be tested there is really no other way forward. The sooner the collaborations accept it the better. Already they have found that answers about models such as supersymmetry will not appear easily. They need to think in terms of setting triggers to produce more generic numbers for a wide range of decay modes rather than tailoring them for a limited number of unrealistically constrained versions of the models. Anything less is failure to get the best results from the experiments.

      • chris says:

        Just to be clear, what i mean is not real crackpottery but rather this kind of grandiose claims:

        statistical analyses can be really really tricky and many mistakes might creep in. it is an unfortunate fact that those who are quick and sensational get the media coverage and the others are left to clean up after the party is over.

        don’t get me wrong, i’m all for open access and actually i would be really surprised if atlas or CMS would not give you the numbers which they plotted if you point-blank asked them to. but i can more than understand if they decide that it’d be good that they have a first go at the decisive Higgs exclusion (or discovery) plot. in fact this is what i am hoping for: a month long complete silence now and then the final result.

    • JollyJoker says:

      Physicists already have the opportunity to use most of their time on rebutting crackpottery. They rarely care.

      I think the idea of having first dibs on the data is what keeps us from getting the numbers Phil would want, but this is only a matter of time. The unofficial combinations may even keep them from releasing not-fully-combined results. Wouldn’t want a “nonsense”, “garbage”, “wrong” blog post to be the first with a fully correct five sigma Higgs signal.

    • Dilaton says:

      The data should not be offered publicly but given much more efficiently (maybe in a way protected against “outsiders”?) to (registered?) theorists or other physicists who need and depend on them …

  3. Philip Gibbs says:

    If the data was allowed only to qualified physicists it is unlikely that Penrose would be excluded, so if you don’t like ideas such as his you won’t stop them that way.

    I really don’t think that the data is dangerous. The benefit of making it public and letting theorists examine it sooner outweighs any concerns anyone may have about the media picking the wrong theories to write about.

    If CERN want to hold back the data until they get a five sigma signal then it is a pity. They miss a chance to let the public see this story develop naturally. The credit for any discovery is going to CERN and all the people who worked on it, not the first blogger to say where he thinks it is. If the Higgs sector is more complicated than the SM then some theorist may see how it really works quicker. Why should they not be allowed to do that?

  4. Fabiano says:

    Hey Philip, why don’t you use PNG for your interesting plots? There would be no annoying JPEG artifacts.

  5. wl59 says:

    I think that sooner or later all raw data of any experiment will have to become accessible.

    On the other side, we know that a problem is that everything is over-interpreted. If CERN published all raw data, or even half-ready interpretations, every day NOW, then thousands of pseudoscientists would deliver their interpretations daily. But for us there are already too many wrong or hasty interpretations, even by experimenters, as occurred a few months ago …

    Thus it’s right that CERN at the moment publishes nothing that is not an essential step forward in clarity. Let’s wait two or three more weeks; then the situation may be much clearer, including enough data to put the whole interval up to 600 GeV below the 95% exclusion limit.

    It’s important that CERN currently concentrates on getting data, before something breaks on the LHC. The data you have, you have. It’s like in astronomy: quickly obtain more plates of everything you need, before the next clouds or rain come, or something breaks on your instrument … The important thing is to have the data. You can analyse them, and inspect and measure your plates, later, during the rainy season, or at CERN, during the LHC repair periods.

  6. […] although perhaps not for everyone). It is also worth reading Philip Gibbs, “Higgs Days at Santander,” viXra log, Sep. 18th, 2011. I have not said anything about this on this blog until now […]

  7. kevin says:

    With 4 fb-1, CMS now has enough for a 95% SM Higgs exclusion study all the way down to 113.5 GeV. https://twiki.cern.ch/twiki/pub/CMSPublic/Hig11010TWiki/LimitSMRel.png

    • Philip Gibbs says:

      That sounds a bit overoptimistic

      • Not here, on vixra, where Phil will combine ATLAS AND CMS 10/fb-1 and get the nobel prize for excluding the Higgs..

      • Philip Gibbs says:

        Not sure about the Nobel, LOL, but if the experimenters want to be the ones to do the analysis they need to stop calling my work nonsense and produce some timely combinations of their own, otherwise people may just start to think of them as being a little bit too arrogant.

  8. kevin says:

    When you look at both experiments’ ZZ->4l data, you see that they have analysed 3.7 fb-1 of data. They have found 8 events in the 120-145 GeV mass region instead of 4. With 10 fb-1, if that is not a statistical fluctuation, they will have a clear spike at the relevant mass, with 24 events instead of 12. A relevant graph would be a combined H->gamma gamma and ZZ->4l from both experiments. Other channels are too fuzzy with so little data. With so few events, it is easy to do the combination.
