[SciPy-User] Asymmetric peak fitting

Discussion:

Paweł Kwaśniewski

2015-03-18 15:06:19 UTC

Hi All,

I'm currently trying to fit some experimental data in the form of
asymmetric peaks. In principle it's difficult to find a distribution
describing this kind of data. My goal is to get the full width half maximum
of the peak and the peak position. I tried to look for some ready solutions
within scipy but could not find any. I did find a nice paper though:

http://arxiv.org/abs/0711.4449

Since it's not so trivial (at least not for me) to implement this, I'd like
to ask if anyone in the community has already done this (or something
similar) and would like to share the code.

Cheers,

Pawel Kwasniewski

j***@chemistry.montana.edu

2015-03-18 21:36:35 UTC

Permalink

Post by PaweÅ KwaÅniewski
Hi All,
I'm currently trying to fit some experimental data in the form of
asymmetric peaks. In principle it's difficult to find a distribution
describing this kind of data. My goal is to get the full width half maximum
of the peak and the peak position. I tried to look for some ready solutions
http://arxiv.org/abs/0711.4449
Since it's not so trivial (at least not for me) to implement this, I'd
like to ask if anyone in the community has already done this (or something
similar) and would like to share the code.

There are a few publications detailing modified gaussians for peak tailing
in chromatography:
http://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution.
Be careful about the equations on the Wikipedia page, they are numerically
unstable: reference [http://dx.doi.org/10.1002%2Fcem.1343] for details. I
have written python versions of those equations which are accurate but not
fast.

Asymmetric peaks are also common in many forms of spectroscopy, especially
mass spectrometry. I have also written Scipy code to fit peaks for that
application.

Jonathan

Matt Newville

2015-03-19 01:16:26 UTC

Permalink

Pawel,

You might find the lmfit module (http://lmfit.github.io/lmfit-py/) useful.
This is a module for least-squares minimization and curve-fitting, built on
top of scipy.optimize. It includes several built-in Models, a few
representing asymmetric peaks such as an exponentially damped Gaussian (
http://lmfit.github.io/lmfit-py/builtin_models.html#exponentialgaussianmodel).
A quick look at the paper you referenced suggests this function might be
sufficient, but if this or one of the other built-in Models isn't exactly
what you're looking forward, it's very easy to make a new Model class from
a Python function that calculates and returns the model function. Lmfit
Models can also be added or multiplied together to make composite models
(say, Gaussian + Step + Quadratic), and allow you to place bounds and/or
constraints on any of the Parameters in the fit.

Though there are many features you can use to set up models and their
parameters, using these models (either built-in or one that you define
yourself) for curve-fitting data is pretty straightforward. See
http://lmfit.github.io/lmfit-py/builtin_models.html#example-1-fit-peaked-data-to-gaussian-lorentzian-and-voigt-profiles
for a simple example. That example uses symmetric peaks, but the use of
any Model for curve-fitting is basically the same.

Hope that helps,

--Matt Newville

Paweł Kwaśniewski

2015-03-19 08:09:12 UTC

Permalink

Jonathan, Matt,

Thank you for your answers. I must say that lmfit is already a good friend
of mine - I use it for quite some time for fitting. Somehow I didn't think
of looking there for asymmetric peak models. Thanks for pointing this out!
I suppose this will solve my problem.

Cheers,

Pawel

Post by Matt Newville
Pawel,

You might find the lmfit module (http://lmfit.github.io/lmfit-py/)
useful. This is a module for least-squares minimization and curve-fitting,
built on top of scipy.optimize. It includes several built-in Models, a few
representing asymmetric peaks such as an exponentially damped Gaussian (
http://lmfit.github.io/lmfit-py/builtin_models.html#exponentialgaussianmodel).
A quick look at the paper you referenced suggests this function might be
sufficient, but if this or one of the other built-in Models isn't exactly
what you're looking forward, it's very easy to make a new Model class from
a Python function that calculates and returns the model function. Lmfit
Models can also be added or multiplied together to make composite models
(say, Gaussian + Step + Quadratic), and allow you to place bounds and/or
constraints on any of the Parameters in the fit.
Though there are many features you can use to set up models and their
parameters, using these models (either built-in or one that you define
yourself) for curve-fitting data is pretty straightforward. See
http://lmfit.github.io/lmfit-py/builtin_models.html#example-1-fit-peaked-data-to-gaussian-lorentzian-and-voigt-profiles
for a simple example. That example uses symmetric peaks, but the use of
any Model for curve-fitting is basically the same.
Hope that helps,
--Matt Newville
_______________________________________________
SciPy-User mailing list
http://mail.scipy.org/mailman/listinfo/scipy-user

Paweł Kwaśniewski

2015-03-19 09:57:01 UTC

Permalink

Matt,

I have one more question: the weights for fitting a model should be
1./error_bar or the error_bar? The name "weights" suggests the former, but
I'd like to be sure.

Cheers,

PaweÅ

Post by PaweÅ KwaÅniewski
Jonathan, Matt,
Thank you for your answers. I must say that lmfit is already a good friend
of mine - I use it for quite some time for fitting. Somehow I didn't think
of looking there for asymmetric peak models. Thanks for pointing this out!
I suppose this will solve my problem.
Cheers,
Pawel

Post by Matt Newville
Pawel,

You might find the lmfit module (http://lmfit.github.io/lmfit-py/)
useful. This is a module for least-squares minimization and curve-fitting,
built on top of scipy.optimize. It includes several built-in Models, a few
representing asymmetric peaks such as an exponentially damped Gaussian (
http://lmfit.github.io/lmfit-py/builtin_models.html#exponentialgaussianmodel).
A quick look at the paper you referenced suggests this function might be
sufficient, but if this or one of the other built-in Models isn't exactly
what you're looking forward, it's very easy to make a new Model class from
a Python function that calculates and returns the model function. Lmfit
Models can also be added or multiplied together to make composite models
(say, Gaussian + Step + Quadratic), and allow you to place bounds and/or
constraints on any of the Parameters in the fit.
Though there are many features you can use to set up models and their
parameters, using these models (either built-in or one that you define
yourself) for curve-fitting data is pretty straightforward. See
http://lmfit.github.io/lmfit-py/builtin_models.html#example-1-fit-peaked-data-to-gaussian-lorentzian-and-voigt-profiles
for a simple example. That example uses symmetric peaks, but the use of
any Model for curve-fitting is basically the same.
Hope that helps,
--Matt Newville
_______________________________________________
SciPy-User mailing list
http://mail.scipy.org/mailman/listinfo/scipy-user

Paweł Kwaśniewski

2015-03-19 11:04:17 UTC

Permalink

Hi,

Ok, now I have a problem. In the attached script I try to fit some example
experimental data I have. The peak is actually a numerical derivative of a
step function - this is why the values are so large. I tried different
models, all except the ExponentialGaussianModel can handle the fit pretty
well using the automatically guessed parameters. I don't really understand
why. Is there something I'm missing here?

Cheers,

Pawel

Post by PaweÅ KwaÅniewski
Matt,
I have one more question: the weights for fitting a model should be
1./error_bar or the error_bar? The name "weights" suggests the former, but
I'd like to be sure.
Cheers,
PaweÅ

Post by PaweÅ KwaÅniewski
Jonathan, Matt,
Thank you for your answers. I must say that lmfit is already a good
friend of mine - I use it for quite some time for fitting. Somehow I didn't
think of looking there for asymmetric peak models. Thanks for pointing this
out! I suppose this will solve my problem.
Cheers,
Pawel

Post by Matt Newville
Pawel,

You might find the lmfit module (http://lmfit.github.io/lmfit-py/)
useful. This is a module for least-squares minimization and curve-fitting,
built on top of scipy.optimize. It includes several built-in Models, a few
representing asymmetric peaks such as an exponentially damped Gaussian (
http://lmfit.github.io/lmfit-py/builtin_models.html#exponentialgaussianmodel).
A quick look at the paper you referenced suggests this function might be
sufficient, but if this or one of the other built-in Models isn't exactly
what you're looking forward, it's very easy to make a new Model class from
a Python function that calculates and returns the model function. Lmfit
Models can also be added or multiplied together to make composite models
(say, Gaussian + Step + Quadratic), and allow you to place bounds and/or
constraints on any of the Parameters in the fit.
Though there are many features you can use to set up models and their
parameters, using these models (either built-in or one that you define
yourself) for curve-fitting data is pretty straightforward. See
http://lmfit.github.io/lmfit-py/builtin_models.html#example-1-fit-peaked-data-to-gaussian-lorentzian-and-voigt-profiles
for a simple example. That example uses symmetric peaks, but the use of
any Model for curve-fitting is basically the same.
Hope that helps,
--Matt Newville
_______________________________________________
SciPy-User mailing list
http://mail.scipy.org/mailman/listinfo/scipy-user

j***@chemistry.montana.edu

2015-03-19 16:02:32 UTC

Permalink

Post by PaweÅ KwaÅniewski
Ok, now I have a problem. In the attached script I try to fit some example
experimental data I have. The peak is actually a numerical derivative of a
step function - this is why the values are so large. I tried different
models, all except the ExponentialGaussianModel can handle the fit pretty
well using the automatically guessed parameters. I don't really understand
why. Is there something I'm missing here?

It might not be your script. I can't reproduce the figure on Wikipedia (
Loading Image...

) using the lmfit
model. Specifically, cases with sigma != 1.0 don't seem right.

Jonathan

Matt Newville

2015-03-19 19:01:57 UTC

Permalink

Hi Jonathon,

Post by PaweÅ KwaÅniewski
Ok, now I have a problem. In the attached script I try to fit some example

Post by PaweÅ KwaÅniewski
experimental data I have. The peak is actually a numerical derivative of a
step function - this is why the values are so large. I tried different
models, all except the ExponentialGaussianModel can handle the fit pretty
well using the automatically guessed parameters. I don't really understand
why. Is there something I'm missing here?

It might not be your script. I can't reproduce the figure on Wikipedia (
http://en.wikipedia.org/wiki/File:EMG_Distribution_PDF.png) using the
lmfit model. Specifically, cases with sigma != 1.0 don't seem right.
Jonathan

Hmm, not sure what you're seeing. From the link you give, this script

#!/usr/bin/env python
from numpy import linspace
import matplotlib.pyplot as plt
from lmfit.lineshapes import expgaussian

x = linspace(-4, 6, 101)
red = expgaussian(x, amplitude=1, center= 0.0, sigma=1.0, gamma=1.0)
green = expgaussian(x, amplitude=1, center=-2.0, sigma=1.0, gamma=1.0)
blue = expgaussian(x, amplitude=1, center= 0.0, sigma=3.0, gamma=1.0)
yellow = expgaussian(x, amplitude=1, center=-3.0, sigma=1.0, gamma=0.25)

plt.plot(x, red, 'r-')
plt.plot(x, green, 'g-')
plt.plot(x, blue, 'b-')
plt.plot(x, yellow, 'y-')
plt.savefig('test_expgauss.png')
plt.show()
#########################################

gives a plot that looks pretty similar to the one at the wikipedia page
(except for the nice labels):

[image: Inline image 2]

Do you see something different? If so, what do you see?
Cheers,

--Matt Newville

j***@chemistry.montana.edu

2015-03-19 19:26:49 UTC

Permalink

Post by Matt Newville
Do you see something different? If so, what do you see?

Mine is definitely different.

Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64
bit (AMD64)] on win32
scipy 0.14.0
numpy 1.8.1
lmfit 0.8.3

Using your test script:
[image: Inline image 1]

Jonathan

Matt Newville

2015-03-20 00:41:16 UTC

Permalink

Jonathon, Pawel,

Post by j***@chemistry.montana.edu

Post by Matt Newville
Do you see something different? If so, what do you see?

Mine is definitely different.

Ah, sorry the very poor memory. There was a fix to the expgaussian()
function a month ago, after the 0.8.3 release (and thanks to Tim Spillane
for the fix!). That explains the problem you're seeing, and might
explain why Pawel was having trouble too. The fix is fairly simple
(change `arg2` to have `/(s2*sigma)` instead of `/s`, see
https://github.com/lmfit/lmfit-py/blob/master/lmfit/lineshapes.py#L116),
and should be orthogonal to any other changes in the current branch.

--Matt

j***@chemistry.montana.edu

2015-03-20 01:30:53 UTC

Permalink

Post by Matt Newville
Ah, sorry the very poor memory. There was a fix to the expgaussian()
function a month ago, after the 0.8.3 release (and thanks to Tim Spillane
for the fix!). That explains the problem you're seeing, and might
explain why Pawel was having trouble too. The fix is fairly simple
(change `arg2` to have `/(s2*sigma)` instead of `/s`, see
https://github.com/lmfit/lmfit-py/blob/master/lmfit/lineshapes.py#L116),
and should be orthogonal to any other changes in the current branch.

Thanks for checking into this. Do you have any thoughts on the analysis by
Kalambet and Tikhonov in the paper I linked previously? Specifically, in
the code:

gss = gamma*sigma*sigma
arg1 = gamma*(center +gss/2.0 - x)
arg2 = (center + gss - x)/(s2*sigma)
return amplitude*(gamma/2) * exp(arg1) * erfc(arg2)

The exp(arg1) factor is much too large and erfc(arg2) is much too small,
when x<1/gamma ?

Jonathan

Matt Newville

2015-03-20 03:15:38 UTC

Permalink

Hi Jonathon,

Post by j***@chemistry.montana.edu

Thanks for checking into this. Do you have any thoughts on the analysis
by Kalambet and Tikhonov in the paper I linked previously? Specifically,
gss = gamma*sigma*sigma
arg1 = gamma*(center +gss/2.0 - x)
arg2 = (center + gss - x)/(s2*sigma)
return amplitude*(gamma/2) * exp(arg1) * erfc(arg2)
The exp(arg1) factor is much too large and erfc(arg2) is much too small,
when x<1/gamma ?
Jonathan

I haven't looked at the paper in great detail, and wasn't aware of (and/or
never experienced myself) the potential instability -- but I haven't used
this function much myself. If I understand correctly, the recommendation
is that a version of the exponentially modified Gaussian that used Eq. 6
from the Kalambet, et al article (and scipy.special.erfcx()) would be
better behaved. Is that correct, and the whole story?

A Pull Request or a patch for candidate replacement function based on this
would be greatly appreciated.

--Matt

Paweł Kwaśniewski

2015-03-20 08:43:52 UTC

Permalink

Hi Matt, Jonathan,

Post by Matt Newville
Hi Jonathon,

Post by j***@chemistry.montana.edu

That was the problem! Thanks! I'm sorry, I should also attach the fit
result figure to my post. But now it works like a charm.

Thanks!

Pawel

Post by Matt Newville

Post by j***@chemistry.montana.edu
Thanks for checking into this. Do you have any thoughts on the analysis
by Kalambet and Tikhonov in the paper I linked previously? Specifically,
gss = gamma*sigma*sigma
arg1 = gamma*(center +gss/2.0 - x)
arg2 = (center + gss - x)/(s2*sigma)
return amplitude*(gamma/2) * exp(arg1) * erfc(arg2)
The exp(arg1) factor is much too large and erfc(arg2) is much too small,
when x<1/gamma ?
Jonathan

I haven't looked at the paper in great detail, and wasn't aware of (and/or
never experienced myself) the potential instability -- but I haven't used
this function much myself. If I understand correctly, the recommendation
is that a version of the exponentially modified Gaussian that used Eq. 6
from the Kalambet, et al article (and scipy.special.erfcx()) would be
better behaved. Is that correct, and the whole story?
A Pull Request or a patch for candidate replacement function based on
this would be greatly appreciated.
--Matt
_______________________________________________
SciPy-User mailing list
http://mail.scipy.org/mailman/listinfo/scipy-user

Evgeni Burovski

2015-03-20 10:03:16 UTC

Permalink

FWIW, this distribution has been implemented in scipy a couple of
weeks ago as `exponnorm` --- it's only available in the dev version at
the moment. So `exponnorm.fit` may work --- if it doesn't, a patch
would be welcome :-)

Post by PaweÅ KwaÅniewski
Hi Matt, Jonathan,

Post by Matt Newville
Hi Jonathon,

Post by j***@chemistry.montana.edu

Post by Matt Newville
Ah, sorry the very poor memory. There was a fix to the expgaussian()
function a month ago, after the 0.8.3 release (and thanks to Tim Spillane
for the fix!). That explains the problem you're seeing, and might explain
why Pawel was having trouble too. The fix is fairly simple (change `arg2`
to have `/(s2*sigma)` instead of `/s`, see
https://github.com/lmfit/lmfit-py/blob/master/lmfit/lineshapes.py#L116), and
should be orthogonal to any other changes in the current branch.

That was the problem! Thanks! I'm sorry, I should also attach the fit result
figure to my post. But now it works like a charm.
Thanks!
Pawel

Post by Matt Newville

Post by j***@chemistry.montana.edu
Thanks for checking into this. Do you have any thoughts on the analysis
by Kalambet and Tikhonov in the paper I linked previously? Specifically, in
gss = gamma*sigma*sigma
arg1 = gamma*(center +gss/2.0 - x)
arg2 = (center + gss - x)/(s2*sigma)
return amplitude*(gamma/2) * exp(arg1) * erfc(arg2)
The exp(arg1) factor is much too large and erfc(arg2) is much too small,
when x<1/gamma ?
Jonathan

I haven't looked at the paper in great detail, and wasn't aware of (and/or
never experienced myself) the potential instability -- but I haven't used
this function much myself. If I understand correctly, the recommendation
is that a version of the exponentially modified Gaussian that used Eq. 6
from the Kalambet, et al article (and scipy.special.erfcx()) would be better
behaved. Is that correct, and the whole story?
A Pull Request or a patch for candidate replacement function based on
this would be greatly appreciated.
--Matt
_______________________________________________
SciPy-User mailing list
http://mail.scipy.org/mailman/listinfo/scipy-user

_______________________________________________
SciPy-User mailing list
http://mail.scipy.org/mailman/listinfo/scipy-user

j***@chemistry.montana.edu

2015-03-20 15:49:10 UTC

Permalink

Post by Matt Newville
I haven't looked at the paper in great detail, and wasn't aware of (and/or
never experienced myself) the potential instability -- but I haven't used
this function much myself. If I understand correctly, the recommendation
is that a version of the exponentially modified Gaussian that used Eq. 6
from the Kalambet, et al article (and scipy.special.erfcx()) would be
better behaved. Is that correct, and the whole story?
A Pull Request or a patch for candidate replacement function based on
this would be greatly appreciated.

The attached test script should show the issue. The form of the equation
using erfcx is likewise problematic, but at the other limit (small gamma).
They recommend switching between the two equations at x = 1/gamma.
[image: Inline image 1]

I did this:
y2 = emg_kalambet2(x, u, s, a, gamma)
y6 = emg_kalambet6(x, u, s, a, gamma)

y = scipy.zeros(shape=x.shape)
y[scipy.where(x>1./gamma)] = y2
y[scipy.where(x<=1./gamma)] = y6

It's very likely that's not the optimal approach. Maybe this whole issue
doesn't matter; the details of floating point numbers are really outside my
area of expertise.

Jonathan

Matt Newville

2015-03-19 18:55:39 UTC

Permalink

Hi Pawel,

Post by PaweÅ KwaÅniewski
Hi,
Ok, now I have a problem. In the attached script I try to fit some example
experimental data I have. The peak is actually a numerical derivative of a
step function - this is why the values are so large. I tried different
models, all except the ExponentialGaussianModel can handle the fit pretty
well using the automatically guessed parameters. I don't really understand
why. Is there something I'm missing here?

Sorry for the trouble, but I'm not sure what is happening. It works OK for
me with your data. With the attached script (trivial modifications of your
script), I get printed results of

[[Model]]
Model(expgaussian)
[[Fit Statistics]]
# function evals = 124
# data points = 301
# variables = 4
chi-square = 139414753927.191
reduced chi-square = 469409945.883
[[Variables]]
amplitude: 51178.7742 +/- 1.13e+03 (2.21%) (init= 48629)
sigma: 0.02217557 +/- 0.001300 (5.86%) (init= 0.035)
center: -0.03345298 +/- 0.001281 (3.83%) (init=-0.005000045)
gamma: 21.0962075 +/- 1.320993 (6.26%) (init= 1)
[[Correlations]] (unreported correlations are < 0.250)
C(center, gamma) = 0.771
C(sigma, gamma) = 0.616
C(amplitude, gamma) = -0.530
C(sigma, center) = 0.475
C(amplitude, center) = -0.408

and the attached figure of data and best-fit.
[image: Inline image 1]
If you're getting very different results, please be more specific.
Cheers,

--Matt Newville