I've been working on this problem for a few days and I'm stuck...
I have performed a number of Monte Carlo simulations in R, which give an output y for each input x. There is clearly some simple relationship between x and y, so I want to identify the formula and its parameters. But I can't seem to get a good overall fit for both the 'Low x' and 'High x' series, e.g. using a logarithm like this:
dat = data.frame(x=x, y=y)
fit = nls(y~a*log10(x)+b, data=dat, start=list(a=-0.8,b=-2), trace=TRUE)
I have also tried to fit (log10(x), 10^y) instead, which gives a good fit but the reverse transformation doesn't fit (x, y) very well.
Can anyone solve this?
Please explain how you found the solution.
Thanks!
EDIT:
Thanks for all the quick feedback!
I am not aware of a theoretical model for what I'm simulating so I have no basis for comparison. I simply don't know the true relationship between x and y. I'm not a statistician, by the way.
The underlying model is a kind of stochastic feedback-growth model. My objective is to determine the long-term growth rate g given some input x > 0, so that the output of the system grows exponentially by a factor of 1+g in each iteration. In each iteration the system has a stochastic production based on its size; a fraction of this production, determined by another stochastic variable, is output and the rest is kept in the system. From the MC simulations I have found the growth rates of the system output to be log-normally distributed for every x I have tested, and the y values in the data series are the log-means of the growth rates g. As x goes towards infinity, g goes towards zero; as x goes towards zero, g goes towards infinity.
I would like a function that can calculate y from x. I actually only need a function for low x, say, in the range 0 to 10. I was able to fit that quite well with y = 1.556 * x^-0.4 - 3.58, but it didn't fit well for large x. I'd like a function that is general for all x > 0. I have also tried Spacedman's poly fit (thanks!), but it doesn't fit well enough in the crucial range x = 1 to 6.
Any ideas?
EDIT 2:
I have experimented some more, including with the detailed suggestions by Grothendieck (thanks!). After some consideration I decided that since I don't have a theoretical basis for choosing one function over another, and I'm most likely only interested in x values between 1 and 6, I ought to use a simple function that fits well. So I just used y ~ a*x^b + c and made a note that it doesn't fit for high x. I may seek the community's help again when the first draft of the paper is finished. Perhaps one of you can spot the theoretical relationship between x and y once you see the Monte Carlo model.
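For reference, a minimal sketch of the y ~ a*x^b + c fit in base R. It uses only 15 rows copied from the 'Low x' series below (an illustrative subset, not the full data) and starts nls() from the hand-found values 1.556, -0.4 and -3.58. The names lo.sub, fit_pow and rss_start are made up for this example.

```r
# 15 rows copied from the 'Low x' series (a subset chosen for illustration)
lo.sub <- data.frame(
  x = c(0.2, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0),
  y = c(-0.7031864, -1.0533648, -1.4919278, -1.7477481, -2.0036842,
        -2.2548485, -2.4133978, -2.5952558, -2.7073198, -2.7739299,
        -2.8272394, -2.8660817, -2.8996780, -2.9239220, -2.9452175)
)

# Power law with an offset, y = a * x^b + c, started from the hand-found values
fit_pow <- nls(y ~ a * x^b + c, data = lo.sub,
               start = list(a = 1.556, b = -0.4, c = -3.58))
print(coef(fit_pow))

# Residual sum of squares at the starting values vs. after fitting
rss_start <- sum((lo.sub$y - (1.556 * lo.sub$x^-0.4 - 3.58))^2)
print(c(start = rss_start, fitted = deviance(fit_pow)))

# Prediction at an x inside the fitted range (2.5 is also in the full table)
print(predict(fit_pow, newdata = data.frame(x = 2.5)))
```

Because nls() only accepts steps that reduce the residual sum of squares, the fitted value can only improve on the starting guess; it says nothing about how the curve behaves beyond x = 10.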
Thanks again!
Low x series:
x y
1 0.2 -0.7031864
2 0.3 -1.0533648
3 0.4 -1.3019655
4 0.5 -1.4919278
5 0.6 -1.6369545
6 0.7 -1.7477481
7 0.8 -1.8497117
8 0.9 -1.9300209
9 1.0 -2.0036842
10 1.1 -2.0659970
11 1.2 -2.1224324
12 1.3 -2.1693986
13 1.4 -2.2162889
14 1.5 -2.2548485
15 1.6 -2.2953162
16 1.7 -2.3249750
17 1.8 -2.3570141
18 1.9 -2.3872684
19 2.0 -2.4133978
20 2.1 -2.4359624
21 2.2 -2.4597122
22 2.3 -2.4818787
23 2.4 -2.5019371
24 2.5 -2.5173966
25 2.6 -2.5378936
26 2.7 -2.5549524
27 2.8 -2.5677939
28 2.9 -2.5865958
29 3.0 -2.5952558
30 3.1 -2.6120607
31 3.2 -2.6216831
32 3.3 -2.6370452
33 3.4 -2.6474608
34 3.5 -2.6576862
35 3.6 -2.6655606
36 3.7 -2.6763866
37 3.8 -2.6881303
38 3.9 -2.6932310
39 4.0 -2.7073198
40 4.1 -2.7165035
41 4.2 -2.7204063
42 4.3 -2.7278532
43 4.4 -2.7321731
44 4.5 -2.7444773
45 4.6 -2.7490365
46 4.7 -2.7554178
47 4.8 -2.7611471
48 4.9 -2.7719188
49 5.0 -2.7739299
50 5.1 -2.7807113
51 5.2 -2.7870781
52 5.3 -2.7950429
53 5.4 -2.7975677
54 5.5 -2.7990999
55 5.6 -2.8095955
56 5.7 -2.8142453
57 5.8 -2.8162046
58 5.9 -2.8240594
59 6.0 -2.8272394
60 6.1 -2.8338866
61 6.2 -2.8382038
62 6.3 -2.8401935
63 6.4 -2.8444915
64 6.5 -2.8448382
65 6.6 -2.8512086
66 6.7 -2.8550240
67 6.8 -2.8592950
68 6.9 -2.8622220
69 7.0 -2.8660817
70 7.1 -2.8710430
71 7.2 -2.8736998
72 7.3 -2.8764701
73 7.4 -2.8818748
74 7.5 -2.8832696
75 7.6 -2.8833351
76 7.7 -2.8891867
77 7.8 -2.8926849
78 7.9 -2.8944987
79 8.0 -2.8996780
80 8.1 -2.9011012
81 8.2 -2.9053911
82 8.3 -2.9063661
83 8.4 -2.9092228
84 8.5 -2.9135426
85 8.6 -2.9101730
86 8.7 -2.9186316
87 8.8 -2.9199631
88 8.9 -2.9199856
89 9.0 -2.9239220
90 9.1 -2.9240167
91 9.2 -2.9284608
92 9.3 -2.9294951
93 9.4 -2.9310985
94 9.5 -2.9352370
95 9.6 -2.9403694
96 9.7 -2.9395336
97 9.8 -2.9404153
98 9.9 -2.9437564
99 10.0 -2.9452175
High x series:
x y
1 2.000000e-01 -0.701301
2 2.517851e-01 -0.907446
3 3.169786e-01 -1.104863
4 3.990525e-01 -1.304556
5 5.023773e-01 -1.496033
6 6.324555e-01 -1.674629
7 7.962143e-01 -1.842118
8 1.002374e+00 -1.998864
9 1.261915e+00 -2.153993
10 1.588656e+00 -2.287607
11 2.000000e+00 -2.415137
12 2.517851e+00 -2.522978
13 3.169786e+00 -2.621386
14 3.990525e+00 -2.701105
15 5.023773e+00 -2.778751
16 6.324555e+00 -2.841699
17 7.962143e+00 -2.900664
18 1.002374e+01 -2.947035
19 1.261915e+01 -2.993301
20 1.588656e+01 -3.033517
21 2.000000e+01 -3.072003
22 2.517851e+01 -3.102536
23 3.169786e+01 -3.138539
24 3.990525e+01 -3.167577
25 5.023773e+01 -3.200739
26 6.324555e+01 -3.233111
27 7.962143e+01 -3.259738
28 1.002374e+02 -3.291657
29 1.261915e+02 -3.324449
30 1.588656e+02 -3.349988
31 2.000000e+02 -3.380031
32 2.517851e+02 -3.405850
33 3.169786e+02 -3.438225
34 3.990525e+02 -3.467420
35 5.023773e+02 -3.496026
36 6.324555e+02 -3.531125
37 7.962143e+02 -3.558215
38 1.002374e+03 -3.587526
39 1.261915e+03 -3.616800
40 1.588656e+03 -3.648891
41 2.000000e+03 -3.684342
42 2.517851e+03 -3.716174
43 3.169786e+03 -3.752631
44 3.990525e+03 -3.786956
45 5.023773e+03 -3.819529
46 6.324555e+03 -3.857214
47 7.962143e+03 -3.899199
48 1.002374e+04 -3.937206
49 1.261915e+04 -3.968795
50 1.588656e+04 -4.015991
51 2.000000e+04 -4.055811
52 2.517851e+04 -4.098894
53 3.169786e+04 -4.135608
54 3.990525e+04 -4.190248
55 5.023773e+04 -4.237104
56 6.324555e+04 -4.286103
57 7.962143e+04 -4.332090
58 1.002374e+05 -4.392748
59 1.261915e+05 -4.446233
60 1.588656e+05 -4.497845
61 2.000000e+05 -4.568541
62 2.517851e+05 -4.628460
63 3.169786e+05 -4.686546
64 3.990525e+05 -4.759202
65 5.023773e+05 -4.826938
66 6.324555e+05 -4.912130
67 7.962143e+05 -4.985855
68 1.002374e+06 -5.070668
69 1.261915e+06 -5.143341
70 1.588656e+06 -5.261585
71 2.000000e+06 -5.343636
72 2.517851e+06 -5.447189
73 3.169786e+06 -5.559962
74 3.990525e+06 -5.683828
75 5.023773e+06 -5.799319
76 6.324555e+06 -5.929599
77 7.962143e+06 -6.065907
78 1.002374e+07 -6.200967
79 1.261915e+07 -6.361633
80 1.588656e+07 -6.509538
81 2.000000e+07 -6.682960
82 2.517851e+07 -6.887793
83 3.169786e+07 -7.026138
84 3.990525e+07 -7.227990
85 5.023773e+07 -7.413960
86 6.324555e+07 -7.620247
87 7.962143e+07 -7.815754
88 1.002374e+08 -8.020447
89 1.261915e+08 -8.229911
90 1.588656e+08 -8.447927
91 2.000000e+08 -8.665613
Regressing x/y vs. x
Plotting y vs. x for the low data and playing around a bit, it seems that x/y is approximately linear in x, so try regressing x/y against x, which gives us a relationship based on only two parameters:
y = x / (a + b * x)
where a and b are the regression coefficients.
> lm(x / y ~ x, lo.data)
Call:
lm(formula = x/y ~ x, data = lo.data)
Coefficients:
(Intercept) x
-0.1877 -0.3216
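The same idea can be sketched end to end in base R: fit the straight line x/y = a + b*x, then back-transform to y = x/(a + b*x). This sketch uses a 15-row subset of the low series (an illustrative choice; lo.sub, fit_lin and y_hat are hypothetical names), so its coefficients will be close to, but not exactly, -0.1877 and -0.3216. I() makes explicit that x / y on the left-hand side is ordinary division.

```r
# 15 rows copied from the 'Low x' series (a subset chosen for illustration)
lo.sub <- data.frame(
  x = c(0.2, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0),
  y = c(-0.7031864, -1.0533648, -1.4919278, -1.7477481, -2.0036842,
        -2.2548485, -2.4133978, -2.5952558, -2.7073198, -2.7739299,
        -2.8272394, -2.8660817, -2.8996780, -2.9239220, -2.9452175)
)

# y = x / (a + b*x) implies x/y = a + b*x, so a straight-line fit estimates a and b
fit_lin <- lm(I(x / y) ~ x, data = lo.sub)
a <- coef(fit_lin)[[1]]
b <- coef(fit_lin)[[2]]
print(c(a = a, b = b))

# Back-transform the fitted line to the original y scale
lo.sub$y_hat <- lo.sub$x / (a + b * lo.sub$x)
print(round(lo.sub, 3))

# R^2 of the linearized fit (on the x/y scale, not on y itself)
print(summary(fit_lin)$r.squared)
```

Note that least squares on x/y weights the points differently from least squares on y, which is one reason to refit the back-transformed curve directly.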
MM.2
The above can be transformed into the MM.2 model in the drc R package. As seen below, this model has a high R² (squared correlation between fitted and observed y). We also calculate the AIC, which we can use to compare against other models (lower is better):
> library(drc)
> fm.mm2 <- drm(y ~ x, data = lo.data, fct = MM.2())
> cor(fitted(fm.mm2), lo.data$y)^2
[1] 0.9986303
> AIC(fm.mm2)
[1] -535.7969
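If drc is not installed, a comparable two-parameter curve can be fitted with base R's nls(). This sketch assumes drc's MM.2 parameterization f(x) = d / (1 + e/x) = d*x / (x + e); rewriting x / (a + b*x) as (1/b) * x / (x + a/b) then gives d = 1/b and e = a/b, so the lm() coefficients above serve as starting values. It uses the same illustrative 15-row subset (lo.sub and fit_mm are hypothetical names), so its AIC is not comparable to the drc value above.

```r
# 15 rows copied from the 'Low x' series (a subset chosen for illustration)
lo.sub <- data.frame(
  x = c(0.2, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0),
  y = c(-0.7031864, -1.0533648, -1.4919278, -1.7477481, -2.0036842,
        -2.2548485, -2.4133978, -2.5952558, -2.7073198, -2.7739299,
        -2.8272394, -2.8660817, -2.8996780, -2.9239220, -2.9452175)
)

# Assumed MM.2 form: y = d * x / (x + e); start from d = 1/b, e = a/b
a <- -0.1877
b <- -0.3216
fit_mm <- nls(y ~ d * x / (x + e), data = lo.sub,
              start = list(d = 1 / b, e = a / b))
print(coef(fit_mm))

# Same goodness-of-fit summary as above: squared correlation of fitted vs. observed
print(cor(fitted(fit_mm), lo.sub$y)^2)

# Not comparable to the drc AIC above, which was computed on all 99 rows
print(AIC(fit_mm))
```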
CRS.6
This suggests trying a few other drc models; of the ones we tried, CRS.6 has a particularly low AIC and seems to fit well visually:
> fm.crs6 <- drm(y ~ x, data = lo.data, fct = CRS.6())
> AIC(fm.crs6)
[1] -942.7866
> plot(fm.crs6) # plots the data with the fitted CRS.6 curve
This gives us a range of models to choose from: the two-parameter MM.2 model, which does not fit as well as CRS.6 (according to AIC) but still fits quite well and has the advantage of having only two parameters, or the six-parameter CRS.6 model with its superior AIC. Note that AIC already penalizes models for having more parameters, so the better AIC of CRS.6 is not simply a consequence of its having more parameters.
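The parameter penalty can be made concrete: AIC = -2 log L + 2k, where k counts every estimated parameter. For an nls() fit, k is the number of curve parameters plus one for the residual variance. This sketch checks the formula by hand and compares the two-parameter curve d*x/(x + e) with the three-parameter power law a*x^b + c from the question, again on an illustrative 15-row subset (all object names are hypothetical).

```r
# 15 rows copied from the 'Low x' series (a subset chosen for illustration)
lo.sub <- data.frame(
  x = c(0.2, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0),
  y = c(-0.7031864, -1.0533648, -1.4919278, -1.7477481, -2.0036842,
        -2.2548485, -2.4133978, -2.5952558, -2.7073198, -2.7739299,
        -2.8272394, -2.8660817, -2.8996780, -2.9239220, -2.9452175)
)

fit_mm  <- nls(y ~ d * x / (x + e), data = lo.sub,
               start = list(d = -3.11, e = 0.58))
fit_pow <- nls(y ~ a * x^b + c, data = lo.sub,
               start = list(a = 1.556, b = -0.4, c = -3.58))

# AIC by hand: k = 2 curve parameters + 1 residual variance = 3
ll_mm <- logLik(fit_mm)
k_mm <- attr(ll_mm, "df")
aic_by_hand <- -2 * as.numeric(ll_mm) + 2 * k_mm
print(c(by_hand = aic_by_hand, AIC = AIC(fit_mm)))

# Side by side: the 3-parameter model is charged 2 more penalty units
print(AIC(fit_mm, fit_pow))
```

A model with more parameters therefore only wins on AIC if its log-likelihood improves by more than the extra penalty.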
Other
If it is believed that both the low and high series should have the same model form, then finding a single model form that fits both well could serve as another criterion for picking one. In addition to the drc models, there are also some yield-density models, (2.1), (2.2), (2.3) and (2.4) in Akbar et al., IRJFE, 2010, which look similar to the MM.2 model and could be tried.
UPDATED: reworked this around the drc package.
Without an idea of the underlying process, you may as well just fit a polynomial with as many terms as you like. You don't seem to be testing a hypothesis (e.g., that gravitational force follows an inverse-square law with distance), so you can fish for functional forms all you like; the data are unlikely to tell you which one is 'right'.
So if I read your data into a data frame with x and y components I can do:
data$lx=log(data$x)
plot(data$lx,data$y) # needs at least a cubic polynomial
m1 = lm(y~poly(lx,3),data=data) # fit a cubic
points(data$lx,fitted(m1),pch=19)
and the fitted points are pretty close. Change the polynomial degree from 3 to 7 and the points are visually identical. Does that mean that your y values really come from a degree-7 polynomial of your x values? No. But you've got a curve that goes through the points.
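A quick way to see that a better polynomial fit proves little: a degree-7 polynomial contains the degree-3 polynomial as a special case, so its residual sum of squares can never be larger. This sketch uses every fifth row of the 'High x' series (19 rows, chosen for illustration; hi.sub, m3 and m7 are hypothetical names). The x values in that table grow by a factor of 10^0.1 per row, so every fifth row is 0.2 * 10^(0, 0.5, ..., 9).

```r
# Every fifth row of the 'High x' series (y values copied from the table)
hi.sub <- data.frame(
  x = 0.2 * 10^seq(0, 9, by = 0.5),
  y = c(-0.701301, -1.674629, -2.415137, -2.841699, -3.072003,
        -3.233111, -3.380031, -3.531125, -3.684342, -3.857214,
        -4.055811, -4.286103, -4.568541, -4.912130, -5.343636,
        -5.929599, -6.682960, -7.620247, -8.665613)
)
hi.sub$lx <- log(hi.sub$x)

# Cubic and degree-7 polynomials in log(x)
m3 <- lm(y ~ poly(lx, 3), data = hi.sub)
m7 <- lm(y ~ poly(lx, 7), data = hi.sub)

# Nested models: the larger one cannot have a larger residual sum of squares
print(c(rss3 = deviance(m3), rss7 = deviance(m7)))
print(anova(m3, m7))
```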
At this scale your plot is so smooth that you may as well just join adjacent points with straight lines. But without an underlying theory of why y depends on x (like an inverse-square law, or exponential growth, or something), all you are doing is joining the dots, and there are infinitely many ways of doing that.
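"Joining the dots" is directly available in base R. This sketch interpolates on a log(x) axis (an assumption made here because the high series spans nine orders of magnitude) using approxfun() for straight lines and splinefun() for a smooth curve through the same points. The data are every fifth row of the 'High x' series; f_lin, f_spl and x_new are hypothetical names.

```r
# Every fifth row of the 'High x' series (y values copied from the table)
hi.sub <- data.frame(
  x = 0.2 * 10^seq(0, 9, by = 0.5),
  y = c(-0.701301, -1.674629, -2.415137, -2.841699, -3.072003,
        -3.233111, -3.380031, -3.531125, -3.684342, -3.857214,
        -4.055811, -4.286103, -4.568541, -4.912130, -5.343636,
        -5.929599, -6.682960, -7.620247, -8.665613)
)

# Straight lines between adjacent points, on a log(x) axis
f_lin <- approxfun(log(hi.sub$x), hi.sub$y)
# Smooth alternative: a cubic spline that passes through every point
f_spl <- splinefun(log(hi.sub$x), hi.sub$y, method = "fmm")

# Both reproduce the data exactly and differ only between the points
x_new <- c(1, 50, 1e5)
print(cbind(x = x_new, linear = f_lin(log(x_new)), spline = f_spl(log(x_new))))
```

approxfun() returns NA outside the range of the data by default, which is arguably the honest answer when there is no theory to extrapolate with.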