jason m

Reputation: 6835

How to use scipy griddata with a pandas DataFrame vs. a NumPy array

I have the following dataframe:

    A   B   C
0   2   0.7904  0.278784507354
1   2   0.7904  0.278784507354
2   2   0.7904  0.348480634192
3   2   0.7904  0.348480634192
4   2   0.7904  0.418176761031
5   2   0.7904  0.418176761031
6   2   0.7904  0.487872887869
7   2   0.7904  0.487872887869
8   2   0.7904  0.529690563972
9   2   0.7904  0.529690563972
10  2   0.7904  0.54362978934
11  2   0.7904  0.54362978934
12  2   0.7904  0.557569014708
13  2   0.7904  0.557569014708
14  2   0.7904  0.571508240076
15  2   0.7904  0.571508240076
16  2   0.7904  0.585447465443
17  2   0.7904  0.585447465443
18  2   0.7904  0.592417078127
19  2   0.7904  0.592417078127
20  2   0.7904  0.599386690811
21  2   0.7904  0.599386690811
22  2   0.7904  0.606356303495
23  2   0.7904  0.606356303495
24  2   0.7904  0.613325916179
25  2   0.7904  0.613325916179
26  2   0.7904  0.620295528862
27  2   0.7904  0.620295528862
28  2   0.7904  0.627265141546
29  2   0.7904  0.627265141546
30  2   0.7904  0.63423475423
31  2   0.7904  0.63423475423
32  2   0.7904  0.641204366914
149 2   0.3847  1.04544190258
150 2   0.3847  1.05241151526
151 2   0.4248  1.05241151526
152 2   0.3847  1.05938112794
153 2   0.4248  1.05938112794
154 2   0.3847  1.06635074063
155 2   0.4248  1.06635074063
156 2   0.3847  1.07332035331
157 2   0.4248  1.07332035331
158 2   0.3847  1.08725957868
159 2   0.4248  1.08725957868
235 9   0.6816  0.919988874268
236 9   0.8164  0.926958486952
237 9   0.6608  0.926958486952
238 9   0.64    0.933928099636
239 9   0.7449  0.933928099636
240 9   0.7289  0.940897712319
241 9   0.6764  0.940897712319
242 9   0.7128  0.947867325003
243 9   0.7128  0.947867325003
244 9   0.5883  0.954836937687
245 9   0.6626  0.954836937687
246 9   0.675   0.961806550371
247 9   0.675   0.961806550371
350 16  0.6229  0.933928099636
351 16  0.6641  0.933928099636
352 16  0.7124  0.940897712319
353 16  0.7124  0.940897712319
354 16  0.6814  0.947867325003
355 16  0.6193  0.947867325003
596 23  0.4222  1.15695570552
597 23  0.4928  1.15695570552
598 23  0.4222  1.17089493089
599 23  0.4928  1.17089493089
600 23  0.4928  1.18483415625
709 30  0.5404  1.15695570552
710 30  0.5088  1.17089493089
711 30  0.5439  1.17089493089
712 30  0.4953  1.18483415625
713 30  0.4953  1.18483415625
714 30  0.4953  1.19877338162
715 30  0.4953  1.19877338162
716 30  0.4953  1.21271260699
717 30  0.4953  1.21271260699
718 30  0.4953  1.22665183236
719 30  0.4953  1.22665183236
778 37  0.6862  0.961806550371
799 37  0.5957  1.03150267721
800 37  0.6671  1.03847228989
801 37  0.6085  1.03847228989
802 37  0.5883  1.04544190258
826 37  0.5134  1.18483415625
827 37  0.6135  1.18483415625
874 58  0.769   0.864231972797
875 58  0.7491  0.864231972797
876 58  0.768   0.878171198165
939 58  0.4921  1.32422640993
940 58  0.4921  1.39392253677
941 58  0.4902  1.39392253677
942 58  0.4921  1.46361866361
943 58  0.4902  1.46361866361
944 114 1.1536  0.0696961268385
954 114 1.0766  0.348480634192
955 114 1.1536  0.348480634192
956 114 1.1536  0.418176761031

There are more observations, but I had to truncate them because of the post size limit.

I am trying to interpolate at a few query points (the "grid") using the following:

import numpy as np
from scipy.interpolate import griddata

interp_A = np.array([30,60,90,180])
interp_B = np.array([1.0,1.0,1.0,1.0])
grid_z1 = griddata((data['A'],data['B']), data['C'], (interp_A, interp_B), method='nearest')

And I am getting back:

675     0.6057
895     0.6492
1039    0.6884
1256    0.6996

From some tests I have done, it appears that my query values 30, 60, 90 and 180 are being mapped to the index labels 675, 895, 1039 and 1256.

If I instead pass the underlying arrays with .values:

grid_z1 = griddata((data['A'].values,data['B'].values), data['C'].values, (interp_A, interp_B), method='nearest')

I get:

[ 0.54    0.6464  0.6673  0.6772]

Which of the two is the proper way to use this library with pandas data?

Thanks!

Upvotes: 2

Views: 2342

Answers (1)

hpaulj

Reputation: 231355

Using the example from the griddata documentation, I calculate

from scipy import interpolate
grid_z0 = interpolate.griddata(points, values, (grid_x, grid_y), method='nearest')

and made a dataframe:

df = pd.DataFrame({'A':points[:,0], 'B':points[:,1], 'C':values})

With .values I get the same interpolation as the original:

grid_z1 = interpolate.griddata((df['A'].values,df['B'].values), df['C'].values, (grid_x, grid_y), method='nearest')
np.allclose(grid_z1,grid_z0)   # True
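
The snippets above leave points, values, grid_x and grid_y undefined. A self-contained sketch of the same comparison might look like the following. The test function and grid follow the example in the griddata documentation as I recall it; the fixed seed is my own assumption, added only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from scipy import interpolate


def func(x, y):
    # Smooth test function to sample at scattered points.
    return x * (1 - x) * np.cos(4 * np.pi * x) * np.sin(4 * np.pi * y**2) ** 2


# Regular grid on [0, 1] x [0, 1] where we want interpolated values.
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

# 1000 scattered sample points (seed 0 is an arbitrary choice).
rng = np.random.default_rng(0)
points = rng.random((1000, 2))
values = func(points[:, 0], points[:, 1])

# Reference result computed directly from NumPy arrays.
grid_z0 = interpolate.griddata(points, values, (grid_x, grid_y), method="nearest")

# Same data held in a DataFrame, passed back to griddata as arrays.
df = pd.DataFrame({"A": points[:, 0], "B": points[:, 1], "C": values})
grid_z1 = interpolate.griddata(
    (df["A"].values, df["B"].values),
    df["C"].values,
    (grid_x, grid_y),
    method="nearest",
)

same = np.allclose(grid_z1, grid_z0)
print(same)
```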

But if I try to replicate your other approach, passing the Series directly,

grid_z2 = interpolate.griddata((df['A'],df['B']), df['C'], (grid_x, grid_y), method='nearest')

I get an error:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

The error comes from within pandas indexing. It's possible that my dataframe has a different structure than yours:

In [17]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 3 columns):
A    1000 non-null float64
B    1000 non-null float64
C    1000 non-null float64
dtypes: float64(3)
memory usage: 31.2 KB

In any case, passing the column values to griddata is the correct way. griddata is not designed to handle pandas Series directly; it expects NumPy arrays, not objects that contain arrays.
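
As a minimal sketch of that recommendation applied to query points like yours: the small frame below is made-up data (not the data from the question), with a deliberately non-contiguous index of the kind you get after filtering rows. I use to_numpy() here; .values should behave the same way for this purpose.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata

# Hypothetical data for illustration only; the index has gaps on purpose.
data = pd.DataFrame(
    {
        "A": [2.0, 9.0, 30.0, 58.0, 114.0],
        "B": [0.79, 0.68, 0.54, 0.77, 1.15],
        "C": [0.28, 0.92, 1.16, 0.86, 0.35],
    },
    index=[10, 20, 30, 40, 50],
)

interp_A = np.array([30, 60, 90, 180])
interp_B = np.array([1.0, 1.0, 1.0, 1.0])

# Strip the pandas index before calling griddata.
points = data[["A", "B"]].to_numpy()  # shape (n, 2)
values = data["C"].to_numpy()         # shape (n,)

grid_z = griddata(points, values, (interp_A, interp_B), method="nearest")
print(type(grid_z), grid_z)
```

The result is a plain ndarray with one value per query point, in query order, so no index labels are involved in the lookup.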

Upvotes: 4
