Chunxiao Li
Chunxiao Li

Reputation: 51

Python: filling in missing data in an iterative dataset

I have a grid map which has grid blocks of 175*175, so the total number of grid blocks in my map is 30625. Every grid block has the information of its coordinates and the property value (x and y are the coordinates and z is the value).So when I export the property values of this map it saves the information as xyz(shown in the example data set, the example assume a grid blocks 10*10). But only part of the grid blocks have information while the others are null(have no value).So when I export the information it ends up giving me only less than 30625 rows of data(the xyz information of white blocks are missing).My question is how to make up (fill) the absent data using python?

Example data set:

X      Y      Z
1      1      2559.26
2      1      2559.36
3      1      2117.882
4      1      2038.34
5      1      2115.24
6      1      2884.25
7      1      1235.25
8      1      2994.53
9      1      2445.23[enter image description here][1]
10     1      2847.45
1      2      2847.96
2      2      2110.23
3      2      2004.35
6      2      2948.24 <-- Missing data
9      2      2998.34 <-- Missing data
10     2      2394.34

See actual data set here

Missing cells Fig. 1: White cells are missing data

This is the map grid,it has 175*175 grid blocks.The white grid is the missing grid

[2]: https://i.sstatic.net/255MB.png

Upvotes: 0

Views: 585

Answers (2)

hpaulj
hpaulj

Reputation: 231615

If I copy and paste your data sample to a file, I can load it with genfromtxt:

In [1]: data = np.genfromtxt('stack44594239.txt',dtype=None,names=True)
In [2]: data
Out[2]: 
array([( 1, 1,  2559.26 ), ( 2, 1,  2559.36 ), ( 3, 1,  2117.882),
       ( 4, 1,  2038.34 ), ( 5, 1,  2115.24 ), ( 6, 1,  2884.25 ),
       ( 7, 1,  1235.25 ), ( 8, 1,  2994.53 ), ( 9, 1,  2445.23 ),
       (10, 1,  2847.45 ), ( 1, 2,  2847.96 ), ( 2, 2,  2110.23 ),
       ( 3, 2,  2004.35 ), ( 6, 2,  2948.24 ), ( 9, 2,  2998.34 ),
       (10, 2,  2394.34 )], 
      dtype=[('X', '<i4'), ('Y', '<i4'), ('Z', '<f8')])

This is a 1d structured array with 3 fields.

Now make a target array:

In [3]: arr = np.zeros((10,2),float)
In [5]: arr.fill(np.nan)

It is initially filled with 0s, but I can fill with something else such as np.nan (a common float fill value).

THen I can use the data fields to fill in values with one statement:

In [7]: arr[data['X']-1,data['Y']-1] = data['Z']
In [8]: arr
Out[8]: 
array([[ 2559.26 ,  2847.96 ],
       [ 2559.36 ,  2110.23 ],
       [ 2117.882,  2004.35 ],
       [ 2038.34 ,       nan],
       [ 2115.24 ,       nan],
       [ 2884.25 ,  2948.24 ],
       [ 1235.25 ,       nan],
       [ 2994.53 ,       nan],
       [ 2445.23 ,  2998.34 ],
       [ 2847.45 ,  2394.34 ]])

Upvotes: 1

Daniel R. Livingston
Daniel R. Livingston

Reputation: 1229

I don't know how you're going to import the data (Pandas, Numpy, etc.) so I'm not going to assume that.

Let's assume you already have the data stored in a Numpy array called data.

import numpy as np
data = np.array([[1, 1, 2559.26], [2, 1, 2559.36], [3, 1, 2117.882], ...])

# Set dimensions (assumes square grid)
dim = 10

# Missing data value
no_data = -9999

# Assign data to vectors
X = data[:,0].tolist()
Y = data[:,1].tolist()
Z = data[:,2].tolist()

# Iterate over the vector X
# If the modulo 10 of the index (which will cycle over 10),
# falls out of order with the value of X at that index 
# (which also cycles over 10), then insert missing values 
for i in range(1, np.size(X)):
    if (i % dim + 1) != int(X[i]):
        X.insert(i, (i % dim + 1))
        Y.insert(i, Y[i])
        Z.insert(i, no_data)

What this does is check that X is cycling repeatedly over 1-10. If not, it inserts the missing number, sets Y to its current place in its own cycle, and sets Z to an arbitrary no data value.

Upvotes: 1

Related Questions