Reputation: 403
I want to use machine learning techniques to categorise "images" of energy released in an electromagnetic calorimeter, using a Keras CNN. To import the data I'm using a pandas DataFrame, but the data isn't formatted in the necessary way.
The calorimeter can be considered a 28x28 crystal square. However, the data that I receive only shows the energy in crystals that have been triggered, on average about 10-15 crystals per event.
Event X Y Energy
0 22 13 203.49
0 23 12 73.1848
...
...
1 23 16 55.1652
1 24 16 0
1 25 16 20.4953
That means I want to add a row to the data frame for every crystal (X,Y) that doesn't already have an energy assigned, and assign 0 energy to it.
I've tried the following:
newdf=pd.DataFrame()
for event in range(0,2):#999):
    for xi in range(0,28):
        for yi in range(0,28):
            arr=np.array([event,xi,yi,0])
            newdf=newdf.append(pd.DataFrame(arr))
print('newdf = ',newdf)
But the arrays get appended into column data in some strange way.
Can anyone tell me an efficient way of doing this?
Thank you.
Upvotes: 0
Views: 517
Reputation: 30579
First we create a dataframe with a MultiIndex covering all events and crystals and set the Energy to 0. Then we add our original dataframe, set to the same index, so that the measured energies replace the zeros where they exist.
Example:
df = pd.DataFrame({'Event': [0,0], 'X': [1,1], 'Y': [0,2], 'Energy': [203.49,73.1848]})
# Event X Y Energy
#0 0 1 0 203.4900
#1 0 1 2 73.1848
n_crystals = 3 # 28 in your case
n_events = 2
idx = pd.MultiIndex.from_product((range(n_events), range(n_crystals), range(n_crystals)), names=['Event','X','Y'])
newdf = pd.DataFrame(index=idx).assign(Energy=0)
newdf = (newdf + df.set_index(['Event','X','Y'])).fillna(0).reset_index()
Result:
Event X Y Energy
0 0 0 0 0.0000
1 0 0 1 0.0000
2 0 0 2 0.0000
3 0 1 0 203.4900
4 0 1 1 0.0000
5 0 1 2 73.1848
6 0 2 0 0.0000
7 0 2 1 0.0000
8 0 2 2 0.0000
9 1 0 0 0.0000
10 1 0 1 0.0000
11 1 0 2 0.0000
12 1 1 0 0.0000
13 1 1 1 0.0000
14 1 1 2 0.0000
15 1 2 0 0.0000
16 1 2 1 0.0000
17 1 2 2 0.0000
For 28x28 crystals and 1000 events (newdf with 784000 rows), this takes 1.5 s on my machine.
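Since the goal is to feed a CNN, the same full index can probably be used to go straight to an image array. The following is only a sketch under some assumptions: each (Event, X, Y) combination appears at most once in the data, and the names `full` and `images` are made up for illustration. It uses `reindex` with `fill_value` instead of the add-and-`fillna` step, then reshapes to one 2D array per event.

```python
import numpy as np
import pandas as pd

# Toy data in the same layout as the example above.
df = pd.DataFrame({'Event': [0, 0], 'X': [1, 1], 'Y': [0, 2],
                   'Energy': [203.49, 73.1848]})

n_crystals = 3  # 28 in your case
n_events = 2

# Full (Event, X, Y) grid; from_product yields it in row-major order.
idx = pd.MultiIndex.from_product(
    (range(n_events), range(n_crystals), range(n_crystals)),
    names=['Event', 'X', 'Y'])

# reindex keeps the measured energies and fills missing crystals with 0.
# Assumption: no duplicate (Event, X, Y) rows in df.
full = df.set_index(['Event', 'X', 'Y'])['Energy'].reindex(idx, fill_value=0.0)

# One "image" per event: shape (n_events, n_crystals, n_crystals),
# indexed as images[event, x, y].
images = full.to_numpy().reshape(n_events, n_crystals, n_crystals)
```

A Keras convolutional layer usually expects an extra channel axis, which could be added with `images[..., np.newaxis]`.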
Upvotes: 1
Reputation: 111
Your arr has shape (4,), but what you want is an array of shape (1,4), if I understood correctly. You could do
arr=np.array([[event,xi,yi,0]])
to get the right shape.
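To illustrate the shape difference, here is a small sketch (the names `col`, `row`, `rows` and `zeros` are only for illustration). It also shows one possible way to avoid appending inside the loop: collect the rows in a list and build the frame once. This may matter on recent pandas, since `DataFrame.append` was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# A 1-D array of shape (4,) becomes a single column with four rows.
col = pd.DataFrame(np.array([0, 1, 2, 0]))
# A 2-D array of shape (1, 4) becomes a single row with four columns.
row = pd.DataFrame(np.array([[0, 1, 2, 0]]))
print(col.shape, row.shape)

# Collect every (event, x, y, energy) row first, then build the frame once.
n_events, n_crystals = 2, 28
rows = [(event, xi, yi, 0.0)
        for event in range(n_events)
        for xi in range(n_crystals)
        for yi in range(n_crystals)]
zeros = pd.DataFrame(rows, columns=['Event', 'X', 'Y', 'Energy'])
```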
Upvotes: 1