Reputation: 1817
So I have a table that is kind of data like this
theta phi x y
0 0 1 2
0 1 2 3
--------------------
90 360 4 5
theta values runs from 0 to 90 and for each theta phi runs from 0 to 360, but some of the phi s are missing here and there in the table and I have to fill those values. I'm trying to use pandas for this job like this,
import pandas as pd
cols=['theta','phi','x','y']
data = pd.read_csv('data.dat', sep=" |\t", header=None,names=cols,engine='python')
def fill_up(i):
df=data[data['theta']==i]
df.set_index('phi',inplace=True)
df= df.reindex(range(0,361)).reset_index()
df=df[cols]
df.interpolate(inplace=True)
return df
df=pd.concat([fill_up(i) for i in xrange(0,91)])
df.to_csv("new.txt",sep=' ', index=False, header=False)
as you can see I'm creating a seperate data frame for each theta values and concatnating them finally. Is there any way to achieve this without creating a different dataframe or achieving this more efficiently? Also should I take care for any memory overflow as the datafile can be several mb?
Upvotes: 2
Views: 514
Reputation: 29635
I think you can do it by setting 'theta' and 'phi' as index with set_index
, then reindex
with pd.MultiIndex.from_product
with all the values of 'theta' and 'phi' you expect, fill nan values with interpolate
and finally reset_index
such as:
new_data = (data.set_index(['theta','phi'])
.reindex(pd.MultiIndex.from_product( [range(91),range(361)],
names=['theta','phi']))
.interpolate().reset_index())
Note the range(91)
and range(361)
to create all the values for theta and phi.
EDIT for blank line, you can do:
new_data = (data.set_index(['theta','phi'])
.reindex(pd.MultiIndex.from_product( [range(91),range(362)],
names=['theta','phi']))
.interpolate().reset_index())
new_data.loc[new_data['phi'] ==361] = ''
Note that the range for phi is range(362)
to add a row that you make "blank" with the next command line, replacing all the rows where phi = 361
by blank
Or, with the new_data
as in the original solution, you can create a blank dataframe with specific indexes that you can concat
to new_data and sort_index
after
blanck_frame = pd.DataFrame(data='', columns=new_data.columns
index=new_data.loc[new_data['phi'] ==360].index+0.5)
new_data = pd.concat([new_data,blanck_frame]).sort_index()
# you can add .reset_index(drop=True) at the end if you want integer indexes
Upvotes: 1