Reputation: 63
I am iterating through a dictionary and extracting a set of values from each entry as an array. I am trying to build a DataFrame in which each observation is a separate row.
X1 = []
for k, v in DF_grp:
    date = v['Date'].astype(datetime)
    usage = v['Usage'].astype(float)
    comm = v['comm'].astype(float)
    mdf = pd.DataFrame({'Id': k[0], 'date': date, 'usage': usage, 'comm': comm})
    # the frame has no 'used' column, so the ratio is computed from 'usage'
    mdf['used_ratio'] = ((mdf['usage'] / mdf['comm']).round(2)) * 100
    ts = pd.Series(mdf['usage'].values, index=mdf['date']).sort_index(ascending=True)
    ts2 = pd.Series(mdf['used_ratio'].values, index=mdf['date']).sort_index(ascending=True)
    ts2 = ts2.dropna()
    data = ts2.values.copy()
    if len(data) == 10:
        X1 = np.append(X1, data, axis=0)
        print(X1)
[0,0,0,0,1,0,0,0,1]
[1,2,3,4,5,6,7,8,9]
[0,5,6,7,8,9,1,2,3]
....
and so on. So the question is: how do I capture all of these arrays in a single DataFrame so that it looks like the following?
[[0,0,0,0,1,0,0,0,1]] --- #row 1 in dataframe
[[1,2,3,4,5,6,7,8,9]] --- #row 2 in dataframe
Can the same task be split up further? There are more than 500K arrays in the dataset. Thank you.
Upvotes: 3
Views: 19402
Reputation: 3967
Declare an empty DataFrame on the second line, i.e. just below X1 = [], with df = pd.DataFrame(). Then, inside your if statement, add the following after appending values to X1 (data is the array for the current group, so each call adds one row):
df = pd.concat([df, pd.DataFrame([data])], ignore_index=True)
Or,
df = pd.DataFrame(np.nan, index=range(3), columns=range(9))
for i in range(3):
    df.iloc[i, :] = np.random.randint(9)  # <----- Pass X1 here
df
#      0    1    2    3    4    5    6    7    8
# 0  4.0  4.0  4.0  4.0  4.0  4.0  4.0  4.0  4.0
# 1  7.0  7.0  7.0  7.0  7.0  7.0  7.0  7.0  7.0
# 2  8.0  8.0  8.0  8.0  8.0  8.0  8.0  8.0  8.0
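The preallocation idea above uses random placeholder values. A minimal sketch of the same idea with real rows might look like this, assuming the arrays are already available and all have length 9; the `arrays` list and the `out` buffer are hypothetical names, not part of the original code:

```python
import numpy as np
import pandas as pd

# Hypothetical sample arrays standing in for the per-group results.
arrays = [
    [0, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [0, 5, 6, 7, 8, 9, 1, 2, 3],
]

# Preallocate a NaN-filled buffer with one row per array.
out = np.full((len(arrays), 9), np.nan)

# Copy each array into its own row.
for i, row in enumerate(arrays):
    out[i, :] = row

# Wrap the filled buffer in a DataFrame once, at the end.
df = pd.DataFrame(out)
print(df)
```

Filling a NumPy buffer and converting once at the end tends to be cheaper than assigning into a DataFrame row by row, which may matter with several hundred thousand arrays.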
Upvotes: 0
Reputation: 143
I hope the code below helps:
arr2 = [0,0,0,0,1,0,0,0,1]
arr3 = [1,2,3,4,5,6,7,8,9]
arr4 = [0,5,6,7,8,9,1,2,3]
li = [arr2, arr3, arr4]
pd.DataFrame(data = li, columns= ["c1", "c2", "c3", "c4", "c5","c6", "c7", "c8", "c9"])
You can make this more dynamic by building one temporary array per iteration, appending it to a list, and creating the DataFrame from the resulting list of arrays. You can also name the columns (as shown above) or leave them unnamed by dropping the columns argument. I hope that solves your problem.
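As a rough sketch of that dynamic version, assuming each group yields one array and only arrays of a fixed length are kept (the `groups` dict, `expected_len`, `rows`, and `ids` are made-up names for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the per-group arrays produced inside the loop.
groups = {
    "a": np.array([0, 0, 0, 0, 1, 0, 0, 0, 1]),
    "b": np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
    "c": np.array([0, 5, 6]),  # wrong length, skipped below
    "d": np.array([0, 5, 6, 7, 8, 9, 1, 2, 3]),
}

expected_len = 9  # assumed fixed length of a valid observation
rows = []         # one entry per kept array
ids = []          # matching group keys, used as the row index

for key, data in groups.items():
    if len(data) == expected_len:
        rows.append(data)
        ids.append(key)

# Build the DataFrame once, after the loop has finished.
df = pd.DataFrame(rows, index=ids)
print(df.shape)
```

Appending to a plain Python list and constructing the DataFrame once avoids the repeated copying that np.append or pd.concat inside the loop would incur.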
Upvotes: 3