Reputation: 911
I need help reshaping the data in a CSV file that has over 10,000 rows, taking 10 rows at a time. For example, I have this CSV file:
Ale Brick
1 ww
2 ee
3 qq
3 xx
5 dd
3 gg
7 hh
8 tt
9 yy
0 uu
1 ii
2 oo
3 pp
4 mm
1 ww
7 zz
1 cc
3 rr
6 tt
9 ll
What I am hoping to get is this form, where only the data in the 'Brick' column is reshaped:
[['ww' 'ee' 'qq' 'xx' 'dd']
['gg' 'hh' 'tt' 'yy' 'uu']]
[['ii' 'oo' 'pp' 'mm' 'ww']
['zz' 'cc' 'rr' 'tt' 'll']]
I know how to reshape the data from row 0 to row 9, but I do not know how to do it for the next 10 rows. Here is my script:
import pandas as pd
df = pd.read_csv("test.csv")
for i in range(0, len(df)):
    slct = df.head(10)
    result = slct['Brick'].reshape(2,5)
print result
This script only prints the following result:
[['ww' 'ee' 'qq' 'xx' 'dd']
['gg' 'hh' 'tt' 'yy' 'uu']]
I was hoping for it to print the data from rows 0 to 9, then rows 10 to 19, then rows 20 to 29, and so on.
I have been through the pandas tutorial but did not find any example that looks similar to what I want.
Thank you for your help
Upvotes: 2
Views: 1557
Reputation: 1
import pandas as pd
df = pd.read_csv("test.csv")
data = df['Brick']
k = len(data) // 10  # number of complete 10-row blocks
for x in range(k):
    temp = data[10*x:10*(x+1)]  # rows 10*x to 10*x + 9
    print(temp.values.reshape(2, 5))
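The same slicing idea can be sketched as a small generator. The helper name `iter_blocks` and the choice to skip a trailing incomplete block are assumptions made for illustration, not part of the answer above; the sample values come from the question, plus two made-up extras to show the skipping.

```python
import numpy as np


def iter_blocks(values, rows=2, cols=5):
    """Yield consecutive (rows, cols) arrays; a trailing partial block is skipped."""
    values = np.asarray(values)
    size = rows * cols
    # Stop before any incomplete block so reshape never sees the wrong size.
    for start in range(0, len(values) - size + 1, size):
        yield values[start:start + size].reshape(rows, cols)


# 'Brick' values from the question, plus two hypothetical extra rows
# ('aa', 'bb') that do not fill a whole block and are therefore dropped.
brick = ['ww', 'ee', 'qq', 'xx', 'dd', 'gg', 'hh', 'tt', 'yy', 'uu',
         'ii', 'oo', 'pp', 'mm', 'ww', 'zz', 'cc', 'rr', 'tt', 'll',
         'aa', 'bb']

blocks = list(iter_blocks(brick))
for block in blocks:
    print(block)
```

With the CSV from the question, the call would presumably be `iter_blocks(df['Brick'])`, since `np.asarray` accepts a pandas Series.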
Upvotes: 0
Reputation: 61967
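You can group the rows into blocks of 10 and then reshape the values of each block (this assumes NumPy is imported as np and the number of rows is a multiple of 10):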
df.groupby(np.repeat(np.arange(len(df) / 10), 10))['Brick'].apply(lambda x: x.values.reshape(2,5))
0.0 [[ww, ee, qq, xx, dd], [gg, hh, tt, yy, uu]]
1.0 [[ii, oo, pp, mm, ww], [zz, cc, rr, tt, ll]]
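To see what the grouping key looks like, here is a small sketch that builds it for a 20-row frame and compares it with an integer-division variant. The inline frame stands in for `test.csv`, and the `// 10` key is an alternative assumed here for illustration rather than the answer's exact code; it produces integer labels instead of floats.

```python
import numpy as np
import pandas as pd

# Stand-in for the CSV in the question (only the 'Brick' column).
df = pd.DataFrame({'Brick': ['ww', 'ee', 'qq', 'xx', 'dd',
                             'gg', 'hh', 'tt', 'yy', 'uu',
                             'ii', 'oo', 'pp', 'mm', 'ww',
                             'zz', 'cc', 'rr', 'tt', 'll']})

# Each block number repeated 10 times: 0 for rows 0-9, 1 for rows 10-19.
key_repeat = np.repeat(np.arange(len(df) // 10), 10)
# The same labels, obtained by integer-dividing the row position by 10.
key_floor = np.arange(len(df)) // 10
print(key_repeat)

# Reshape each group of 10 rows into a 2x5 array, keyed by block number.
blocks = {label: group.to_numpy().reshape(2, 5)
          for label, group in df.groupby(key_floor)['Brick']}
for label, block in blocks.items():
    print(label)
    print(block)
```

Iterating over the groups, rather than calling `apply`, keeps each 2x5 array as a plain NumPy object in a dictionary.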
Upvotes: 1
Reputation: 10399
You need to make use of the modulo operator to "batch" reshape your column. You're on the right track. You just need another iterator to do the modulo operation.
import pandas as pd
df = pd.DataFrame({'brick': ['xx','yy','xa','bd','ev','bb','oo','pp','qq','bn','nv','bn','rr','qw','bn','cd','fd','bv','nm','ty']})
start = 0 # set start to 0 for slicing
for i in range(len(df.index)):
    if (i + 1) % 10 == 0: # the modulo operation
        # .values gives a NumPy array, which has a reshape method
        result = df['brick'].iloc[start:i+1].values.reshape(2,5)
        print(result)
        start = i + 1 # set start to next index
Output:
[['xx' 'yy' 'xa' 'bd' 'ev']
['bb' 'oo' 'pp' 'qq' 'bn']]
[['nv' 'bn' 'rr' 'qw' 'bn']
['cd' 'fd' 'bv' 'nm' 'ty']]
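As an alternative to looping, the whole column can be reshaped in one call by letting NumPy infer the number of blocks with `-1`. This is a different technique from the modulo loop above, sketched under the assumption that the column length is an exact multiple of 10; the sample values are the ones used in this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'brick': ['xx', 'yy', 'xa', 'bd', 'ev',
                             'bb', 'oo', 'pp', 'qq', 'bn',
                             'nv', 'bn', 'rr', 'qw', 'bn',
                             'cd', 'fd', 'bv', 'nm', 'ty']})

# -1 lets NumPy work out how many 2x5 blocks fit; this raises a
# ValueError if the length is not a multiple of 10.
blocks = df['brick'].to_numpy().reshape(-1, 2, 5)

print(blocks.shape)  # (2, 2, 5)
for block in blocks:
    print(block)
```

If the file can end with a partial block, the column would need to be truncated to a multiple of 10 before reshaping.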
Upvotes: 3