Ling

Reputation: 911

How to reshape every nth row's data using pandas?

I need help reshaping the data in a CSV file that has over 10,000 rows, 10 rows at a time. For example, I have this CSV file:

Ale Brick
1   ww
2   ee
3   qq
3   xx
5   dd
3   gg
7   hh
8   tt
9   yy
0   uu
1   ii
2   oo
3   pp
4   mm
1   ww
7   zz
1   cc
3   rr
6   tt
9   ll

What I am hoping to get is the following form, where only the data in the 'Brick' column is reshaped:

[['ww' 'ee' 'qq' 'xx' 'dd']
 ['gg' 'hh' 'tt' 'yy' 'uu']]

[['ii' 'oo' 'pp' 'mm' 'ww']
 ['zz' 'cc' 'rr' 'tt' 'll']]

I know how to reshape rows 0 through 9, but I don't know how to do the same for the next 10 rows, and so on. Here is my script:

import pandas as pd

df = pd.read_csv("test.csv")

for i in range(0, len(df)):
    slct = df.head(10)
    result = slct['Brick'].values.reshape(2,5)

print(result)

This script only prints the following result:

[['ww' 'ee' 'qq' 'xx' 'dd']
 ['gg' 'hh' 'tt' 'yy' 'uu']]

I was hoping it would print rows 0 to 9, rows 10 to 19, rows 20 to 29, and so on.

I have been through the pandas tutorial but did not find any example that looks similar to what I want.

Thank you for your help

Upvotes: 2

Views: 1557

Answers (3)

Paul

Reputation: 1

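import pandas as pd

df = pd.read_csv("test.csv")
data = df['Brick']

# number of complete 10-row blocks (a partial final block cannot be reshaped to 2x5)
k = len(data) // 10

for x in range(k):
    temp = data.iloc[10*x:10*(x+1)]  # rows 10*x through 10*x + 9
    print(temp.values.reshape(2, 5))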
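A related sketch, assuming you would rather not load all 10,000+ rows at once: `pd.read_csv` accepts a `chunksize` argument and then yields DataFrames of up to that many rows, so each chunk can be reshaped directly. The CSV text here is a hypothetical stand-in for `test.csv` holding only the Brick values from the question; with a real file you would pass the filename instead of `io.StringIO(...)`.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv, using the Brick values from the question.
bricks = ["ww", "ee", "qq", "xx", "dd", "gg", "hh", "tt", "yy", "uu",
          "ii", "oo", "pp", "mm", "ww", "zz", "cc", "rr", "tt", "ll"]
csv_text = "Brick\n" + "\n".join(bricks)

blocks = []
# chunksize=10 makes read_csv return an iterator of DataFrames with up to 10 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=10):
    values = chunk["Brick"].to_numpy()
    if len(values) < 10:
        # A final chunk with fewer than 10 rows cannot be reshaped to (2, 5); skip it.
        continue
    block = values.reshape(2, 5)
    blocks.append(block)
    print(block)
```

This reads the file incrementally, which only matters for memory once the file gets large; for 10,000 rows, loading everything first is also fine.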

Upvotes: 0

Ted Petrou

Reputation: 61967

You can group every 10 consecutive rows together and then reshape the values:

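import numpy as np

# assumes len(df) is a multiple of 10, so the group labels have exactly len(df) entries
df.groupby(np.repeat(np.arange(len(df) / 10), 10))['Brick'].apply(lambda x: x.values.reshape(2,5))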

0.0    [[ww, ee, qq, xx, dd], [gg, hh, tt, yy, uu]]
1.0    [[ii, oo, pp, mm, ww], [zz, cc, rr, tt, ll]]
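A variant of the same grouping idea, as a sketch rather than the answerer's code: `np.arange(len(df)) // 10` produces one integer label per row (0 for rows 0 to 9, 1 for rows 10 to 19, ...) and always has `len(df)` entries, even when the row count is not a multiple of 10. Iterating over the groupby and reshaping each group avoids relying on `apply` to return arrays; the DataFrame below is built from the question's Brick values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Brick": ["ww", "ee", "qq", "xx", "dd", "gg", "hh", "tt", "yy", "uu",
                             "ii", "oo", "pp", "mm", "ww", "zz", "cc", "rr", "tt", "ll"]})

# One integer group label per row: 0 for rows 0-9, 1 for rows 10-19, and so on.
keys = np.arange(len(df)) // 10

blocks = {}
for key, group in df.groupby(keys):
    if len(group) == 10:  # skip a trailing group that is too short for (2, 5)
        blocks[int(key)] = group["Brick"].to_numpy().reshape(2, 5)

for key, block in blocks.items():
    print(key)
    print(block)
```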

Upvotes: 1

Scratch'N'Purr

Reputation: 10399

You can use the modulo operator to "batch" reshape your column. You're on the right track; you just need a loop index for the modulo operation and a start index that marks where each batch begins.

import pandas as pd

df = pd.DataFrame({'brick': ['xx','yy','xa','bd','ev','bb','oo','pp','qq','bn','nv','bn','rr','qw','bn','cd','fd','bv','nm','ty']})

start = 0  # set start to 0 for slicing
for i in range(len(df.index)):
    if (i + 1) % 10 == 0:  # the modulo operation
        result = df['brick'].iloc[start:i+1].values.reshape(2,5)
        print(result)
        start = i + 1  # set start to next index

Output:

[['xx' 'yy' 'xa' 'bd' 'ev']
 ['bb' 'oo' 'pp' 'qq' 'bn']]
[['nv' 'bn' 'rr' 'qw' 'bn']
 ['cd' 'fd' 'bv' 'nm' 'ty']]
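The same batching can also be done without a loop, as a sketch using plain NumPy: drop any leftover rows that do not fill a complete batch of 10, then reshape the whole column at once with `-1` as the first dimension, so NumPy works out the number of 2x5 blocks. The sample values are the ones from this answer's DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'brick': ['xx', 'yy', 'xa', 'bd', 'ev', 'bb', 'oo', 'pp', 'qq', 'bn',
                             'nv', 'bn', 'rr', 'qw', 'bn', 'cd', 'fd', 'bv', 'nm', 'ty']})

values = df['brick'].to_numpy()
n_full = len(values) // 10 * 10              # rows that form complete 10-row batches
blocks = values[:n_full].reshape(-1, 2, 5)   # shape: (number_of_batches, 2, 5)

for block in blocks:
    print(block)
```

Each `blocks[k]` is the 2x5 array for rows `10*k` through `10*k + 9`, which prints the same output as the loop version.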

Upvotes: 3
