Ajinkya
Ajinkya

Reputation: 1887

Split data frame based on conditions

I have the following dataframe:

               0       1       2       3              4
0          1.JPG     NaN     NaN     NaN            NaN
1           2883  2957.0  3412.0  3340.0  miscellaneous
2           3517  3007.0  4062.0  3371.0  miscellaneous
3           5678  3158.0  6299.0  3423.0  miscellaneous
4           1627  3287.0  2149.0  3694.0  miscellaneous
5           2894  3272.0  3421.0  3664.0  miscellaneous
6           3525  3271.0  4064.0  3672.0  miscellaneous
7           4759  3337.0  5321.0  3640.0  miscellaneous
8           6141  3289.0  6664.0  3654.0  miscellaneous
9           1017  3598.0  1539.0  3979.0  miscellaneous
10          1624  3586.0  2155.0  3993.0  miscellaneous
11          2252  3612.0  2777.0  3967.0  miscellaneous
12          3211  3548.0  3735.0  3944.0  miscellaneous
13          6052  3616.0  6572.0  3983.0  miscellaneous
14           691  3911.0  1204.0  4223.0  miscellaneous
15         2.JPG     NaN     NaN     NaN            NaN
16         3.JPG     NaN     NaN     NaN            NaN
17          5384  2841.0  5963.0  3095.0  miscellaneous
18          5985  2797.0  6611.0  3080.0  miscellaneous
19          3512  3012.0  4025.0  3366.0  miscellaneous
20          5085  2974.0  5587.0  3367.0  miscellaneous
21          2593  3224.0  3148.0  3469.0  miscellaneous
22          1044  3630.0  1511.0  3928.0  miscellaneous
23          4764  3619.0  5283.0  3971.0  miscellaneous
24          5103  3613.0  5635.0  3928.0  miscellaneous

I want to split this data frame into multiple csv such that: First csv should be named 1.csv and have all data which is below 1.jpg and so on. For example, exported CSV should be like:

1.csv

2883    2957    3412    3340    miscellaneous
3517    3007    4062    3371    miscellaneous
5678    3158    6299    3423    miscellaneous
1627    3287    2149    3694    miscellaneous
2894    3272    3421    3664    miscellaneous
3525    3271    4064    3672    miscellaneous
4759    3337    5321    3640    miscellaneous
6141    3289    6664    3654    miscellaneous
1017    3598    1539    3979    miscellaneous
1624    3586    2155    3993    miscellaneous
2252    3612    2777    3967    miscellaneous
3211    3548    3735    3944    miscellaneous
6052    3616    6572    3983    miscellaneous
691 3911    1204    4223    miscellaneous

2.csv (this csv should should be blank)

3.csv

5384    2841    5963    3095    miscellaneous
5985    2797    6611    3080    miscellaneous
3512    3012    4025    3366    miscellaneous
5085    2974    5587    3367    miscellaneous
2593    3224    3148    3469    miscellaneous
1044    3630    1511    3928    miscellaneous
4764    3619    5283    3971    miscellaneous
5103    3613    5635    3928    miscellaneous

How do I do this using python and pandas ?

Upvotes: 2

Views: 1227

Answers (2)

Ajinkya
Ajinkya

Reputation: 1887

# program splits one big csv file into individiual image csv 's 
import pandas as pd
import numpy as np
df = pd.read_csv('results.csv', header=None)
#df1 = df.replace(np.nan, '1', regex=True)

print(df)

for n,g in df.assign(grouper = df[0].where(df[1].isnull())
                                      .ffill().astype('category'))\
             .dropna().groupby('grouper'):
    g.drop('grouper', axis=1).to_csv(n+'.csv',float_format="%.0f", header=None, index=False )

This produces the required result

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153510

You can use:

for n,g in df.assign(grouper = df['0'].where(df['1'].isnull())
                                      .ffill().astype('category'))\
             .dropna().groupby('grouper'):
    g.drop('grouper', axis=1).to_csv(n+'.csv', header=None, index=False)

Note: used astype('category') to pickup that group with no records

Output !dir *.JPG.csv

04/30/2018  03:43 PM               657 1.JPG.csv
04/30/2018  03:43 PM                 0 2.JPG.csv
04/30/2018  03:43 PM               376 3.JPG.csv

List contents of 1.jpg.csv

2883,2957.0,3412.0,3340.0,miscellaneous
3517,3007.0,4062.0,3371.0,miscellaneous
5678,3158.0,6299.0,3423.0,miscellaneous
1627,3287.0,2149.0,3694.0,miscellaneous
2894,3272.0,3421.0,3664.0,miscellaneous
3525,3271.0,4064.0,3672.0,miscellaneous
4759,3337.0,5321.0,3640.0,miscellaneous
6141,3289.0,6664.0,3654.0,miscellaneous
1017,3598.0,1539.0,3979.0,miscellaneous
1624,3586.0,2155.0,3993.0,miscellaneous
2252,3612.0,2777.0,3967.0,miscellaneous
3211,3548.0,3735.0,3944.0,miscellaneous
6052,3616.0,6572.0,3983.0,miscellaneous
691,3911.0,1204.0,4223.0,miscellaneous

Upvotes: 2

Related Questions