Laura
Laura

Reputation: 11

Get a specific number of rows and place in new column pandas

I have a data set that has over 300k rows. I need to break them up into groups of 10k and export them into a csv file. I'm thinking the easiest way is to make each column 10k rows. I'm working on a test code that's only using 18 rows. I'm trying to loop through the entire set and place every 3 rows in a new column. Somehow I'm totally lost. This is what I have so far and my output:

rows = []
for j in range(len(new_zones)):
    for i in (0,3):
        rows.append(new_zones) 
rows_df = pd.concat(rows, axis=1)

and my output:

Bin ID  Dropzone    Bin ID  Dropzone    Bin ID  Dropzone    Bin ID  Dropzone    Bin ID  Dropzone    ... Bin ID  Dropzone    Bin ID  Dropzone    Bin ID  Dropzone    Bin ID  Dropzone    Bin ID  Dropzone
7   P2DP352 dz_02   P2DP352 dz_02   P2DP352 dz_02   P2DP352 dz_02   P2DP352 dz_02   ... P2DP352 dz_02   P2DP352 dz_02   P2DP352 dz_02   P2DP352 dz_02   P2DP352 dz_02
8   P2DP353 dz_02   P2DP353 dz_02   P2DP353 dz_02   P2DP353 dz_02   P2DP353 dz_02   ... P2DP353 dz_02   P2DP353 dz_02   P2DP353 dz_02   P2DP353 dz_02   P2DP353 dz_02
9   P2DP354 dz_02   P2DP354 dz_02   P2DP354 dz_02   P2DP354 dz_02   P2DP354 dz_02   ... P2DP354 dz_02   P2DP354 dz_02   P2DP354 dz_02   P2DP354 dz_02   P2DP354 dz_02
10  P2DP355 dz_02   P2DP355 dz_02   P2DP355 dz_02   P2DP355 dz_02   P2DP355 dz_02   ... P2DP355 dz_02   P2DP355 dz_02   P2DP355 dz_02   P2DP355 dz_02   P2DP355 dz_02
11  P2DP356 dz_03   P2DP356 dz_03   P2DP356 dz_03   P2DP356 dz_03   P2DP356 dz_03   ... P2DP356 dz_03   P2DP356 dz_03   P2DP356 dz_03   P2DP356 dz_03   P2DP356 dz_03
5 rows × 72 columns

I'm wanting 3 rows in each column.

Upvotes: 1

Views: 154

Answers (1)

Tomer S
Tomer S

Reputation: 1030

You can try something like that (please notice the path format):

def split_df(df, k_rows, path):
    count = 0
    df_index = 0
    while count <= len(df.index):
        if count == 0:
            new_df = df.iloc[:k_rows, :]
            new_df.to_csv(path+str(df_index)+".csv")
        else:
            new_df = df.iloc[count:count+k_rows, :]
            new_df.to_csv(path+str(df_index)+".csv")
        count += k_rows
        df_index += 1


split_df(df_test, 3, "./split_csv/df_")

Edit - You can try this:

def split_df(df, k_rows, path):
    list_df = []
    count = 0
    while count <= len(df.index):
        if count == 0:
            new_df = df.iloc[:k_rows, :]
            list_df.append(new_df)
        else:
            new_df = df.iloc[count:count+k_rows, :]
            list_df.append(new_df)
        count += k_rows
    final_df = pd.concat(list_df, axis=1)
    for col in final_df.columns:
        final_df = final_df.replace(col, np.nan).apply(lambda x: pd.Series(x.dropna().to_numpy()))
    final_df.to_csv(path)

split_df(df_test, 3, "./split_csv/final_df.csv")

Upvotes: 1

Related Questions