Reputation: 11
I have a data set that has over 300k rows. I need to break them up into groups of 10k and export them into a csv file. I'm thinking the easiest way is to make each column 10k rows. I'm working on a test code that's only using 18 rows. I'm trying to loop through the entire set and place every 3 rows in a new column. Somehow I'm totally lost. This is what I have so far and my output:
rows = []
for j in range(len(new_zones)):
for i in (0,3):
rows.append(new_zones)
rows_df = pd.concat(rows, axis=1)
and my output:
Bin ID Dropzone Bin ID Dropzone Bin ID Dropzone Bin ID Dropzone Bin ID Dropzone ... Bin ID Dropzone Bin ID Dropzone Bin ID Dropzone Bin ID Dropzone Bin ID Dropzone
7 P2DP352 dz_02 P2DP352 dz_02 P2DP352 dz_02 P2DP352 dz_02 P2DP352 dz_02 ... P2DP352 dz_02 P2DP352 dz_02 P2DP352 dz_02 P2DP352 dz_02 P2DP352 dz_02
8 P2DP353 dz_02 P2DP353 dz_02 P2DP353 dz_02 P2DP353 dz_02 P2DP353 dz_02 ... P2DP353 dz_02 P2DP353 dz_02 P2DP353 dz_02 P2DP353 dz_02 P2DP353 dz_02
9 P2DP354 dz_02 P2DP354 dz_02 P2DP354 dz_02 P2DP354 dz_02 P2DP354 dz_02 ... P2DP354 dz_02 P2DP354 dz_02 P2DP354 dz_02 P2DP354 dz_02 P2DP354 dz_02
10 P2DP355 dz_02 P2DP355 dz_02 P2DP355 dz_02 P2DP355 dz_02 P2DP355 dz_02 ... P2DP355 dz_02 P2DP355 dz_02 P2DP355 dz_02 P2DP355 dz_02 P2DP355 dz_02
11 P2DP356 dz_03 P2DP356 dz_03 P2DP356 dz_03 P2DP356 dz_03 P2DP356 dz_03 ... P2DP356 dz_03 P2DP356 dz_03 P2DP356 dz_03 P2DP356 dz_03 P2DP356 dz_03
5 rows × 72 columns
I'm wanting 3 rows in each column.
Upvotes: 1
Views: 154
Reputation: 1030
You can try something like that (please notice the path format):
def split_df(df, k_rows, path):
count = 0
df_index = 0
while count <= len(df.index):
if count == 0:
new_df = df.iloc[:k_rows, :]
new_df.to_csv(path+str(df_index)+".csv")
else:
new_df = df.iloc[count:count+k_rows, :]
new_df.to_csv(path+str(df_index)+".csv")
count += k_rows
df_index += 1
split_df(df_test, 3, "./split_csv/df_")
Edit - You can try this:
def split_df(df, k_rows, path):
list_df = []
count = 0
while count <= len(df.index):
if count == 0:
new_df = df.iloc[:k_rows, :]
list_df.append(new_df)
else:
new_df = df.iloc[count:count+k_rows, :]
list_df.append(new_df)
count += k_rows
final_df = pd.concat(list_df, axis=1)
for col in final_df.columns:
final_df = final_df.replace(col, np.nan).apply(lambda x: pd.Series(x.dropna().to_numpy()))
final_df.to_csv(path)
split_df(df_test, 3, "./split_csv/final_df.csv")
Upvotes: 1