Reputation: 83
Hello I have an excel file which I need to sort. I want to split section of Data. Each section of Data is Splitted by two newline.
I tried spliting the data over \n\n using str(dfs).split('\n\n')
but it's just making one split which includes all the data. In short no splitting is being made.
Code Up till Now :
import pandas as pd
file_name = "Data of 2020-09-12 .xlsx"
dfs = pd.read_excel(file_name,'Sheet1')
dfs = dfs.fillna('')
dfs = str(dfs).split("\n\n")
print(dfs)
This is the output of Dataframe
Upvotes: 0
Views: 279
Reputation: 1823
Here is one way you can do this. The code is pretty much self explanatory but if there is any confusion, feel free to ask.
The basic idea is that you first iterate the df and find all the index positions of empty rows in the df. Then you split the df on those locations.
The solution assumes a row which has first column value '' to be empty.
import pandas as pd
file_name = "test.xlsx"
df = pd.read_excel(file_name)
df = df.fillna('')
df_temp = df.copy()
#find all section positions. We are doing p+2 since there are 2 empty rows dividing the section
pos = []
while True:
try:
empty = ''
p = df_temp.index[df_temp.iloc[:, 0] == empty][0]
df_temp = df_temp.iloc[p+2:]
pos.append(p)
except:
break
print(pos)
#Generate new dfs by splitting on the positions
list_df = []
start = 0
for p in pos:
print(start,p)
subdf = df.iloc[start:p]
list_df.append(subdf)
start = p+2
subdf = df.iloc[start:]
list_df.append(subdf)
#Print your dfs
for d in list_df:
print(d)
Upvotes: 1