Reputation: 145
I have a dataframe with some demographic data and some survey text responses. I want to export each column of response data along with some demographic fields to a different Excel files based on one of the demographic fields. I have code that can do all that. The missing piece is dropping rows with nan when writing to Excel.
I tried creating separate dataframes for each question and dropping the nans there, which worked. Then I wasn't sure how to bring them back together to write to Excel.
# Sample dataframe
df = pd.DataFrame({'ID' : ['1','2','3','4'],
'School': ['School1', 'School1', 'School2', 'School2'],
'Sex': ['M', 'M', 'F', 'F'],
'Q1' : ['Black', np.nan, 'White', 'White'],
'Q2' : ['Good', 'Good', 'Bad', 'Bad'],
'Q3' : ['Up', 'Up', np.nan, 'Down']})
# Create output
output = df[['ID','School','Sex','Q1','Q2','Q3']].groupby('School')
# Loop to write to Excel files
for school, df_ in output:
writer = pd.ExcelWriter(f'school_{school}_tabs.xlsx', engine='xlsxwriter')
df_[['School','Sex','Q1']].to_excel(writer, sheet_name='Q1')
df_[['School','Sex','Q2']].to_excel(writer, sheet_name='Q2')
df_[['School','Sex','Q3']].to_excel(writer, sheet_name='Q3')
writer.save()
The sample code should create two Excel files, one for School1 and one for School2. Each file will have three tabs, one for each question (Q1, Q2, Q3). As you can see Q1 and Q3 have nan values, which get written as blanks to Excel. I don't want those rows to be written to Excel. Obviously those people answered other questions, which I do want written to Excel.
Upvotes: 0
Views: 1470
Reputation: 6835
In your code, you need to use .dropna()
.
Eg: df_.dropna()
You will need to determine how
in the dropna
args.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
Experiment with that argument and you should get what you want.
Upvotes: 1