aza01
aza01

Reputation: 121

Drop a row based on empty/blanks values in specific column in final excel file - Pandas Data frame

I am trying to drop rows based on the empty values in a specific column of excel after joining the data for both excels. I have tried some queries mentioned in stack overflow, but the desire results are not achieved. Kindly support and thanks

First Excel File Named: Fileq.xls Contents are below

Item Applied Qty Reference Qty
Item A 0
Item B 20 18
Item C 0
Item D 1
Item E 2

First Excel File Named: Data.xls Contents are below

Item Applied Qty Reference Qty
Item G 0
Item H 12 13
Item I 0
Item J 1 25
Item K 2

Desired Results:-

Item Applied Qty Reference Qty
Item B 20 18
Item H 12 13
Item J 1 25

Code I am trying:-

import pandas as pd
DATA_DIR = Path.cwd() / r'C:\Python'

excel001 = DATA_DIR / 'Fileq.xls'

excel002 = DATA_DIR / 'Data.xls'

df001 = pd.read_excel(excel001)

df002 = pd.read_excel(excel002)

values001 = df001

values002 = df002

dataframes = [values001, values002]

join = pd.concat(dataframes)

#Tried many different combinations but non are working as below

new_df= join['Reference Qty'].replace(r'^\s*$', np.nan, inplace=True) 

print(new_df)

Upvotes: 3

Views: 427

Answers (3)

Henry Ecker
Henry Ecker

Reputation: 35676

Very close! Just missing dropna we can pass a dict to replace so as to only replace on the 'Reference Qty' column instead of replacing the entire DataFrame:

new_df = (
    pd.concat([df1, df2])
        .replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
        .dropna(subset=['Reference Qty'])
)

new_df:

     Item  Applied Qty Reference Qty
1  Item B           20          18.0
1  Item H           12          13.0
3  Item J            1          25.0

Complete Working Example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Item': ['Item A', 'Item B', 'Item C', 'Item D', 'Item E'],
    'Applied Qty': [0, 20, 0, 1, 2],
    'Reference Qty': ['', 18.0, '', '', '']
})

df2 = pd.DataFrame({
    'Item': ['Item G', 'Item H', 'Item I', 'Item J', 'Item K'],
    'Applied Qty': [0, 12, 0, 1, 2],
    'Reference Qty': ['', 13.0, '', 25.0, '']
})

new_df = (
    pd.concat([df1, df2])
        .replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
        .dropna(subset=['Reference Qty'])
)

print(new_df)

Upvotes: 4

ffm_nosoup
ffm_nosoup

Reputation: 107

I think the dropna function from pandas is what you are looking for:

Upvotes: 1

Chris C
Chris C

Reputation: 59

This should work:

df.replace("", np.nan, inplace=True)

df.dropna(subset = ["Reference Qty"], inplace=True)

Upvotes: 2

Related Questions