I am trying to drop rows based on the empty values in a specific column of excel after joining the data for both excels. I have tried some queries mentioned in stack overflow, but the desire results are not achieved. Kindly support and thanks First Excel File Named: Fileq.xls Contents are below Item Applied Qty Reference Qty Item A 0 Item B 20 18 Item C 0 Item D 1 Item E 2 First Excel File Named: Data.xls Contents are below Item Applied Qty Reference Qty Item G 0 Item H 12 13 Item I 0 Item J 1 25 Item K 2 Desired Results:- Item Applied Qty Reference Qty Item B 20 18 Item H 12 13 Item J 1 25 Code I am trying:- import pandas as pd DATA_DIR = Path.cwd() / r'C:\Python' excel001 = DATA_DIR / 'Fileq.xls' excel002 = DATA_DIR / 'Data.xls' df001 = pd.read_excel(excel001) df002 = pd.read_excel(excel002) values001 = df001 values002 = df002 dataframes = [values001, values002] join = pd.concat(dataframes) #Tried many different combinations but non are working as below new_df= join['Reference Qty'].replace(r'^\s*$', np.nan, inplace=True) print(new_df)

Reputation: 121

Drop a row based on empty/blanks values in specific column in final excel file - Pandas Data frame

I am trying to drop rows based on the empty values in a specific column of excel after joining the data for both excels. I have tried some queries mentioned in stack overflow, but the desire results are not achieved. Kindly support and thanks

First Excel File Named: Fileq.xls Contents are below

Item	Applied Qty	Reference Qty
Item A	0
Item B	20	18
Item C	0
Item D	1
Item E	2

First Excel File Named: Data.xls Contents are below

Item	Applied Qty	Reference Qty
Item G	0
Item H	12	13
Item I	0
Item J	1	25
Item K	2

Desired Results:-

Item	Applied Qty	Reference Qty
Item B	20	18
Item H	12	13
Item J	1	25

Code I am trying:-

import pandas as pd
DATA_DIR = Path.cwd() / r'C:\Python'

excel001 = DATA_DIR / 'Fileq.xls'

excel002 = DATA_DIR / 'Data.xls'

df001 = pd.read_excel(excel001)

df002 = pd.read_excel(excel002)

values001 = df001

values002 = df002

dataframes = [values001, values002]

join = pd.concat(dataframes)

#Tried many different combinations but non are working as below

new_df= join['Reference Qty'].replace(r'^\s*$', np.nan, inplace=True) 

print(new_df)

Upvotes: 3

Answers (3)

Henry Ecker

Reputation: 35676

Very close! Just missing dropna we can pass a dict to replace so as to only replace on the 'Reference Qty' column instead of replacing the entire DataFrame:

new_df = (
    pd.concat([df1, df2])
        .replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
        .dropna(subset=['Reference Qty'])
)

new_df:

     Item  Applied Qty Reference Qty
1  Item B           20          18.0
1  Item H           12          13.0
3  Item J            1          25.0

Complete Working Example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Item': ['Item A', 'Item B', 'Item C', 'Item D', 'Item E'],
    'Applied Qty': [0, 20, 0, 1, 2],
    'Reference Qty': ['', 18.0, '', '', '']
})

df2 = pd.DataFrame({
    'Item': ['Item G', 'Item H', 'Item I', 'Item J', 'Item K'],
    'Applied Qty': [0, 12, 0, 1, 2],
    'Reference Qty': ['', 13.0, '', 25.0, '']
})

new_df = (
    pd.concat([df1, df2])
        .replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
        .dropna(subset=['Reference Qty'])
)

print(new_df)

Upvotes: 4

ffm_nosoup

Reputation: 107

I think the dropna function from pandas is what you are looking for:

Upvotes: 1

Chris C

Reputation: 59

This should work:

df.replace("", np.nan, inplace=True)

df.dropna(subset = ["Reference Qty"], inplace=True)

Upvotes: 2

Drop a row based on empty/blanks values in specific column in final excel file - Pandas Data frame

Answers (3)

Related Questions