Reputation: 121
I am trying to drop rows based on the empty values in a specific column of excel after joining the data for both excels. I have tried some queries mentioned in stack overflow, but the desire results are not achieved. Kindly support and thanks
First Excel File Named: Fileq.xls Contents are below
Item | Applied Qty | Reference Qty |
---|---|---|
Item A | 0 | |
Item B | 20 | 18 |
Item C | 0 | |
Item D | 1 | |
Item E | 2 |
First Excel File Named: Data.xls Contents are below
Item | Applied Qty | Reference Qty |
---|---|---|
Item G | 0 | |
Item H | 12 | 13 |
Item I | 0 | |
Item J | 1 | 25 |
Item K | 2 |
Desired Results:-
Item | Applied Qty | Reference Qty |
---|---|---|
Item B | 20 | 18 |
Item H | 12 | 13 |
Item J | 1 | 25 |
Code I am trying:-
import pandas as pd
DATA_DIR = Path.cwd() / r'C:\Python'
excel001 = DATA_DIR / 'Fileq.xls'
excel002 = DATA_DIR / 'Data.xls'
df001 = pd.read_excel(excel001)
df002 = pd.read_excel(excel002)
values001 = df001
values002 = df002
dataframes = [values001, values002]
join = pd.concat(dataframes)
#Tried many different combinations but non are working as below
new_df= join['Reference Qty'].replace(r'^\s*$', np.nan, inplace=True)
print(new_df)
Upvotes: 3
Views: 427
Reputation: 35676
Very close! Just missing dropna
we can pass a dict
to replace
so as to only replace
on the 'Reference Qty'
column instead of replacing the entire DataFrame:
new_df = (
pd.concat([df1, df2])
.replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
.dropna(subset=['Reference Qty'])
)
new_df
:
Item Applied Qty Reference Qty
1 Item B 20 18.0
1 Item H 12 13.0
3 Item J 1 25.0
Complete Working Example:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({
'Item': ['Item A', 'Item B', 'Item C', 'Item D', 'Item E'],
'Applied Qty': [0, 20, 0, 1, 2],
'Reference Qty': ['', 18.0, '', '', '']
})
df2 = pd.DataFrame({
'Item': ['Item G', 'Item H', 'Item I', 'Item J', 'Item K'],
'Applied Qty': [0, 12, 0, 1, 2],
'Reference Qty': ['', 13.0, '', 25.0, '']
})
new_df = (
pd.concat([df1, df2])
.replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
.dropna(subset=['Reference Qty'])
)
print(new_df)
Upvotes: 4
Reputation: 107
I think the dropna function from pandas is what you are looking for:
Upvotes: 1
Reputation: 59
This should work:
df.replace("", np.nan, inplace=True)
df.dropna(subset = ["Reference Qty"], inplace=True)
Upvotes: 2