Reputation: 65
I have a dataframe and want to split it into two based on multiple columns.
df should contain all rows that have no null values and Status == 'Yes'. The rest should go into df_null.
df = pd.read_csv("vehicle.csv")
| Status | Country | City | Year |
|---|---|---|---|
| Yes | USA | New York | 2001 |
| Yes | Canada | | 2001 |
| Yes | France | Paris | |
| No | | Rio | 1843 |
| No | Germany | Berlin | 2008 |
| Yes | | | 2004 |
# df_null has all the rows with a null in any of the three columns
df_null = df[~df[['Country', 'City', 'Year']].notnull().all(axis=1)]
# df has all the rows with no nulls and Status == 'Yes'
df = df[df[['Country', 'City', 'Year']].notnull().all(axis=1)]
df = df.loc[df['Status'] == 'Yes']
result = pd.concat([df, df_null])
The row with Germany isn't in the result dataframe because it is filtered out by Status == 'Yes'.
Upvotes: 0
Views: 1376
Reputation: 161
If your goal is to split the dataframe based on null values, simply use the code below.
DF_null = processed_records_DF[processed_records_DF['ColumnName'].isnull()]
DF_not_null = processed_records_DF[processed_records_DF['ColumnName'].notnull()]
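For the multi-column case in the question, the same idea can probably be extended by checking several columns at once. A minimal sketch, assuming the question's sample data rebuilt by hand; the names `cols` and `has_null` are illustrative and not from the original post:

```python
import pandas as pd

# Sample data rebuilt from the question; None marks a missing value.
df = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)

cols = ["Country", "City", "Year"]  # columns to check (illustrative name)

# True for rows where at least one of the checked columns is null
has_null = df[cols].isnull().any(axis=1)

DF_null = df[has_null]
DF_not_null = df[~has_null]
```

Note that this only splits on null values. The Status condition from the question would still have to be combined into the mask.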
Upvotes: 0
Reputation: 81
You can do this by building a boolean mask with the code below:
import pandas as pd
# Build the sample data
df = pd.DataFrame(
[
["Yes", "USA", "New York", 2001],
["Yes", "Canada", None, 2001],
["Yes", "France", "Paris", None],
["No", None, "Rio", 1843],
["No", "Germany", "Berlin", 2008],
["Yes", None, None, 2004],
],
columns=["Status", "Country", "City", "Year"],
)
# Create mask: no nulls in the three columns and Status is "Yes"
valid_rows = df[["Country", "City", "Year"]].notnull().all(axis=1) & (df["Status"] == "Yes")
df_null = df[~valid_rows]  # Filter by inverse of mask
df = df[valid_rows]  # Filter by mask
This outputs for df as:
| | Status | Country | City | Year |
|---|---|---|---|---|
| 0 | Yes | USA | New York | 2001 |
And for df_null as:
| | Status | Country | City | Year |
|---|---|---|---|---|
| 1 | Yes | Canada | | 2001 |
| 2 | Yes | France | Paris | nan |
| 3 | No | | Rio | 1843 |
| 4 | No | Germany | Berlin | 2008 |
| 5 | Yes | | | 2004 |
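Because both frames come from one mask and its inverse, every row should land in exactly one of them, so the Germany row is not lost. A small sketch of that check, wrapping the mask in a helper; the function name `split_valid` and its parameters are hypothetical, chosen only for this illustration:

```python
import pandas as pd


def split_valid(frame, cols, status_col="Status", status_value="Yes"):
    """Return (valid, rest): rows with no nulls in `cols` and the wanted
    status, followed by all remaining rows."""
    mask = frame[cols].notnull().all(axis=1) & (frame[status_col] == status_value)
    return frame[mask], frame[~mask]


# Same sample data as in the answer above
df = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)

valid, rest = split_valid(df, ["Country", "City", "Year"])

# Every original row ends up in exactly one of the two frames
assert len(valid) + len(rest) == len(df)
assert valid.index.intersection(rest.index).empty

# The Germany row (complete, but Status "No") is in the second frame
print(rest.loc[4, "Country"])  # Germany
```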
Upvotes: 1
Reputation: 846
Is this what you are looking for?
# Import pandas library
import pandas as pd
import numpy as np
# Initialize list of lists (empty strings stand in for missing values)
data = [
    ['Yes', 'USA', 'New York', 2001],
    ['Yes', 'Canada', '', 2001],
    ['Yes', 'France', 'Paris', ''],
    ['No', '', 'Rio', 1843],
    ['No', 'Germany', 'Berlin', 2008],
    ['Yes', '', '', 2004],
]
# Create the pandas DataFrame
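df = pd.DataFrame(data, columns=["Status", "Country", "City", "Year"])
# Adding filter conditions.
# Turn empty strings into NaN so that dropna() can detect them
df_new = df.replace('', np.nan)
# Keep only the rows with Status 'Yes' and no missing values
df_new = df_new[df_new.Status == 'Yes'].dropna()
# Rows found in df_new become all-NaN here and are dropped; the rest remain
df_null = df[~df.isin(df_new)].dropna()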
# Printing the two dataframes
print(df_new)
print(df_null)
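A possible variation on the code above keeps the index labels of the complete 'Yes' rows and selects everything else by index, which avoids the `isin` comparison. This is only a sketch on the same sample data; the names `cleaned` and `keep_idx` are illustrative:

```python
import numpy as np
import pandas as pd

data = [
    ['Yes', 'USA', 'New York', 2001],
    ['Yes', 'Canada', '', 2001],
    ['Yes', 'France', 'Paris', ''],
    ['No', '', 'Rio', 1843],
    ['No', 'Germany', 'Berlin', 2008],
    ['Yes', '', '', 2004],
]
df = pd.DataFrame(data, columns=["Status", "Country", "City", "Year"])

# Treat empty strings as missing values
cleaned = df.replace('', np.nan)

# Index labels of rows with Status 'Yes' and no missing values
keep_idx = cleaned[cleaned["Status"] == 'Yes'].dropna().index

df_new = df.loc[keep_idx]          # complete 'Yes' rows
df_null = df.drop(index=keep_idx)  # everything else, original values intact

print(df_new)
print(df_null)
```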
Upvotes: 0