Mooventh Chiyan

Reputation: 29

Imputing missing string values using a condition (pandas DataFrame)

Kaggle dataset (the one I am working on): New York City Airbnb

I put together the snippet below from the raw data to explain the issue better.

import pandas as pd

airbnb = pd.read_csv("https://raw.githubusercontent.com/rafagarciac/Airbnb_NYC-Data-Science_Project/master/input/new-york-city-airbnb-open-data/AB_NYC_2019.csv")

# Rows where host_name is missing, shown with their neighbourhood group
airbnb[airbnb["host_name"].isnull()][["host_name", "neighbourhood_group"]]

I would like to fill the null values of "host_name" based on the value in the "neighbourhood_group" column, like this:

if airbnb['host_name'].isnull():
    if airbnb["neighbourhood_group"] == "Bronx":
        airbnb["host_name"] = "Vie"
    elif airbnb["neighbourhood_group"] == "Manhattan":
        airbnb["host_name"] = "Sonder (NYC)"
    else:
        airbnb["host_name"] = "Michael"

(This is wrong; it is only meant to show the result I want.)

I've tried using an if statement, but I couldn't apply it correctly. Could you please help me solve this?

Thanks

Upvotes: 0

Views: 2031

Answers (3)

Ivan Z

Reputation: 128

Pandas has a dedicated method for filling NA values:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

You can create a dict that maps "neighbourhood_group" values (keys) to the "host_name" values you want to fill in, and then do this:

host_dict = {'Bronx': 'Vie', 'Manhattan': 'Sonder (NYC)'}     
airbnb['host_name'] = airbnb['host_name'].fillna(value=airbnb[airbnb['host_name'].isna()]['neighbourhood_group'].map(host_dict))
airbnb['host_name'] = airbnb['host_name'].fillna("Michael")

The "value" argument here can be a Series, in which case the values are matched to the missing cells by index.

So, first of all, we create a Series of the "neighbourhood_group" values for the rows where "host_name" is missing:

neighbourhood_group_series = airbnb[airbnb['host_name'].isna()]['neighbourhood_group'] 

Then, using map together with "host_dict", we get a Series of the values we want to impute (groups that are not in the dict map to NaN):

neighbourhood_group_series.map(host_dict)

Finally, we fill all the remaining NA cells with a default value, in our case "Michael".
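To see the steps above end to end without downloading the dataset, here is a minimal sketch on a small made-up DataFrame. The frame `df` and its rows are assumptions for illustration only; the column names and `host_dict` are taken from the answer.

```python
import pandas as pd

# Hypothetical toy data standing in for the two Airbnb columns.
df = pd.DataFrame({
    "host_name": ["Ann", None, None, None, "Bob"],
    "neighbourhood_group": ["Bronx", "Bronx", "Manhattan", "Queens", "Queens"],
})

host_dict = {"Bronx": "Vie", "Manhattan": "Sonder (NYC)"}

# Map each row's group to a candidate name; groups not in the dict become NaN.
candidates = df["neighbourhood_group"].map(host_dict)

# fillna aligns on the index and only changes cells that are currently missing,
# so "Ann" stays "Ann" even though her row is in the Bronx.
df["host_name"] = df["host_name"].fillna(candidates)

# Anything still missing (here the Queens row) gets the default.
df["host_name"] = df["host_name"].fillna("Michael")

print(df)
```

Mapping the whole column instead of only the missing rows gives the same result here, because fillna never overwrites existing names.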

Upvotes: 2

Kavin Dsouza

Reputation: 989

You could try this:

airbnb.loc[(airbnb['host_name'].isnull()) & (airbnb["neighbourhood_group"]=="Bronx"), "host_name"] = "Vie"
airbnb.loc[(airbnb['host_name'].isnull()) & (airbnb["neighbourhood_group"]=="Manhattan"), "host_name"] = "Sonder (NYC)"
airbnb.loc[airbnb['host_name'].isnull(), "host_name"] = "Michael"
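If there are more than a couple of groups, the same boolean-mask idea can be driven from a dict. This is only a sketch on made-up data: `df`, `rules` and the sample rows are hypothetical, and the catch-all assignment has to stay last, as it does in the lines above.

```python
import pandas as pd

# Hypothetical toy data standing in for the two Airbnb columns.
df = pd.DataFrame({
    "host_name": ["Ann", None, None, None, "Bob"],
    "neighbourhood_group": ["Bronx", "Bronx", "Manhattan", "Queens", "Queens"],
})

# Hypothetical rule table: group -> name to use when host_name is missing.
rules = {"Bronx": "Vie", "Manhattan": "Sonder (NYC)"}

for group, name in rules.items():
    # Only rows that are both missing a host and in this group are selected.
    mask = df["host_name"].isna() & (df["neighbourhood_group"] == group)
    df.loc[mask, "host_name"] = name

# Catch-all: must run after the specific rules, or it would fill everything first.
df.loc[df["host_name"].isna(), "host_name"] = "Michael"

print(df)
```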

Upvotes: 3

Sezer BOZKIR

Reputation: 562

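You can do it with a row-wise function:

import pandas as pd

ornek = pd.DataFrame({'samp1': [None, None, None],
                     'samp2': ["sezer", "bozkir", "farkli"]})

def filter_by_col(row):
    # Choose the fill value from the other column
    if row["samp2"] == "sezer":
        return "ping"
    if row["samp2"] == "bozkir":
        return "pong"
    return None

# Apply the function to each row and store the result in the column to fill
ornek['samp1'] = ornek.apply(filter_by_col, axis=1)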
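Adapted to the columns from the question, the row-wise approach might look like the sketch below. The toy frame `df` and the helper name `pick_host` are assumptions for illustration. Unlike the example above, the function returns the existing name when one is present, so only missing cells change. Note that apply with axis=1 calls Python code for every row, so it is usually slower than the vectorized answers on a large frame.

```python
import pandas as pd

# Hypothetical toy data standing in for the two Airbnb columns.
df = pd.DataFrame({
    "host_name": ["Ann", None, None, None, "Bob"],
    "neighbourhood_group": ["Bronx", "Bronx", "Manhattan", "Queens", "Queens"],
})

def pick_host(row):
    # Keep an existing host name untouched.
    if pd.notna(row["host_name"]):
        return row["host_name"]
    # Otherwise choose a name from the neighbourhood group.
    if row["neighbourhood_group"] == "Bronx":
        return "Vie"
    if row["neighbourhood_group"] == "Manhattan":
        return "Sonder (NYC)"
    return "Michael"

# axis=1 passes each row to pick_host as a Series.
df["host_name"] = df.apply(pick_host, axis=1)

print(df)
```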

Upvotes: 0
