int_x

Reputation: 29

How to write a function in Python pandas to append rows to a dataframe in a loop?

I have been given a data set and am writing a function. My objective is quite simple: I have an Airbnb data set with various columns. I loop over a list of neighbourhood groups (which I created) and, for each element, try to extract the rows related to that element and append them to an empty dataframe.

Example:

import pandas as pd
import numpy as np

dict1 = {
    'id': [2539, 2595, 3647, 3831, 12937, 18198, 258838, 258876, 267535, 385824],
    'name': ['Clean & quiet apt home by the park', 'Skylit Midtown Castle', 'THE VILLAGE OF HARLEM....NEW YORK !', 'Cozy Entire Floor of Brownstone', '1 Stop fr. Manhattan! Private Suite,Landmark Block', 'Little King of Queens', 'Oceanview,close to Manhattan', 'Affordable rooms,all transportation', 'Home Away From Home-Room in Bronx', 'New York City- Riverdale Modern two bedrooms unit'],
    'price': [149, 225, 150, 89, 130, 70, 250, 50, 50, 120],
    'neighbourhood_group': ['Brooklyn', 'Manhattan', 'Manhattan', 'Brooklyn', 'Queens', 'Queens', 'Staten Island', 'Staten Island', 'Bronx', 'Bronx'],
}


df = pd.DataFrame(dict1)
df

I created a function as follows:


nbd_grp = ['Bronx','Queens','Staten Islands','Brooklyn','Manhattan']

# Creating a function to find the cheapest place in neighbourhood group

dfdf = pd.DataFrame(columns = ['id','name','price','neighbourhood_group'])

def cheapest_place(neighbourhood_group):
  for elem in nbd_grp:
    data =  df.loc[df['neighbourhood_group']==elem]
    cheapest = data.loc[data['price']==min(data['price'])]
    dfdf = cheapest.copy()
cheapest_place(nbd_grp)

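My expected output is:

+--------+-------------------------------------+-------+---------------------+
| id     | name                                | price | neighbourhood_group |
+--------+-------------------------------------+-------+---------------------+
| 267535 | Home Away From Home-Room in Bronx   |    50 | Bronx               |
|  18198 | Little King of Queens               |    70 | Queens              |
| 258876 | Affordable rooms,all transportation |    50 | Staten Island       |
|   3831 | Cozy Entire Floor of Brownstone     |    89 | Brooklyn            |
|   3647 | THE VILLAGE OF HARLEM....NEW YORK ! |   150 | Manhattan           |
+--------+-------------------------------------+-------+---------------------+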

Upvotes: 2

Views: 43

Answers (1)

JNevill

Reputation: 50034

My advice is that anytime you are working in a database or in a dataframe and you think "I need to loop", you should think again.

In a dataframe you are in a world of set-based logic, and there is usually a better set-based way to solve the problem. In your case you can groupby() your neighbourhood_group, take the min() of the price column, and then merge or join that result back to your original dataframe to get your id and name columns.

That would look something like:

df_min_price = df.groupby('neighbourhood_group').price.min().reset_index().merge(df, on=['neighbourhood_group','price'])

+-----+---------------------+-------+--------+-------------------------------------+
| idx | neighbourhood_group | price |   id   |                name                 |
+-----+---------------------+-------+--------+-------------------------------------+
|   0 | Bronx               |    50 | 267535 | Home Away From Home-Room in Bronx   |
|   1 | Brooklyn            |    89 |   3831 | Cozy Entire Floor of Brownstone     |
|   2 | Manhattan           |   150 |   3647 | THE VILLAGE OF HARLEM....NEW YORK ! |
|   3 | Queens              |    70 |  18198 | Little King of Queens               |
|   4 | Staten Island       |    50 | 258876 | Affordable rooms,all transportation |
+-----+---------------------+-------+--------+-------------------------------------+
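
If the one-liner is hard to follow, here is a minimal sketch of the same groupby-then-merge idea split into two steps, using the sample data from the question. The names min_prices and cheapest are only illustrative and are not from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': [2539, 2595, 3647, 3831, 12937, 18198, 258838, 258876, 267535, 385824],
    'name': ['Clean & quiet apt home by the park', 'Skylit Midtown Castle',
             'THE VILLAGE OF HARLEM....NEW YORK !', 'Cozy Entire Floor of Brownstone',
             '1 Stop fr. Manhattan! Private Suite,Landmark Block', 'Little King of Queens',
             'Oceanview,close to Manhattan', 'Affordable rooms,all transportation',
             'Home Away From Home-Room in Bronx',
             'New York City- Riverdale Modern two bedrooms unit'],
    'price': [149, 225, 150, 89, 130, 70, 250, 50, 50, 120],
    'neighbourhood_group': ['Brooklyn', 'Manhattan', 'Manhattan', 'Brooklyn', 'Queens',
                            'Queens', 'Staten Island', 'Staten Island', 'Bronx', 'Bronx'],
})

# Step 1: one row per neighbourhood_group holding that group's minimum price.
# (min_prices is an illustrative name.)
min_prices = df.groupby('neighbourhood_group', as_index=False)['price'].min()

# Step 2: inner-merge back onto the original rows on BOTH keys, so only rows
# whose price equals their own group's minimum survive.
cheapest = min_prices.merge(df, on=['neighbourhood_group', 'price'])

print(cheapest)
```

Note that merging on both keys keeps every row that ties for the minimum in its group, so a group with two equally cheap listings would appear twice.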

Upvotes: 1
