Reputation: 105

Derive a new pandas column based on a certain value of a row and apply until the next value appears again

In a pandas dataframe string column, I want to derive a new column based on the value of a row until the next value appears again. What is the most efficient way to do this / clean way to do achieve this?

Input Dataframe:

import pandas as pd

df = pd.DataFrame({'neighborhood':['Chicago City', 'Wicker Park', 'Bucktown','Lincoln Park','West Loop','River North','Milwaukee City','Bay View','East Side','South Side','Bronzeville','North Side','New York City','Harlem','Midtown','Chinatown']})

My desired dataframe output would be:

      neighborhood city
0     Chicago City Chicago
1      Wicker Park Chicago
2         Bucktown Chicago
3     Lincoln Park Chicago
4        West Loop Chicago
5      River North Chicago
6   Milwaukee City Milwaukee
7         Bay View Milwaukee
8        East Side Milwaukee
9       South Side Milwaukee
10     Bronzeville Milwaukee
11      North Side Milwaukee
12   New York City New York
13          Harlem New York
14         Midtown New York
15       Chinatown New York

Upvotes: 1

Answers (3)

Alessandro Solbiati

Reputation: 979

you can just map a custom defined function that behaves as intended

city = None
def generate(s):
    global city
    if 'City' in s: city = s.replace('City','')
    return city

df['neighborhood'].map(generate)

this will return the intended output

Upvotes: 0

ALollz

Reputation: 59569

Use .str.extract + ffill

df['city'] = df.neighborhood.str.extract('(.*)\sCity').ffill()

Upvotes: 3

Dennis Lyubyvy

Reputation: 1120

1) If the first column contains 'City', copy it to the second column but cut out the ' City' part

2) Fill NA's with a forward fill method

import numpy as np

df['city'] = np.where(
df.neighborhood.str.contains('City'),
df.neighborhood.str.replace(' City', '', case = False),
None)

Result:

      neighborhood       city
0     Chicago City    Chicago
1      Wicker Park       None
2         Bucktown       None
3     Lincoln Park       None
4        West Loop       None
5      River North       None
6   Milwaukee City  Milwaukee
7         Bay View       None
8        East Side       None
9       South Side       None
10     Bronzeville       None
11      North Side       None
12   New York City   New York
13          Harlem       None
14         Midtown       None
15       Chinatown       None

df['city'] = df['city'].fillna(method = 'ffill')

Result:

      neighborhood       city
0     Chicago City    Chicago
1      Wicker Park    Chicago
2         Bucktown    Chicago
3     Lincoln Park    Chicago
4        West Loop    Chicago
5      River North    Chicago
6   Milwaukee City  Milwaukee
7         Bay View  Milwaukee
8        East Side  Milwaukee
9       South Side  Milwaukee
10     Bronzeville  Milwaukee
11      North Side  Milwaukee
12   New York City   New York
13          Harlem   New York
14         Midtown   New York
15       Chinatown   New York

Upvotes: 3

Derive a new pandas column based on a certain value of a row and apply until the next value appears again

Answers (3)

Related Questions