Aakash Dusane

Reputation: 398

How to change values in specific rows depending on given condition in pandas?

I want to do something like this:

for row in df:
    if row['Country'] == 'unknown':
        row['Country'] = city2country_mapping[row['city']]

Country and City are columns.

'city2country_mapping' is a dictionary in which each key:value pair is 'city':'country'.

(Basically, I'm trying to fill in the unknowns by looking up the country in the dictionary, since I know the city for each row.)

Upvotes: 2

Views: 2351

Answers (3)

ajrwhite

Reputation: 8458

Editing specific rows: DataFrame.loc vs. Series.where

The standard option for editing specific rows (a "slice") of a DataFrame object is .loc.

The accepted answer uses a neat application of pandas.Series.where to rewrite the df.Country Series, which is more succinct for this specific task.

Recoding values: .apply vs. .map

You can use .map() to recode a Series directly with a dictionary - no need to .apply() a lambda function.
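As a quick illustration of that point, here is a minimal sketch comparing the two spellings on a made-up Series of cities (the sample values are assumptions for illustration only). Both produce the same recoded values when every city is in the dictionary:

```python
import pandas as pd

# Made-up sample data for illustration
cities = pd.Series(['London', 'New York', 'Paris'])
city2country_mapping = {'London': 'UK', 'New York': 'USA', 'Paris': 'France'}

# Recode directly with the dictionary
via_map = cities.map(city2country_mapping)

# Equivalent result with an explicit lambda, which is more verbose
via_apply = cities.apply(lambda city: city2country_mapping[city])

same = via_map.tolist() == via_apply.tolist()
print(via_map.tolist(), same)
```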

Example

import pandas as pd

# Example data
df = pd.DataFrame({'Country': ['unknown', 'USA', 'unknown', 'UK', 'USA', 'unknown'],
                   'City': ['London', 'New York', 'New York', 'London', 'New York', 'Paris']
                  })
city2country_mapping = {'London': 'UK', 'New York': 'USA', 'Paris': 'France'}

# print(df)

#    Country      City
# 0  unknown    London
# 1      USA  New York
# 2  unknown  New York
# 3       UK    London
# 4      USA  New York
# 5  unknown     Paris

df.loc[df.Country == 'unknown', 'Country'] = df[df.Country == 'unknown'].City.map(city2country_mapping)
print(df)

Output:

  Country      City
0      UK    London
1     USA  New York
2     USA  New York
3      UK    London
4     USA  New York
5  France     Paris
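One caveat with .map: a city that is missing from the dictionary maps to NaN, so the .loc assignment would overwrite 'unknown' with NaN for that row. A minimal sketch, assuming you would rather keep 'unknown' in that case (the sample data, including the city 'Atlantis', is made up for illustration):

```python
import pandas as pd

# Made-up sample data; 'Atlantis' is deliberately absent from the mapping
df = pd.DataFrame({'Country': ['unknown', 'USA', 'unknown'],
                   'City': ['London', 'New York', 'Atlantis']})
city2country_mapping = {'London': 'UK', 'New York': 'USA'}

# Select the rows to edit once and reuse the mask
mask = df['Country'] == 'unknown'

# .map yields NaN for cities absent from the dictionary;
# fillna puts 'unknown' back for those rows
looked_up = df.loc[mask, 'City'].map(city2country_mapping).fillna('unknown')
df.loc[mask, 'Country'] = looked_up

result = df['Country'].tolist()
print(result)
```

Storing the boolean mask in a variable also avoids evaluating the same condition twice.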

Upvotes: 1

Teoretic

Reputation: 2533

You can do this using apply:

df['Country'] = df.apply(lambda row: city2country_mapping[row['city']] 
                                     if row['Country'] == 'unknown' else row['Country'], axis=1)

The lambda returns the country looked up from the mapping when the row's country is 'unknown'; otherwise it returns the country already in that row.
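For reference, a self-contained sketch of the same row-wise approach, with the lambda written out as a named function. The sample data and the function name fill_country are made up for illustration; the column names follow the question's snippet.

```python
import pandas as pd

# Made-up sample data; column names follow the question's snippet
df = pd.DataFrame({'Country': ['unknown', 'USA', 'unknown'],
                   'city': ['London', 'New York', 'Paris']})
city2country_mapping = {'London': 'UK', 'New York': 'USA', 'Paris': 'France'}

def fill_country(row):
    # Look up the country only for rows marked 'unknown'
    if row['Country'] == 'unknown':
        return city2country_mapping[row['city']]
    return row['Country']

# axis=1 passes each row to the function as a Series
df['Country'] = df.apply(fill_country, axis=1)

result = df['Country'].tolist()
print(result)
```

Note that apply with axis=1 calls the function once per row in Python, so on large DataFrames it tends to be slower than vectorized options.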

Upvotes: 2

akuiper

Reputation: 215047

You can vectorize this with pandas.Series.where:

df['Country'] = df['Country'].where(
    df['Country'] != 'unknown', df['city'].map(city2country_mapping))

df['city'].map(city2country_mapping) first builds a Series holding the country that corresponds to each row's city. .where then keeps the existing value wherever Country is not 'unknown' and takes the mapped value everywhere else.
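To make the two steps visible, here is a minimal runnable sketch with the intermediate Series named explicitly. The sample data is made up, and the column names 'Country' and 'city' are assumed from the question's snippet:

```python
import pandas as pd

# Made-up sample data; column names assumed from the question's snippet
df = pd.DataFrame({'Country': ['unknown', 'USA', 'unknown'],
                   'city': ['London', 'New York', 'Paris']})
city2country_mapping = {'London': 'UK', 'New York': 'USA', 'Paris': 'France'}

# Step 1: country for every row's city, regardless of the current value
mapped = df['city'].map(city2country_mapping)

# Step 2: keep the existing value where it is known, else take the mapped one
known = df['Country'] != 'unknown'
df['Country'] = df['Country'].where(known, mapped)

result = df['Country'].tolist()
print(result)
```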

Upvotes: 2
