pocketfullofcheese

Reputation: 8837

How to combine two columns with an if/else in Python pandas?

I am very new to Pandas (i.e., less than 2 days). However, I can't seem to figure out the right syntax for combining two columns with an if/else condition.

Actually, I did figure out one way to do it using 'zip'. This is what I want to accomplish, but it seems there might be a more efficient way to do this in pandas.

For completeness' sake, here is the pre-processing I do first:

import pandas as pd

records_data = pd.read_csv('records.csv')

## pull out a year from column using a regex
source_years = records_data['source'].map(extract_year_from_source) 

## this is what I want to do more efficiently (if it's possible)
records_data['year'] = [s if s else y for (s,y) in zip(source_years, records_data['year'])]

Upvotes: 13

Views: 24172

Answers (2)

Jeff

Reputation: 128958

In pandas >= 0.10.0, try:

df['year'] = source_years.where(source_years != 0, df['year'])

and see:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking

As noted in the comments, this does use np.where under the hood. The difference is that pandas aligns the series with the output by index, so you can, for example, update only part of a column.
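To illustrate the alignment point, here is a minimal sketch with made-up data. The values, the index labels, and the convention that 0 means "no year found" are assumptions for the example, not taken from the question. Series.where matches the fallback by index label, whereas np.where pairs values purely by position:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0 stands for "no year found in the source column".
source_years = pd.Series([0, 2002, 2003], index=[0, 1, 2])
# Fallback years carry the same labels but are stored in reverse order.
fallback = pd.Series([1992, 1991, 1990], index=[2, 1, 0])

# Series.where aligns `fallback` on the index: label 0 falls back to 1990.
by_label = source_years.where(source_years != 0, fallback)

# np.where ignores labels: position 0 takes fallback's first value, 1992.
by_position = np.where(source_years != 0, source_years, fallback)

print(by_label.tolist())     # [1990, 2002, 2003]
print(by_position.tolist())  # [1992, 2002, 2003]
```

When both series share the same index in the same order, as in the question, the two approaches give the same values.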

Upvotes: 18

unutbu

Reputation: 879591

Perhaps try np.where:

import numpy as np
df['year'] = np.where(source_years,source_years,df['year'])
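A self-contained sketch of the same idea, assuming missing source years are encoded as 0 (the sample values are made up). The explicit `!= 0` condition is my assumption, chosen so the example does not rely on truthiness. If the extractor returns None or NaN instead, a condition such as source_years.notna() would likely be the better choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from records.csv.
df = pd.DataFrame({"year": [1990, 1991, 1992]})
# Hypothetical extracted years; 0 marks rows where no year was found.
source_years = pd.Series([2001, 0, 2003])

# Take the extracted year where one exists, otherwise keep the existing year.
df["year"] = np.where(source_years != 0, source_years, df["year"])

combined = df["year"].tolist()
print(combined)  # [2001, 1991, 2003]
```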

Upvotes: 10
