Reputation: 161
I have 15 CSV files, each of which has a column holding the year. The problem is that this column is named 'year' in some files and 'year_' in the others. After combining the files, I end up with two columns carrying the same information: since each file has only one of the two names, a row with a value in 'year' has NaN in 'year_', and vice versa. I want to merge the two columns into one so I can get rid of the NaNs. What is the best way to do this?
Before

   year   year_
1  NaN    1999
2  2002   NaN
3  2000   NaN
...
N  NaN    2004

I want this to be

After

   year
1  1999
2  2002
3  2000
...
N  2004
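A minimal frame reproducing the situation (the values and three-row size are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row has a value in exactly one of the two columns.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})
print(df)
```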
Upvotes: 2
Views: 85
Reputation: 25997
Same idea as @Vaishali: you can just sum the year columns; use filter
to select the columns:
df.filter(like='year').sum(axis=1)
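A quick sketch of that one-liner on a frame shaped like the question's (sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the question's layout.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})

# filter(like='year') selects both 'year' and 'year_';
# sum skips NaN by default, so each row keeps its single non-null year.
df['year'] = df.filter(like='year').sum(axis=1)
print(df['year'].tolist())  # [1999.0, 2002.0, 2000.0]
```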
Upvotes: 0
Reputation: 4044
You can use the combine_first function:

df['YEAR'] = df['year'].combine_first(df['year_'])

Here df['year'] takes priority and df['year_'] is used to fill its null values.
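A runnable sketch on a small frame matching the question (sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one non-null year per row.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})

# Where df['year'] is NaN, take the value from df['year_'] instead.
df['YEAR'] = df['year'].combine_first(df['year_'])
print(df['YEAR'].tolist())  # [1999.0, 2002.0, 2000.0]
```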
Upvotes: 4
Reputation: 38415
Given that only one of the two columns has a valid value in each row, you can simply sum them along axis 1:

year_cols = df.columns[df.columns.str.contains('year')]
df['year'] = df[year_cols].sum(axis=1)
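The same idea end to end, on a frame shaped like the question's (sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: exactly one of the two year columns is set per row.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})

# Select every column whose name contains 'year', then sum across them;
# NaNs are skipped, so the row total is the single non-null year.
year_cols = df.columns[df.columns.str.contains('year')]
df['year'] = df[year_cols].sum(axis=1)
print(df['year'].tolist())  # [1999.0, 2002.0, 2000.0]
```

Note that this relies on the columns never both being filled; if a row had values in both, they would be added together rather than one being preferred.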
Upvotes: 2