Reputation: 161
I have 15 CSV files, each of which has a column holding the year. The problem is that this column is named 'year' in some files and 'year_' in the others. After combining the files, I end up with two columns carrying the same information: since each file has only one of the two names, a row with a value in 'year' has NaN in 'year_', and vice versa. I want to merge the two columns into one so I can get rid of the NaNs. What is the best way to do this?
Before

   year   year_
1  NaN    1999
2  2002   NaN
3  2000   NaN
...
N  NaN    2004

I want this to be

After

   year
1  1999
2  2002
3  2000
...
N  2004
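A minimal frame reproducing the situation (the values and three-row size are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row has a value in exactly one of the two columns.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})
print(df)
```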
Upvotes: 2
Views: 85
Reputation: 25997
Same idea as @Vaishali: you can just sum the year columns; use filter
to select the columns:
df.filter(like='year').sum(axis=1)
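A quick sketch of that one-liner on a frame shaped like the question's (sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the question's layout.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})

# filter(like='year') selects both 'year' and 'year_';
# sum skips NaN by default, so each row keeps its single non-null year.
df['year'] = df.filter(like='year').sum(axis=1)
print(df['year'].tolist())  # [1999.0, 2002.0, 2000.0]
```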
Upvotes: 0
Reputation: 4044
You can use the combine_first function:

df['YEAR'] = df['year'].combine_first(df['year_'])

Here df['year'] takes priority and df['year_'] is used to fill its null values.
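A runnable sketch on a small frame matching the question (sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one non-null year per row.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})

# Where df['year'] is NaN, take the value from df['year_'] instead.
df['YEAR'] = df['year'].combine_first(df['year_'])
print(df['YEAR'].tolist())  # [1999.0, 2002.0, 2000.0]
```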
Upvotes: 4
Reputation: 38415
Given that only one of the two columns has a valid value in each row, you can simply sum them along axis 1:

year_cols = df.columns[df.columns.str.contains('year')]
df['year'] = df[year_cols].sum(axis=1)
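The same idea end to end, on a frame shaped like the question's (sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: exactly one of the two year columns is set per row.
df = pd.DataFrame({'year': [np.nan, 2002, 2000],
                   'year_': [1999, np.nan, np.nan]})

# Select every column whose name contains 'year', then sum across them;
# NaNs are skipped, so the row total is the single non-null year.
year_cols = df.columns[df.columns.str.contains('year')]
df['year'] = df[year_cols].sum(axis=1)
print(df['year'].tolist())  # [1999.0, 2002.0, 2000.0]
```

Note that this relies on the columns never both being filled; if a row had values in both, they would be added together rather than one being preferred.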
Upvotes: 2