Reputation: 331
I want to merge two columns into one within the same DataFrame (start_end) and remove the null values. I intend to merge 'Start station' and 'End station' into 'station', and keep 'duration' aligned with the new 'station' column. I have tried pd.merge, pd.concat, and pd.append, but I cannot work it out.
dataFrame of Start_end:
Duration End station Start station
14 1407 NaN 14th & V St NW
19 509 NaN 21st & I St NW
20 638 15th & P St NW. NaN
27 1532 NaN Massachusetts Ave & Dupont Circle NW
28 759 NaN Adams Mill & Columbia Rd NW
Expected output:
Duration stations
14 1407 14th & V St NW
19 509 21st & I St NW
20 638 15th & P St NW
27 1532 Massachusetts Ave & Dupont Circle NW
28 759 Adams Mill & Columbia Rd NW
Code I have so far:
# start_end is the DataFrame with the columns 'Start station', 'End station', 'Duration'
start_end = pd.concat([df_start, df_end])
This is what I attempted:
station = pd.merge([start_end['Start station'],start_end['End station']])
Upvotes: 15
Views: 25758
Reputation: 1401
Use combine_first, which fills the null values in one column with the values from another:
df["station"] = df["End station"].combine_first(df["Start station"])
df.drop(columns=["End station", "Start station"], inplace=True)
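For reference, here is a minimal self-contained sketch of the combine_first approach. The sample frame is rebuilt by hand from the data in the question, and the name result is only illustrative. It drops the source columns by name with the columns= keyword, since recent pandas versions no longer accept the axis as a positional argument to drop.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (index labels included).
df = pd.DataFrame(
    {
        "Duration": [1407, 509, 638, 1532, 759],
        "End station": [np.nan, np.nan, "15th & P St NW.", np.nan, np.nan],
        "Start station": [
            "14th & V St NW",
            "21st & I St NW",
            np.nan,
            "Massachusetts Ave & Dupont Circle NW",
            "Adams Mill & Columbia Rd NW",
        ],
    },
    index=[14, 19, 20, 27, 28],
)

# Take "End station" where it is present, otherwise fall back to "Start station".
df["station"] = df["End station"].combine_first(df["Start station"])

# Drop the two source columns by name.
result = df.drop(columns=["End station", "Start station"])
print(result)
```

Because each row has exactly one non-null station, the order of the two columns in combine_first does not change the result here.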
Upvotes: 8
Reputation: 23743
>>> df
Duration End station Start station
0 1407 NaN 14th & V St NW
1 509 NaN 21st & I St NW
2 638 15th & P St NW. NaN
3 1532 NaN Massachusetts Ave & Dupont Circle NW
4 759 NaN Adams Mill & Columbia Rd NW
Give the two columns the same name
>>> df.columns = df.columns.str.replace('.*?station', 'station', regex=True)
>>> df
Duration station station
0 1407 NaN 14th & V St NW
1 509 NaN 21st & I St NW
2 638 15th & P St NW. NaN
3 1532 NaN Massachusetts Ave & Dupont Circle NW
4 759 NaN Adams Mill & Columbia Rd NW
Stack then unstack.
>>> s = df.stack()
>>> s
0 Duration 1407
station 14th & V St NW
1 Duration 509
station 21st & I St NW
2 Duration 638
station 15th & P St NW.
3 Duration 1532
station Massachusetts Ave & Dupont Circle NW
4 Duration 759
station Adams Mill & Columbia Rd NW
dtype: object
>>> df = s.unstack()
>>> df
Duration station
0 1407 14th & V St NW
1 509 21st & I St NW
2 638 15th & P St NW.
3 1532 Massachusetts Ave & Dupont Circle NW
4 759 Adams Mill & Columbia Rd NW
This is how I think this works: .stack creates a Series with a MultiIndex and takes care of the null values for you. It aligns the second level on the column names, and because the two station columns now share one name there is only one label for them, so unstacking produces a single column.
That's really just a guess, based on how the index differs depending on whether or not you change the column names.
>>> # without changing column names
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'End station', 'Start station']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 2, 0, 2, 0, 1, 0, 2, 0, 2]])
>>> # column names the same
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'station']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])
Seems a bit tricky, maybe someone will comment on it.
Alternative: using pd.concat and .dropna
>>> stations = pd.concat([df.iloc[:,1],df.iloc[:,2]]).dropna()
>>> stations.name = 'stations'
>>> stations
2 15th & P St NW.
0 14th & V St NW
1 21st & I St NW
3 Massachusetts Ave & Dupont Circle NW
4 Adams Mill & Columbia Rd NW
Name: stations, dtype: object
>>> df2 = pd.concat([df['Duration'], stations], axis=1)
>>> df2
Duration stations
0 1407 14th & V St NW
1 509 21st & I St NW
2 638 15th & P St NW.
3 1532 Massachusetts Ave & Dupont Circle NW
4 759 Adams Mill & Columbia Rd NW
Upvotes: 5
Reputation: 294218
fillna
If the NaN values are truly nulls:
df.assign(**{
'Start station': df['Start station'].fillna(df['End station'])})
Duration End station Start station
14 1407 NaN 14th & V St NW
19 509 NaN 21st & I St NW
20 638 15th & P St NW. 15th & P St NW.
27 1532 NaN Massachusetts Ave & Dupont Circle NW
28 759 NaN Adams Mill & Columbia Rd NW
mask
If the NaN values are strings:
df.assign(**{
'Start station': df['Start station'].mask(
lambda x: x == 'NaN', df['End station'])})
Duration End station Start station
14 1407 NaN 14th & V St NW
19 509 NaN 21st & I St NW
20 638 15th & P St NW. 15th & P St NW.
27 1532 NaN Massachusetts Ave & Dupont Circle NW
28 759 NaN Adams Mill & Columbia Rd NW
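The output above still carries both original columns. As a rough sketch of how the same mask idea could produce the two-column frame the question asks for, the example below assumes the missing entries are the literal string 'NaN'. The two-row sample frame and the names start, station, and result are made up for illustration.

```python
import pandas as pd

# Small hand-made sample in which missing values are the literal string "NaN".
df = pd.DataFrame(
    {
        "Duration": [1407, 638],
        "End station": ["NaN", "15th & P St NW."],
        "Start station": ["14th & V St NW", "NaN"],
    }
)

start = df["Start station"]

# Where "Start station" holds the string "NaN", use "End station" instead.
station = start.mask(start == "NaN", df["End station"])

# Keep only "Duration" and attach the merged column.
result = df[["Duration"]].assign(station=station)
print(result)
```

If the missing entries are real nulls rather than strings, swapping the mask call for start.fillna(df["End station"]) should give the same shape of result.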
Upvotes: 16