I'm looking for a way to collapse multiple columns into a single new column. Given the data below, I'd like to create a new column called "Day" and populate it with the day value if one is available. If no day is populated, I want the value None. Can you help me accomplish this?
import pandas as pd

df = pd.DataFrame({'Monday': {0: 'Monday', 1: 'None', 2: 'None'},
                   'Tuesday': {0: 'None', 1: 'None', 2: 'Tuesday'},
                   'Wednesday': {0: 'None', 1: 'None', 2: 'None'}})
DataFrame:
   Monday  Tuesday Wednesday
0  Monday     None      None
1    None     None      None
2    None  Tuesday      None
New column with desired output:
Day
0 Monday
1 None
2 Tuesday
I tried using melt, but it doesn't do exactly what I'm looking for: it creates an extra row for each column being collapsed.
My attempt:
df = pd.melt(df, var_name='Day')
Day value
0 Monday Monday
1 Monday None
2 Monday None
3 Monday None
4 Tuesday None
5 Tuesday None
6 Tuesday Tuesday
7 Tuesday None
8 Wednesday None
9 Wednesday None
10 Wednesday None
11 Wednesday None
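The melt attempt can be made to work with a few extra steps. This is a sketch, assuming pandas 1.1 or newer (for ignore_index=False, which keeps the original row labels after melting) and that empty cells hold the string 'None' as in the question: it drops the 'None' cells after melting, takes the first remaining value for each original row, and reindexes so rows with no day get 'None' back.

```python
import pandas as pd

df = pd.DataFrame({'Monday': ['Monday', 'None', 'None'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']})

# Melt while keeping the original index, so each value remembers its source row
melted = df.melt(var_name='Day', ignore_index=False)

# Discard the placeholder cells; only real day names remain
melted = melted[melted['value'] != 'None']

# One value per original row; rows that had no day are filled with 'None'
df['Day'] = (melted['value']
             .groupby(level=0)
             .first()
             .reindex(df.index, fill_value='None'))
print(df['Day'])
```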
A less elegant approach: iterate over the rows, using comparisons to find the day in each row. Append each result to a list, then add the list to the DataFrame as a new column.
# Initialize empty list
days = []
for _, row in df.iterrows():
    # assume there is no day
    day = None
    for col in ['Monday', 'Tuesday', 'Wednesday']:
        # if there is a value, set value to day
        if str(row[col]) != 'None':
            day = row[col]
    # append to list
    days.append(day)
# Add list to df
df['Day'] = days
# Drop unused cols
df.drop(columns=['Monday', 'Tuesday', 'Wednesday'], inplace=True)
print(df)
Day
0 Monday
1 None
2 Tuesday
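The same row-by-row idea can be written more compactly with DataFrame.apply(axis=1). This is a sketch rather than the answerer's code: it assumes the empty cells are the string 'None', and the hypothetical helper first_day returns the string 'None' (not Python None) for rows without a day. Unlike the loop above, which keeps the last match in a row, next returns the first one; the two agree whenever a row holds at most one day.

```python
import pandas as pd

df = pd.DataFrame({'Monday': ['Monday', 'None', 'None'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']})

day_cols = ['Monday', 'Tuesday', 'Wednesday']

def first_day(row):
    """Return the first cell in the row that is not the 'None' placeholder."""
    return next((value for value in row if value != 'None'), 'None')

# axis=1 passes each row to first_day as a Series
df['Day'] = df[day_cols].apply(first_day, axis=1)
df = df.drop(columns=day_cols)
print(df)
```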
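The max function can help here, but you need to temporarily replace the 'None' text with '0'. In string comparison every letter sorts after the digit '0', so the row-wise maximum is the day name whenever the row has one; afterwards, swap '0' back to 'None':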
df['newcolumn'] = df.replace('None', '0').max(axis=1).replace('0', 'None')
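A self-contained run of the one-liner above on the question's data. The helper name collapse_with_max is hypothetical, and the approach assumes each row holds at most one day name; the second, made-up frame shows what happens otherwise: with two days filled in, max returns the alphabetically later one.

```python
import pandas as pd

df = pd.DataFrame({'Monday': ['Monday', 'None', 'None'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']})

def collapse_with_max(frame):
    # '0' sorts before any letter, so the row-wise max is the day name if present
    return frame.replace('None', '0').max(axis=1).replace('0', 'None')

df['newcolumn'] = collapse_with_max(df)
print(df['newcolumn'])

# Hypothetical row with two days filled in: max picks the alphabetically later one
two_days = pd.DataFrame({'Monday': ['Monday'],
                         'Tuesday': ['Tuesday'],
                         'Wednesday': ['None']})
print(collapse_with_max(two_days))
```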
If you only need the first non-missing value per row, first replace the 'None' strings with missing values (NaN), then back-fill the missing values along each row and select the first column by position:
import numpy as np

day = df.replace('None', np.nan).bfill(axis=1).iloc[:, 0]
print (day)
0 Monday
1 NaN
2 Tuesday
3 NaN
Name: Monday, dtype: object
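The result above uses NaN for rows without a day, while the question asked for None. A minimal sketch, assuming the question's three-row frame and that the string 'None' is the wanted placeholder, stores the result as the requested Day column and fills the gaps back in with fillna:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Monday': ['Monday', 'None', 'None'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']})

first_day = (df.replace('None', np.nan)  # placeholder strings -> real missing values
               .bfill(axis=1)            # pull the next non-missing value leftwards
               .iloc[:, 0]               # first column now holds the first day per row
               .fillna('None'))          # rows without any day get 'None' back

df['Day'] = first_day
print(df['Day'])
```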
Details:
print (df.replace('None', np.nan))
Monday Tuesday Wednesday
0 Monday NaN NaN
1 NaN NaN NaN
2 NaN Tuesday NaN
3 NaN NaN NaN
print (df.replace('None', np.nan).bfill(axis=1))
Monday Tuesday Wednesday
0 Monday NaN NaN
1 NaN NaN NaN
2 Tuesday Tuesday NaN
3 NaN NaN NaN
print (df.replace('None', np.nan).bfill(axis=1).iloc[:, 0])
0 Monday
1 NaN
2 Tuesday
3 NaN
Name: Monday, dtype: object