MBasith

Reputation: 1499

Pandas collapse columns into a single column

I'm looking for a way to collapse multiple columns into a single new column. Given the data below, I'd like to create a new column called "Day" and populate it with the day value if one is available. If no day is populated, I want to return None. Can you help me accomplish this?

df = pd.DataFrame({'Monday': {0: 'Monday', 1: 'None', 2: 'None'},
                   'Tuesday': {0: 'None', 1: 'None', 2: 'Tuesday'},
                   'Wednesday': {0: 'None', 1: 'None', 2: 'None'}})

DataFrame

   Monday  Tuesday Wednesday
0  Monday     None      None
1    None     None      None
2    None  Tuesday      None

New column with desired output:

        Day
0    Monday
1      None
2   Tuesday

I tried to use melt, but it does not do exactly what I am looking for: it creates an extra row for each column being collapsed.

My attempt:

df = pd.melt(df, var_name='Day')

         Day    value
0     Monday   Monday
1     Monday     None
2     Monday     None
3    Tuesday     None
4    Tuesday     None
5    Tuesday  Tuesday
6  Wednesday     None
7  Wednesday     None
8  Wednesday     None

Upvotes: 1

Views: 2268

Answers (3)

Yaakov Bressler

Reputation: 12158

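A less elegant approach: iterate over the rows and compare each column's value against 'None' to find the day for that row. Append each result to a list, then add the list to the DataFrame. Note that if a row has more than one day filled in, this loop keeps the last one.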

# Initialize empty list
Days = []
for idx, row in df.iterrows():
    # assume there is no day
    day = None
    for col in ['Monday', 'Tuesday', 'Wednesday']:
        # if there is a value, set value to day
        if str(row[col]) != 'None':
            day = row[col]
    # append to list
    Days.append(day)

# Add list to df
df['Day'] = Days

# Drop unused cols
df.drop(columns=['Monday', 'Tuesday', 'Wednesday'], inplace=True)
print(df)
       Day
0   Monday
1     None
2  Tuesday

Upvotes: -1

Niels Hameleers

Reputation: 1251

The max function can help you here, but you need to temporarily replace the 'None' text with '0', which sorts below every day name, and then put 'None' back afterwards, like so.

df['newcolumn'] = df.replace('None', '0').max(axis=1).replace('0', 'None')
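
One thing worth checking with this approach is what happens when a row has more than one day filled in. The sketch below uses a made-up frame (not the question's data) in which row 2 has both Monday and Tuesday set. Since max compares the strings alphabetically, that row should come back as the alphabetically largest name rather than the first populated column.

```python
import pandas as pd

# Hypothetical frame for illustration only: row 2 has two days populated.
df = pd.DataFrame({'Monday': ['Monday', 'None', 'Monday'],
                   'Tuesday': ['None', 'None', 'Tuesday'],
                   'Wednesday': ['None', 'None', 'None']})

# '0' sorts before any capital letter, so it never beats a day name;
# a row that is all '0' is turned back into 'None' at the end.
day = df.replace('None', '0').max(axis=1).replace('0', 'None')
print(day.tolist())
```

If rows can hold at most one day, as in the question, this distinction does not matter.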

Upvotes: 2

jezrael

Reputation: 863611

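If you only need the first non-missing value per row, first replace the 'None' strings with missing values (NaN), then back fill the missing values along the rows (axis=1) so that the first column ends up holding the first non-missing value of each row, and select that first column by position: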

import numpy as np

day = df.replace('None', np.nan).bfill(axis=1).iloc[:, 0]
print(day)
0     Monday
1        NaN
2    Tuesday
Name: Monday, dtype: object

Details:

print(df.replace('None', np.nan))
   Monday  Tuesday  Wednesday
0  Monday      NaN        NaN
1     NaN      NaN        NaN
2     NaN  Tuesday        NaN

print(df.replace('None', np.nan).bfill(axis=1))
    Monday  Tuesday Wednesday
0   Monday      NaN       NaN
1      NaN      NaN       NaN
2  Tuesday  Tuesday       NaN

print(df.replace('None', np.nan).bfill(axis=1).iloc[:, 0])
0     Monday
1        NaN
2    Tuesday
Name: Monday, dtype: object
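
The question asks for a new "Day" column with None rather than NaN for empty rows, so here is a minimal sketch of that last step. It assumes the original three columns should stay in place and that the placeholder should be the string 'None', as in the question's data. Both are assumptions about the desired result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Monday': {0: 'Monday', 1: 'None', 2: 'None'},
                   'Tuesday': {0: 'None', 1: 'None', 2: 'Tuesday'},
                   'Wednesday': {0: 'None', 1: 'None', 2: 'None'}})

# First non-missing value per row, using the replace + bfill approach.
first = df.replace('None', np.nan).bfill(axis=1).iloc[:, 0]

# Put the 'None' placeholder string back for rows with no day at all
# (assumption: the string 'None' is wanted, not a real missing value).
df['Day'] = first.fillna('None')
print(df['Day'].tolist())
```

If only the collapsed column is needed, the original day columns can be dropped afterwards with df.drop(columns=[...]).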

Upvotes: 4
