Code Monkey

Reputation: 69

How do I merge multiple columns with similar names in a Pandas Dataframe without losing data

I am working with some messy data and I'm trying to figure out how to merge multiple columns that hold similar information into one column. For example, I have a dataframe that looks like this, and I want to know how to condense the three temperature columns (Temp, Temperature, Degrees) into one:

Country        State     Temp  Temperature  Degrees
United States  Kentucky  $76   76           N/A
United States  Arizona   92\n  N/A          N/A
United States  Michigan  45    45@          60

Upvotes: 1

Views: 134

Answers (1)

oppressionslayer

Reputation: 7222

You can try this, then drop the unwanted columns:

df['combined'] = df.apply(lambda x: [x['Temp'],
                                     x['Temperature'],
                                     x['Degrees']], axis=1)
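
For reference, here is a minimal end-to-end sketch of that idea, including the drop step. The sample frame is only modeled on the table in the question (its values and the use of NaN for the N/A cells are assumptions), and `temp_cols` and `result` are names made up for this example:

```python
import numpy as np
import pandas as pd

# Sample frame modeled on the question's table; N/A cells are assumed to be NaN
df = pd.DataFrame({
    "Country": ["United States"] * 3,
    "State": ["Kentucky", "Arizona", "Michigan"],
    "Temp": ["$76", "92\n", "45"],
    "Temperature": ["76", np.nan, "45@"],
    "Degrees": [np.nan, np.nan, "60"],
})

temp_cols = ["Temp", "Temperature", "Degrees"]

# Collect the three values of each row into one list; NaN entries are kept as-is
df["combined"] = df[temp_cols].apply(lambda row: row.tolist(), axis=1)

# Drop the source columns now that their values live in 'combined'
result = df.drop(columns=temp_cols)
print(result)
```

Nothing is thrown away here: every original value, including the missing ones, ends up in the list for its row.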

You can also do something like this if you want them separated with a slash (this assumes all three columns hold strings):

df.apply(lambda x: x.Temp + ' / ' + x.Temperature + ' / ' + x.Degrees, axis=1)

# or simply

df['combined'] = df.Temp + ' / ' + df.Temperature + ' / ' + df.Degrees
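
One caveat with plain `+` concatenation: if any of the three cells in a row is NaN, the whole result for that row comes out as NaN. A possible workaround, sketched on made-up sample data modeled on the question (the `'N/A'` placeholder and the names `naive`, `joined` and `temp_cols` are just choices for this example), is to fill the gaps and cast to `str` before joining:

```python
import numpy as np
import pandas as pd

# Sample frame modeled on the question's table; N/A cells are assumed to be NaN
df = pd.DataFrame({
    "Temp": ["$76", "92\n", "45"],
    "Temperature": ["76", np.nan, "45@"],
    "Degrees": [np.nan, np.nan, "60"],
})

temp_cols = ["Temp", "Temperature", "Degrees"]

# Plain concatenation: a single NaN in a row makes the whole result NaN
naive = df.Temp + " / " + df.Temperature + " / " + df.Degrees

# Fill missing cells with a placeholder, cast to str, then join each row
joined = df[temp_cols].fillna("N/A").astype(str).agg(" / ".join, axis=1)

print(naive.isna().tolist())
print(joined.tolist())
```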

I tested this on some data I have that contains NaN values and it worked, so it may be worth a try:

import pandas as pd

def combine_with_nan(x):
    # Use the string 'NaN' for missing values and cast everything else to str,
    # so numeric cells are kept and the join below never hits a float
    parts = []
    for col in ['Temp', 'Temperature', 'Degrees']:
        parts.append('NaN' if pd.isna(x[col]) else str(x[col]))
    return ' / '.join(parts)

df.apply(combine_with_nan, axis=1)
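
If the goal is a single clean numeric column rather than a joined string, a different technique may fit better: coalescing, i.e. taking the first non-missing value per row. The sketch below is an assumption about what the asker wants, not something taken from the question; the sample data, the regex, and the names `cleaned` and `temp_clean` are all made up for illustration. Note that it silently picks the leftmost value when columns disagree (e.g. 45 vs. 60 for Michigan).

```python
import numpy as np
import pandas as pd

# Sample frame modeled on the question's table; N/A cells are assumed to be NaN
df = pd.DataFrame({
    "State": ["Kentucky", "Arizona", "Michigan"],
    "Temp": ["$76", "92\n", "45"],
    "Temperature": ["76", np.nan, "45@"],
    "Degrees": [np.nan, np.nan, "60"],
})

temp_cols = ["Temp", "Temperature", "Degrees"]

# Strip everything except digits, dots and minus signs, then convert to numbers;
# missing or unparseable cells end up as NaN
cleaned = df[temp_cols].apply(
    lambda col: pd.to_numeric(
        col.astype(str).str.replace(r"[^0-9.\-]", "", regex=True),
        errors="coerce",
    )
)

# Back-fill across the columns so the first one holds the first
# non-missing value of each row, then keep only that column
df["temp_clean"] = cleaned.bfill(axis=1).iloc[:, 0]
print(df[["State", "temp_clean"]])
```

Keeping the original columns around until the cleaned one has been checked makes it easier to spot rows where the sources disagree.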

Upvotes: 1
