Reputation: 69
I am working with some messy data and I'm trying to figure out how to merge multiple columns that hold similar information into one column. For example, I have a dataframe that looks like this, and I want to know how to condense the three temperature columns (Temp, Temperature, Degrees) into one:
Country        State     Temp  Temperature  Degrees
United States  Kentucky  $76   76           N/A
United States  Arizona   92\n  N/A          N/A
United States  Michigan  45    45@          60
Upvotes: 1
Views: 134
Reputation: 7222
You can try this, then drop the unwanted columns:
df['combined'] = df.apply(lambda x: [x['Temp'],
                                     x['Temperature'],
                                     x['Degrees']], axis=1)
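To make the "then drop the unwanted columns" step concrete, here is a minimal sketch. The sample frame is my own reconstruction of the table in the question (I'm assuming N/A means a missing value), and `temp_cols` is just a helper name I made up:

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question
df = pd.DataFrame({
    "Country": ["United States"] * 3,
    "State": ["Kentucky", "Arizona", "Michigan"],
    "Temp": ["$76", "92\n", "45"],
    "Temperature": ["76", None, "45@"],
    "Degrees": [None, None, "60"],
})

temp_cols = ["Temp", "Temperature", "Degrees"]

# Collect the three values of each row into one list
df["combined"] = df.apply(
    lambda x: [x["Temp"], x["Temperature"], x["Degrees"]], axis=1
)

# Drop the original columns once they are no longer needed
df = df.drop(columns=temp_cols)
```

After this, `df` should only have Country, State and combined left.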
You can also do something like this if you want them separated with a slash (this assumes the columns hold strings):
df.apply(lambda x: x.Temp + ' / ' + x.Temperature + ' / ' + x.Degrees, axis=1)
# or simply
df['combined'] = df.Temp + ' / ' + df.Temperature + ' / ' + df.Degrees
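One caveat with plain `+`: as far as I can tell, a missing value in any of the three columns makes the whole concatenated result missing for that row. A rough sketch of that behaviour and of one possible workaround (fill a placeholder, cast to `str`, then join) is below. The sample data and the names `plain`, `filled` and `cols` are mine, not from the question:

```python
import pandas as pd

# Hypothetical sample data; None stands in for the N/A cells
df = pd.DataFrame({
    "Temp": ["$76", "92\n", "45"],
    "Temperature": ["76", None, "45@"],
    "Degrees": [None, None, "60"],
})

cols = ["Temp", "Temperature", "Degrees"]

# Plain concatenation: a missing piece makes the whole row's result missing
plain = df.Temp + " / " + df.Temperature + " / " + df.Degrees

# Fill missing values with a placeholder and cast to str first,
# then join each row's values with the separator
filled = df[cols].fillna("NaN").astype(str).apply(" / ".join, axis=1)
```

Only the Michigan row survives in `plain`, while `filled` keeps every row with 'NaN' in place of the gaps.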
I tested the function below on some data of mine that contains NaN values, and it handled the NaNs, so it may be worth a try:
import pandas as pd

def combine_with_nan(x):
    # Use the string 'NaN' for missing values and cast everything else to str,
    # so that numeric and text values can be joined alike
    values = [x.Temp, x.Temperature, x.Degrees]
    parts = ['NaN' if pd.isna(v) else str(v) for v in values]
    return ' / '.join(parts)

df.apply(combine_with_nan, axis=1)
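If by "condense" you mean ending up with a single clean number per row rather than a joined string, something along these lines might work. This is only a sketch under my own assumptions: the sample data is reconstructed from the question, `clean` and `temperature_clean` are names I invented, and it simply takes the first non-missing value from left to right, so it does not resolve conflicts such as Michigan's 45 vs 60:

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question
df = pd.DataFrame({
    "Country": ["United States"] * 3,
    "State": ["Kentucky", "Arizona", "Michigan"],
    "Temp": ["$76", "92\n", "45"],
    "Temperature": ["76", None, "45@"],
    "Degrees": [None, None, "60"],
})

temp_cols = ["Temp", "Temperature", "Degrees"]

def clean(col):
    # Keep only digits, '.', and '-', then convert to numbers;
    # anything that cannot be parsed becomes NaN
    stripped = col.str.replace(r"[^0-9.\-]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce").astype(float)

cleaned = df[temp_cols].apply(clean)

# Take the first non-missing value in each row, scanning left to right
df["temperature_clean"] = cleaned.bfill(axis=1).iloc[:, 0]
```

You would still want to check rows where the source columns disagree before dropping them.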
Upvotes: 1