Reputation:
I have a dataframe which has the words Due Date written differently but it all means the same. The problem is in my master data(xls file), one due date has an extra space or doesnt and i cant change that.All i can change is my final output.
Sr no Due Date Due Date DueDate
1 1/2/22
2 1/5/22
3
4
5 ASAP
I just want that column 2 and 3 all combine under column one at the same location they were
Sr No. Due Date
1 1/2/22
2 1/5/22
3
4
5 ASAP
Upvotes: 1
Views: 1004
Reputation: 1690
Possible solution is the following:
import pandas as pd
# set test data
data = {"Sr no": [1,2,3,4,5],
"Due Date": ["1/2/22", "", "", "", ""],
"Due Date ": ["", "1/2/22", "", "", ""],
" Due Date": ["", "", "", "", "ASAP"]
}
# create pandas dataframe
df = pd.DataFrame(data)
# clean up column names
df.columns = [col.strip() for col in df.columns]
# group data
df = df.groupby(df.columns, axis=1).agg(lambda x: x.apply(lambda y: ''.join([str(l) for l in y if str(l) != "nan"]), axis=1))
# reorder column
df = df[['Sr no', 'Due Date']]
df
Returns
Upvotes: 0
Reputation: 260455
You can use filter
with a regex to get similar names, then bfill
and get the first. Finally join to original devoid of the found columns:
d = df.filter(regex=r'(?i)due\s*date')
df2 = (df
.drop(columns=list(d.columns))
.join(d.bfill(1).iloc[:,0])
)
Output:
Sr no Due Date
0 1 1/2/22
1 2 1/5/22
2 3 None
3 4 None
4 5 ASAP
Upvotes: 1