Reputation: 93
I have a Pandas DataFrame like this:
A B C Date1 Date2
0 a b c 02/19/2022 02/20/2022
1 aa bb cc 02/26/2022 02/27/2022
I want to combine Date1 and Date2 into one column, and duplicate the other column values that correspond to it. Into a dataframe that looks like this:
A B C Date
0 a b c 02/19/2022
1 a b c 02/20/2022
2 aa bb cc 02/26/2022
3 aa bb cc 02/27/2022
I know how one would do this once this dataframe has been imported into SQL using join, but is there a way to do this directly in the Pandas dataframe before uploading to a database?
Upvotes: 2
Views: 104
Reputation: 13821
Yes, pandas
offers stack()
- You could set the index on your 'A', 'B' and 'C' columns:
df.set_index(['A','B','C']).stack().droplevel(level=3).rename('Date').reset_index()
prints:
A B C Date
0 a b c 02/19/2022
1 a b c 02/20/2022
2 aa bb cc 02/26/2022
3 aa bb cc 02/27/2022
Breakdown:
# Setup
>>> df.to_dict()
{'A': {0: 'a', 1: 'aa'},
'B': {0: 'b', 1: 'bb'},
'C': {0: 'c', 1: 'cc'},
'Date1': {0: '02/19/2022', 1: '02/26/2022'},
'Date2': {0: '02/20/2022', 1: '02/27/2022'}}
>>> print(df)
A B C Date1 Date2
0 a b c 02/19/2022 02/20/2022
1 aa bb cc 02/26/2022 02/27/2022
# 1) Set the index -> A,B,C
a = df.set_index(['A','B','C'])
Date
A B C
a b c 02/19/2022
c 02/20/2022
aa bb cc 02/26/2022
cc 02/27/2022
# 2) Stack the DF -> Since it has one column, it will return a Series
b = a.stack()
A B C
a b c Date 02/19/2022
Date 02/20/2022
aa bb cc Date 02/26/2022
Date 02/27/2022
# 3) Remove the 3rd layer of the index
# Where it says "Date", which is the result of 'Stacking'
# Try dropping another level here
c = b.droplevel(level=3)
A B C
a b c 02/19/2022
c 02/20/2022
aa bb cc 02/26/2022
cc 02/27/2022
# Renaming the three layered series
d = c.rename('Date')
A B C
a b c 02/19/2022
c 02/20/2022
aa bb cc 02/26/2022
cc 02/27/2022
# Converting the index to columns
e = d.reset_index()
A B C Date
0 a b c 02/19/2022
1 a b c 02/20/2022
2 aa bb cc 02/26/2022
3 aa bb cc 02/27/2022
Documentation of the functions used:
Upvotes: 2