Reputation: 151
I have a pandas dataframe similar to this (EDIT):
Fruits Year Farm A Fruits Year Farm B
Apples 2021 2000 Apples 2022 2200
Apples 2020 1500 Apples 2021 2100
Apples 2019 1200 Apples 2020 1900
Orange 2021 1150 Apples 2019 1800
Orange 2020 1200 Oranges 2022 1200
Orange 2019 1300 Oranges 2021 1100
Oranges 2020 1250
Oranges 2019 1250
I would like to transform this dataframe into something looking like this (EDIT):
Fruits Year Farm A Farm B
Apples 2022 - 2200
Apples 2021 2000 2100
Apples 2020 1500 1900
Apples 2019 1200 1800
Orange 2022 - 1200
Orange 2021 1150 1100
Orange 2020 1200 1250
Orange 2019 1300 1250
Any help please?
NEW EDIT: To answer @Cole question and since I made a hell of a confusion by creating the dataframes above to simplify the question, I paste below the original dataframe. From this original dataframe I would like to merge the columns "Financial KPI.1", "Year.1" with the columns "Financial KPI", "Year" filling with NaN the remaining empty 'cells'. My apologies for the confusion.
Financial KPI Year AMZN (All numbers in thousands) BAC (All numbers in thousands) C (All numbers in thousands) Financial KPI.1 Year.1 MSFT (All numbers in thousands)
0 Total Revenue TTM 502,191,000.00 92,478,000.00 74,307,000.00 Total Revenue TTM 203,075,000.00
1 Total Revenue 12/31/2021 469,822,000.00 89,113,000.00 71,887,000.00 Total Revenue 6/29/2022 198,270,000.00
2 Total Revenue 12/31/2020 386,064,000.00 85,528,000.00 75,494,000.00 Total Revenue 6/29/2021 168,088,000.00
3 Total Revenue 12/31/2019 280,522,000.00 91,244,000.00 75,067,000.00 Total Revenue 6/29/2020 143,015,000.00
4 Net Income from Continuing & Discontinued Operation TTM 11,323,000.00 27,409,000.00 15,505,000.00 Total Revenue 6/29/2019 125,843,000.00
5 Net Income from Continuing & Discontinued Operation 12/31/2021 33,364,000.00 31,978,000.00 21,952,000.00 Net Income from Continuing & Discontinued Operation TTM 69,789,000.00
6 Net Income from Continuing & Discontinued Operation 12/31/2020 21,331,000.00 17,894,000.00 11,047,000.00 Net Income from Continuing & Discontinued Operation 6/29/2022 72,738,000.00
7 Net Income from Continuing & Discontinued Operation 12/31/2019 11,588,000.00 27,430,000.00 19,401,000.00 Net Income from Continuing & Discontinued Operation 6/29/2021 61,271,000.00
8 Normalized Income TTM 12,157,600.00 27,409,000.00 15,734,000.00 Net Income from Continuing & Discontinued Operation 6/29/2020 44,281,000.00
9 Normalized Income 12/31/2021 20,551,997.00 31,978,000.00 21,945,000.00 Net Income from Continuing & Discontinued Operation 6/29/2019 39,240,000.00
10 Normalized Income 12/31/2020 21,331,000.00 17,894,000.00 11,067,000.00 Normalized Income TTM 69,806,290.00
11 Normalized Income 12/31/2019 11,588,000.00 27,430,000.00 19,405,000.00 Normalized Income 6/29/2022 72,447,420.00
12 Basic EPS TTM 1.1 3.16 7.45 Normalized Income 6/29/2021 60,150,420.00
13 Basic EPS 12/31/2021 3.3 3.6 10.8 Normalized Income 6/29/2020 44,257,620.00
14 Basic EPS 12/31/2020 2.13 1.88 5.45 Normalized Income 6/29/2019 38,602,420.00
15 Basic EPS 12/31/2019 1.17 2.77 8.63 Basic EPS TTM 9.29
16 Net_Profit_Margin TTM 2.42091 29.6384 21.17432 Basic EPS 6/29/2022 9.7
17 Net_Profit_Margin 12/31/2021 4.37442 35.88478 30.52708 Basic EPS 6/29/2021 8.12
18 Net_Profit_Margin 12/31/2020 5.52525 20.9218 14.65944 Basic EPS 6/29/2020 5.82
19 Net_Profit_Margin 12/31/2019 4.13087 30.06225 25.85024 Basic EPS 6/29/2019 5.11
20 Price To Earnings TTM 76.36364 10.48101 6.07114 Net_Profit_Margin TTM 34.37463
21 Total Assets TTM NaN NaN NaN Net_Profit_Margin 6/29/2022 36.53978
22 Total Assets 12/31/2021 420,549,000.00 3,169,495,000.00 2,291,413,000.00 Net_Profit_Margin 6/29/2021 35.78508
23 Total Assets 12/31/2020 321,195,000.00 2,819,627,000.00 2,260,090,000.00 Net_Profit_Margin 6/29/2020 30.94614
24 Total Assets 12/31/2019 225,248,000.00 2,434,079,000.00 1,951,158,000.00 Net_Profit_Margin 6/29/2019 30.67506
25 Total Liabilities Net Minority Interest TTM NaN NaN NaN Price To Earnings TTM 25.81485
26 Total Liabilities Net Minority Interest 12/31/2021 282,304,000.00 2,899,429,000.00 2,088,741,000.00 Total Assets TTM NaN
27 Total Liabilities Net Minority Interest 12/31/2020 227,791,000.00 2,546,703,000.00 2,059,890,000.00 Total Assets 6/29/2022 364,840,000.00
28 Total Liabilities Net Minority Interest 12/31/2019 163,188,000.00 2,169,269,000.00 1,757,212,000.00 Total Assets 6/29/2021 333,779,000.00
29 Total Equity Gross Minority Interest TTM NaN NaN NaN Total Assets 6/29/2020 301,311,000.00
30 Total Equity Gross Minority Interest 12/31/2021 138,245,000.00 270,066,000.00 202,672,000.00 Total Assets 6/29/2019 286,556,000.00
31 Total Equity Gross Minority Interest 12/31/2020 93,404,000.00 272,924,000.00 200,200,000.00 Total Liabilities Net Minority Interest TTM NaN
32 Total Equity Gross Minority Interest 12/31/2019 62,060,000.00 264,810,000.00 193,946,000.00 Total Liabilities Net Minority Interest 6/29/2022 198,298,000.00
33 Total Debt TTM NaN NaN NaN Total Liabilities Net Minority Interest 6/29/2021 191,791,000.00
34 Total Debt 12/31/2021 116,395,000.00 303,870,000.00 282,347,000.00 Total Liabilities Net Minority Interest 6/29/2020 183,007,000.00
35 Total Debt 12/31/2020 84,389,000.00 282,255,000.00 301,200,000.00 Total Liabilities Net Minority Interest 6/29/2019 184,226,000.00
36 Total Debt 12/31/2019 63,205,000.00 265,060,000.00 293,809,000.00 Total Equity Gross Minority Interest TTM NaN
37 Current_Ratio (assets/liabilities) 12/31/2021 1.4897 1.09314 1.09703 Total Equity Gross Minority Interest 6/29/2022 166,542,000.00
38 Current_Ratio (assets/liabilities) 12/31/2020 1.41004 1.10717 1.09719 Total Equity Gross Minority Interest 6/29/2021 141,988,000.00
39 Current_Ratio (assets/liabilities) 12/31/2019 1.3803 1.12207 1.11037 Total Equity Gross Minority Interest 6/29/2020 118,304,000.00
40 Debt_to_Assets_Ratio 12/31/2021 0.27677 0.09587 0.12322 Total Equity Gross Minority Interest 6/29/2019 102,330,000.00
41 Debt_to_Assets_Ratio 12/31/2020 0.26273 0.1001 0.13327 Total Debt TTM NaN
42 Debt_to_Assets_Ratio 12/31/2019 0.2806 0.1089 0.15058 Total Debt 6/29/2022 61,270,000.00
43 NaN NaN NaN NaN NaN Total Debt 6/29/2021 67,775,000.00
44 NaN NaN NaN NaN NaN Total Debt 6/29/2020 70,998,000.00
45 NaN NaN NaN NaN NaN Total Debt 6/29/2019 78,366,000.00
46 NaN NaN NaN NaN NaN Current_Ratio (assets/liabilities) 6/29/2022 1.83986
47 NaN NaN NaN NaN NaN Current_Ratio (assets/liabilities) 6/29/2021 1.74033
48 NaN NaN NaN NaN NaN Current_Ratio (assets/liabilities) 6/29/2020 1.64645
49 NaN NaN NaN NaN NaN Current_Ratio (assets/liabilities) 6/29/2019 1.55546
50 NaN NaN NaN NaN NaN Debt_to_Assets_Ratio 6/29/2022 0.16794
51 NaN NaN NaN NaN NaN Debt_to_Assets_Ratio 6/29/2021 0.20305
52 NaN NaN NaN NaN NaN Debt_to_Assets_Ratio 6/29/2020 0.23563
53 NaN NaN NaN NaN NaN Debt_to_Assets_Ratio 6/29/2019 0.27348
Upvotes: 1
Views: 112
Reputation: 11255
If the data matches the original structure, we could utilize pands.wide_to_long()
to unpivot our data. Then, to match the exact requirements of OP, we can then pivot()
to wide.
Note as @Laurent B. indicated, the data structure of OP is incorrect - we cannot have the same column name twice in a DataFrame
.
import pandas as pd
df = pd.DataFrame({"Fruits": ["Apple", "Apple", "Apple"],
"Year1": [2021, 2020, 2019],
"Farm1": [2000, 1500, 1200],
"Year2": [2022, 2021, 2019],
"Farm2": [2200, 2100, 1900]})
df["id"] = df.index
molten_df = pd.wide_to_long(df, ["Farm", "Year"], i = "id", j = "FarmId").reset_index()
formatted_df = molten_df.pivot(index = ["Fruits", "Year"], columns = "FarmId", values = "Farm").reset_index()
print(formatted_df)
## FarmId Fruits Year 1 2
## 0 Apple 2019 1200.0 1900.0
## 1 Apple 2020 1500.0 NaN
## 2 Apple 2021 2000.0 2100.0
## 3 Apple 2022 NaN 2200.0
Upvotes: 1
Reputation: 2263
You can't have the same name for two columns in one dataframe unless you uses a suffix. So let's say we have :
import pandas as pd
df = pd.DataFrame({"Fruits": ["Apple", "Apple", "Apple"],
"Year_A": [2021, 2020, 2019],
"Farm_A": [2000, 1500, 1200],
"Year_B": [2022, 2021, 2019],
"Farm_B": [2200, 2100, 1900]})
Fruits Year_A Farm_A Year_B Farm_B
0 Apple 2021 2000 2022 2200
1 Apple 2020 1500 2021 2100
2 Apple 2019 1200 2019 1900
Complete code :
import pandas as pd
df = pd.DataFrame({"Fruits": ["Apple", "Apple", "Apple"],
"Year_A": [2021, 2020, 2019],
"Farm_A": [2000, 1500, 1200],
"Year_B": [2022, 2021, 2019],
"Farm_B": [2200, 2100, 1900]})
# Extract two dataframes
df1 = df.loc[:,["Fruits", "Year_A", "Farm_A"]]
df2 = df.loc[:,["Fruits", "Year_B", "Farm_B"]]
# Rename year column and get rid of the suffix
df1.columns = df1.columns.str.replace('Year_A', 'Year')
df2.columns = df2.columns.str.replace('Year_B', 'Year')
# Apply merge on two columns
df3 = df1.merge(df2, on=["Fruits", "Year"], how="outer")
Results :
Fruits Year Farm_A Farm_B
0 Apple 2021 2000.0 2100.0
1 Apple 2020 1500.0 NaN
2 Apple 2019 1200.0 1900.0
3 Apple 2022 NaN 2200.0
Upvotes: 1
Reputation: 6583
If you had two dataframes:
Fruits Year FarmA
Apples 2021 2000
Apples 2020 1500
Apples 2019 1200
Fruits Year FarmB
Apples 2022 2200
Apples 2021 2100
Apples 2019 1900
say, left
and right
, then you could merge those as follows:
out = pd.merge(left, right, on=["Fruits", "Year"], how="outer")
Which looks like:
Fruits Year FarmA FarmB
0 Apples 2021 2000.0 2100.0
1 Apples 2020 1500.0 NaN
2 Apples 2019 1200.0 1900.0
3 Apples 2022 NaN 2200.0
You sort of have two dataframes already joined on Fruits
, which is awkward to work with. You could get the two separate dataframes like:
left = df[["Fruits", "Year1", "FarmA"]].rename(columns={"Year1": "Year"})
right = df[["Fruits", "Year1", "FarmB"]].rename(columns={"Year2": "Year"})
Note that I changed the names of the columns of your original df a bit. (Duplicate "Year", spaces in "Farm X")
Upvotes: 3