Reputation: 4842
I have a df
:
df = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'A',
8: 'A',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B',
14: 'B',
15: 'B',
16: 'B',
17: 'B',
18: 'all',
19: 'all'},
('category', ''): {0: 'Amazon',
1: 'Apple',
2: 'Facebook',
3: 'Google',
4: 'Netflix',
5: 'Tesla',
6: 'Total',
7: 'Uber',
8: 'total',
9: 'Amazon',
10: 'Apple',
11: 'Facebook',
12: 'Google',
13: 'Netflix',
14: 'Tesla',
15: 'Total',
16: 'Uber',
17: 'total',
18: 'Total',
19: 'total'},
(pd.Timestamp('2020-06-29'), 'last_sales'): {0: 195.0,
1: 61.0,
2: 106.0,
3: 61.0,
4: 37.0,
5: 13.0,
6: 954.0,
7: 4.0,
8: 477.0,
9: 50.0,
10: 50.0,
11: 75.0,
12: 43.0,
13: 17.0,
14: 14.0,
15: 504.0,
16: 3.0,
17: 252.0,
18: 2916.0,
19: 2916.0},
(pd.Timestamp('2020-06-29'), 'sales'): {0: 1268.85,
1: 18274.385000000002,
2: 19722.65,
3: 55547.255,
4: 15323.800000000001,
5: 1688.6749999999997,
6: 227463.23,
7: 1906.0,
8: 113731.615,
9: 3219.6499999999996,
10: 15852.060000000001,
11: 17743.7,
12: 37795.15,
13: 5918.5,
14: 1708.75,
15: 166349.64,
16: 937.01,
17: 83174.82,
18: 787625.7400000001,
19: 787625.7400000001},
(pd.Timestamp('2020-06-29'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2020-07-06'), 'last_sales'): {0: 26.0,
1: 39.0,
2: 79.0,
3: 49.0,
4: 10.0,
5: 10.0,
6: 436.0,
7: 5.0,
8: 218.0,
9: 89.0,
10: 34.0,
11: 133.0,
12: 66.0,
13: 21.0,
14: 20.0,
15: 732.0,
16: 3.0,
17: 366.0,
18: 2336.0,
19: 2336.0},
(pd.Timestamp('2020-07-06'), 'sales'): {0: 3978.15,
1: 12138.96,
2: 19084.175,
3: 40033.46000000001,
4: 4280.15,
5: 1495.1,
6: 165548.29,
7: 1764.15,
8: 82774.145,
9: 8314.92,
10: 12776.649999999996,
11: 28048.075,
12: 55104.21000000002,
13: 6962.844999999999,
14: 3053.2000000000003,
15: 231049.11000000002,
16: 1264.655,
17: 115524.55500000001,
18: 793194.8000000002,
19: 793194.8000000002},
(pd.Timestamp('2020-07-06'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-06-28'), 'last_sales'): {0: 96.0,
1: 56.0,
2: 106.0,
3: 44.0,
4: 34.0,
5: 13.0,
6: 716.0,
7: 9.0,
8: 358.0,
9: 101.0,
10: 22.0,
11: 120.0,
12: 40.0,
13: 13.0,
14: 8.0,
15: 610.0,
16: 1.0,
17: 305.0,
18: 2652.0,
19: 2652.0},
(pd.Timestamp('2021-06-28'), 'sales'): {0: 5194.95,
1: 19102.219999999994,
2: 22796.420000000002,
3: 30853.115,
4: 11461.25,
5: 992.6,
6: 188143.41,
7: 3671.15,
8: 94071.705,
9: 6022.299999999998,
10: 7373.6,
11: 33514.0,
12: 35943.45,
13: 4749.000000000001,
14: 902.01,
15: 177707.32,
16: 349.3,
17: 88853.66,
18: 731701.46,
19: 731701.46},
(pd.Timestamp('2021-06-28'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-07-07'), 'last_sales'): {0: 45.0,
1: 47.0,
2: 87.0,
3: 45.0,
4: 13.0,
5: 8.0,
6: 494.0,
7: 2.0,
8: 247.0,
9: 81.0,
10: 36.0,
11: 143.0,
12: 56.0,
13: 9.0,
14: 9.0,
15: 670.0,
16: 1.0,
17: 335.0,
18: 2328.0,
19: 2328.0},
(pd.Timestamp('2021-07-07'), 'sales'): {0: 7556.414999999998,
1: 14985.05,
2: 16790.899999999998,
3: 36202.729999999996,
4: 4024.97,
5: 1034.45,
6: 163960.32999999996,
7: 1385.65,
8: 81980.16499999998,
9: 5600.544999999999,
10: 11209.92,
11: 32832.61,
12: 42137.44500000001,
13: 3885.1499999999996,
14: 1191.5,
15: 194912.34000000003,
16: 599.0,
17: 97456.17000000001,
18: 717745.3400000001,
19: 717745.3400000001},
(pd.Timestamp('2021-07-07'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0}}).set_index(['group','category'])
I am trying to sort of flatten it so it would no longer be a multiindex df
. As there are several dates I try to select one:
df.loc[:,'2020-06-29 00:00:00']
But this gives me an error :
KeyError: '2020-06-29 00:00:00'
I am trying to make it that the first week ( and my final output ) of 2020-06-29
would look like this :
group category last_sales sales difference
A Amazon 195.00 1,268.85 0.00
A Apple 61.00 18,274.39 0.00
A Facebook 106.00 19,722.65 0.00
A Google 61.00 55,547.25 0.00
A Netflix 37.00 15,323.80 0.00
A Tesla 13.00 1,688.67 0.00
A Total 954.00 227,463.23 0.00
A Uber 4.00 1,906.00 0.00
A total 477.00 113,731.62 0.00
B Amazon 0.00 3,219.65 0.00
B Apple 50.00 15,852.06 0.00
B Facebook 75.00 17,743.70 0.00
B Google 43.00 37,795.15 0.00
B Netflix 17.00 5,918.50 0.00
B Tesla 14.00 1,708.75 0.00
B Total 504.00 166,349.64 0.00
B Uber 3.00 937.01 0.00
B total 252.00 83,174.82 0.00
all Total 2,916.00 787,625.74 0.00
Upvotes: 2
Views: 76
Reputation: 24314
try via pd.to_dateime()
:
out=df.loc[:,pd.to_datetime('2020-06-29 00:00:00')]
#out=df.loc[:,pd.to_datetime('2020-06-29 00:00:00')].reset_index()
OR
try via pd.Timestamp()
out=df.loc[:,pd.Timestamp('2020-06-29 00:00:00')]
#out=df.loc[:,pd.Timestamp('2020-06-29 00:00:00')].reset_index()
The 0th level of your column is Timestamp and you can verify that by:
df.columns.to_numpy()
#output
array([(Timestamp('2020-06-29 00:00:00'), 'last_sales'),
(Timestamp('2020-06-29 00:00:00'), 'sales'),
(Timestamp('2020-06-29 00:00:00'), 'difference'),
(Timestamp('2020-07-06 00:00:00'), 'last_sales'),
(Timestamp('2020-07-06 00:00:00'), 'sales'),
(Timestamp('2020-07-06 00:00:00'), 'difference'),
(Timestamp('2021-06-28 00:00:00'), 'last_sales'),
(Timestamp('2021-06-28 00:00:00'), 'sales'),
(Timestamp('2021-06-28 00:00:00'), 'difference'),
(Timestamp('2021-07-07 00:00:00'), 'last_sales'),
(Timestamp('2021-07-07 00:00:00'), 'sales'),
(Timestamp('2021-07-07 00:00:00'), 'difference')], dtype=object)
output of out
:
last_sales sales difference
group category
A Amazon 195.0 1268.850 0.0
Apple 61.0 18274.385 0.0
Facebook 106.0 19722.650 0.0
Google 61.0 55547.255 0.0
Netflix 37.0 15323.800 0.0
Tesla 13.0 1688.675 0.0
Total 954.0 227463.230 0.0
Uber 4.0 1906.000 0.0
total 477.0 113731.615 0.0
B Amazon 50.0 3219.650 0.0
Apple 50.0 15852.060 0.0
Facebook 75.0 17743.700 0.0
Google 43.0 37795.150 0.0
Netflix 17.0 5918.500 0.0
Tesla 14.0 1708.750 0.0
Total 504.0 166349.640 0.0
Uber 3.0 937.010 0.0
total 252.0 83174.820 0.0
all Total 2916.0 787625.740 0.0
total 2916.0 787625.740 0.0
NOTE:
There is no need of providing a tuple in .loc[]
because you are selecting the 0th level
Upvotes: 4
Reputation: 11395
I’m also getting a KeyError
, but if you use a Timestamp object to index the first-level columns, it seems to work:
>>> df[pd.Timestamp('2020-06-29 00:00:00')]
last_sales sales difference
group category
A Amazon 195.0 1268.850 0.0
Apple 61.0 18274.385 0.0
Facebook 106.0 19722.650 0.0
Google 61.0 55547.255 0.0
Netflix 37.0 15323.800 0.0
Tesla 13.0 1688.675 0.0
Total 954.0 227463.230 0.0
Uber 4.0 1906.000 0.0
total 477.0 113731.615 0.0
B Amazon 50.0 3219.650 0.0
Apple 50.0 15852.060 0.0
Facebook 75.0 17743.700 0.0
Google 43.0 37795.150 0.0
Netflix 17.0 5918.500 0.0
Tesla 14.0 1708.750 0.0
Total 504.0 166349.640 0.0
Uber 3.0 937.010 0.0
total 252.0 83174.820 0.0
all Total 2916.0 787625.740 0.0
total 2916.0 787625.740 0.0
Otherwise you could use .xs
which will then also allow you more flexibility, e.g. selecting in the second level of columns and so on:
>>> df.xs(pd.Timestamp('2020-06-29 00:00:00'), axis='columns', level=0)
last_sales sales difference
group category
A Amazon 195.0 1268.850 0.0
Apple 61.0 18274.385 0.0
Facebook 106.0 19722.650 0.0
Google 61.0 55547.255 0.0
Netflix 37.0 15323.800 0.0
Tesla 13.0 1688.675 0.0
Total 954.0 227463.230 0.0
Uber 4.0 1906.000 0.0
total 477.0 113731.615 0.0
B Amazon 50.0 3219.650 0.0
Apple 50.0 15852.060 0.0
Facebook 75.0 17743.700 0.0
Google 43.0 37795.150 0.0
Netflix 17.0 5918.500 0.0
Tesla 14.0 1708.750 0.0
Total 504.0 166349.640 0.0
Uber 3.0 937.010 0.0
total 252.0 83174.820 0.0
all Total 2916.0 787625.740 0.0
total 2916.0 787625.740 0.0
You can then add .drop(index=[('all', 'total')])
to remove the second total
line, and possible .reset_index()
The way to do it with .loc[]
is to provide a tuple, with the first item being a Timestamp
object and the second an empty slice. However this will keep the 2 levels of indexing, so it is not what you want:
>>> df.loc[:, (pd.Timestamp('2020-06-29 00:00:00'), slice(None))].head(2)
2020-06-29 00:00:00
last_sales sales difference
group category
A Amazon 195.0 1268.850 0.0
Apple 61.0 18274.385 0.0
Upvotes: 4