Greg H.
Greg H.

Reputation: 13

How do I transpose many sub headers in a Pandas data frame that were read in as rows to columns in a dynamic dataset

I am reading in a huge dataset from CSVs. The company is is keeping a living ledger in an Excel Workbook with about 70 tabs, so my problem stems from dynamic files where row locations change from tab to tab. Each tab has the same groups as sub headers, but rows are inserted as needed when they record a new line item. So a sub header in cell A15 on one tab might be A36 in another.

I'd like to transpose the group's sub headers to new columns assigned to the appropriate row keeping the integrity of the binned data.

As I can't share the actual CSVs I saved from the Workbook I made an example DF.

After reading in from Pandas the sub headers are Fruit:, Vegetable:, Dairy:, and Meat:.

|Name| |Qty| |Price|
|Fruit:| |0| |0|
|Apple| |20| |1.01|
|Orange| |25| |1.65|
|Vegetable:| |0| |0|
|Carrot| |20| |1.01|
|Dairy:| |0| |0|
|Meat:| |0| |0|
|Pork| |100| |4.00|
|Ground Chuck| |50| |2.00|

The desired transformation would look like this:

|Group||Name| |Qty| |Price|

|Fruit:| |Apple| |20| |1.01|

|Fruit:||Orange |25| |1.65|

|Vegetable:||Carrot| |20| |1.01|

|Dairy:| |0| |0| |0|

|Meat:||Pork| |100| |4.00|

|Meat:||Ground Chuck| |50| |2.00|

I included the colons on the sub headers because my dataset has them as well. I have an inkling they can be used when looping through and creating new columns with the new Group.

CSV's will be batched in at regular intervals, so mapping the DF's locations is a huge burden, I'd like to automate the process.

Upvotes: 1

Views: 65

Answers (1)

Amira Bedhiafi
Amira Bedhiafi

Reputation: 1

You need to identify the sub headers and then use them to create a new group column in your df and since your headers have colons, you can use this characteristic to make the distinction from the other data rows.

import pandas as pd
from io import StringIO

data = """
Name,Qty,Price
Fruit:,0,0
Apple,20,1.01
Orange,25,1.65
Vegetable:,0,0
Carrot,20,1.01
Dairy:,0,0
Meat:,0,0
Pork,100,4.00
Ground Chuck,50,2.00
"""

df = pd.read_csv(StringIO(data))

transformed_data = []

# here I try to track of the current group
current_group = None

# then I iterate through each row in the df
for index, row in df.iterrows():
    # I verify if the 1st column ends with a colon
    if str(row['Name']).endswith(':'):
        current_group = row['Name']
    else:
        transformed_data.append([current_group] + row.tolist())

transformed_df = pd.DataFrame(transformed_data, columns=['Group', 'Name', 'Qty', 'Price'])

print(transformed_df.head(10))  

enter image description here

Upvotes: 0

Related Questions