Alhpa Delta
Alhpa Delta

Reputation: 3660

Reshape a long column csv file using pandas to get a proper dataframe table

I have data in a csv file in a single column which I want to convert into a table with column headers. The input file is of the type:

df1 = pd.DataFrame(['CompA','$200','$450','10.3x','50.0%'
        ,'CompB','$300','$50','13.2x','40.0%',
        'CompC','$100','$150','2.8x','13.5%',
        'CompD','$150','$250','3.8x','53.2%'
        ])

I want to convert this into a table dataframe with the headers

column_names = ['Company name','Revenues','Gross Profit','P/E Multiple','Operating Margin']

So, that the various companies (in the example above it is 4 companies CompA, CompB, CompC and CompD, each have its data in its own row

I tried the following, but it is highly inelegant , not to mention, it involves manual counting of the data, and this still just adds the 'header column' data but still does not make a table:

arr1 = column_names*4
df1[1] = arr1

And then when I tried to pivot it, it was not putting the Revenues, and Gross Profit etc in one row, but creating a separate row for each. This is what I did:

df2 = df1.pivot(columns=1,values=0)

How do I fix this?

Upvotes: 1

Views: 336

Answers (2)

skuzzy
skuzzy

Reputation: 541

You are almost correct. Pivot can work this way, however, it requires three things, the values to be pivoted, the column to pivot on, and an index.

I don't see the need to count manually here.

# Get number of entities in long list
n_entities = int(len(df)/len(column_names))

# Generates n-repetitions of column_names and assign to df for pivot
df['col_name'] = column_names * n_entities 

# Generate and assign an index column
index_vals = []
for i in range(n_entities):
    index_vals.extend([str(i)]*len(column_names))
df['index_val'] = index_vals 

df.pivot(index = 'index_val', columns='col_name', values=0)

Upvotes: 1

Michael Szczesny
Michael Szczesny

Reputation: 5036

You can reshape the values in your dataframe using the column_names

pd.DataFrame(df1.to_numpy().reshape(-1, len(column_names)), columns=column_names)

Out:

  Company name Revenues Gross Profit P/E Multiple Operating Margin
0        CompA     $200         $450        10.3x            50.0%
1        CompB     $300          $50        13.2x            40.0%
2        CompC     $100         $150         2.8x            13.5%
3        CompD     $150         $250         3.8x            53.2%

Upvotes: 2

Related Questions