tarkan
tarkan

Reputation: 69

Convert Column entries in to rows pandas

I have a data set as follows and it repeats the description for each state (state, num people vaccinated, percentage of population vaccinated). I want to make all of the description entries to become column headers

Currently column headers = Description, Data

Wanted column headers = State, num people vaccinated, percentage of population vaccinated, Data

Any ideas on how to do this?

Here is my current code:

import pandas as pd
import numpy as np
df = pd.read_csv("vaccinate_pct.csv",header=1,na_values=" NaN",index_col=False)
df=df[['Description','Data']]
df=df.dropna(axis=0)
df
Description Data
0   state   Vermont
1   num people vaccinated   424,323
2   percentage of population vaccinated 68
3   state   Connecticut
4   num people vaccinated   2,360,741
... ... ...
148 num people vaccinated   1,891,063
149 percentage of population vaccinated 38.57
150 state   Mississippi
151 num people vaccinated   1,144,438
152 percentage of population vaccinated 38

Upvotes: 0

Views: 51

Answers (2)

Niv Dudovitch
Niv Dudovitch

Reputation: 1668

If this is what you meant:

Your df:

enter image description here

Solution:

df_new = pd.DataFrame()
description_unique_values = df.Description.unique()
df_dict = {name: df.loc[df['Description'] == name]['Data'] for name in description_unique_values}
for v in description_unique_values:
    df_new[v] = df_dict[v].reset_index(drop=True)

Result:

enter image description here

Upvotes: 2

Bill the Lizard
Bill the Lizard

Reputation: 406095

You can do this by pivoting the dataframe, but only after you set the index so that the indices repeat for each group of 3 values. Otherwise, Pandas will fill in a bunch of NaNs.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Description': ['state', 'num people vaccinated', '% of pop vaccinated',
                                   'state', 'num people vaccinated', '% of pop vaccinated'],
                   'Data': ['Vermont', '424323', '68', 'Ohio', '6145359', '52']})
print(df)

# re-index the dataframe to [0, 0, 0, 1, 1, 1, ...]
idx_vals = list(range(0, int(len(df)/3)))
df.index = list(np.repeat(idx_vals, 3))
print(df)

df2 = df.pivot( columns='Description', values='Data')
print(df2)

Note that this is only going to work if you have exactly 3 values for each state.

Upvotes: 1

Related Questions