Christopher James

Reputation: 332

pandas dataframe row manipulation

I'm sure that I'm missing something simple, but I haven't been able to figure this one out. I have a DataFrame in pandas with multiple rows that share the same key but hold different information. I want to combine these rows onto the same row.

df = pd.DataFrame({'key': ['K0', 'K0', 'K1', 'K2'],
                  'A': ['A0', 'A1', 'A2', 'A3'],
                  'B': ['B0', 'B1', 'B2', 'B3']})

This gives me a DataFrame with 4 rows and 3 columns, but there is a duplicate value 'K0' in 'key'.

Is there any way to turn this into a DataFrame with 3 rows and 5 columns, as shown below?

df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                  'A': ['A0', 'A2', 'A3'],
                  'B': ['B0', 'B2', 'B3'],
                  'A_1': ['A1', 'NaN', 'NaN'],
                  'B_1': ['B1', 'NaN', 'NaN']})

Upvotes: 3

Views: 309

Answers (3)

B. M.

Reputation: 18628

I think this alters the layout. Just set key as an index to access the fields:

df2 = df.set_index([df.key,df.index])

Then

In [248]: df2.loc['K1']
Out[248]: 
    A   B key
2  A2  B2  K1

In [249]: df2.loc['K0']
Out[249]: 
    A   B key
0  A0  B0  K0
1  A1  B1  K0

and iterate over the rows.
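
A minimal sketch of that iteration, assuming the df2 built above:

for key, rows in df2.groupby(level=0):
    # rows holds every original record sharing this key
    print(key, rows[['A', 'B']].values.tolist())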

Upvotes: 0

Scott Boston

Reputation: 153460

Let's use set_index, groupby, cumcount, and unstack, then flatten the MultiIndex columns with map and format:

df_out = df.set_index(['key', df.groupby('key').cumcount()]).unstack()
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
df_out.reset_index()

Output:

  key A_0   A_1 B_0   B_1
0  K0  A0    A1  B0    B1
1  K1  A2  None  B2  None
2  K2  A3  None  B3  None
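
For context, the intermediate df.groupby('key').cumcount() simply numbers the duplicates within each key; for the df in the question it evaluates to:

df.groupby('key').cumcount()

0    0
1    1
2    0
3    0
dtype: int64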

Upvotes: 0

cs95

Reputation: 402263

Perform a groupby on the cumcount, then concatenate the individual groups together.

gps = []
for i, g in df.groupby(df.groupby('key').cumcount()):
    gps.append(g.drop('key', axis=1).add_suffix(i + 1).reset_index(drop=True))

r = pd.concat(gps, axis=1).sort_index(axis=1)
r['key'] = df.key.unique()

r
   A1   A2  B1   B2 key
0  A0   A1  B0   B1  K0
1  A2  NaN  B2  NaN  K1
2  A3  NaN  B3  NaN  K2
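
To see which rows each cumcount group holds (a quick sketch using the question's df), group 0 contains the first row for every key and group 1 only the second occurrence of K0:

for i, g in df.groupby(df.groupby('key').cumcount()):
    # i == 0 -> original rows 0, 2, 3; i == 1 -> original row 1
    print(i, g.index.tolist())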

You can shorten this somewhat using a list comprehension -

r = pd.concat(
         [g.drop('key', axis=1).add_suffix(i + 1).reset_index(drop=True)
                    for i, g in df.groupby(df.groupby('key').cumcount())],
         axis=1)\
      .sort_index(axis=1)

r['key'] = df.key.unique()
r
   A1   A2  B1   B2 key
0  A0   A1  B0   B1  K0
1  A2  NaN  B2  NaN  K1
2  A3  NaN  B3  NaN  K2

Upvotes: 1
