ginny
ginny

Reputation: 241

Pandas repeated columns into new row

I have data in a csv which has some columns duplicated, and I want to create multiple rows which repeats some of the columns. I'm trying to do this using pandas in Python.

Take the following as an MWE. I import my data using df = pandas.read_csv('my.csv'), and have something that looks like the following:

a  b  c  c.1  c.2
1  2  3   4    5
6  7  8   9    0

As can be seen columns a and b are unique but c has multiple values. I want to split up the c values into separate rows, duplicating a and b, i.e. achieve the following:

a  b  c
1  2  3
1  2  4
1  2  5
6  7  8
6  7  9
6  7  0

In the interest of not having to type too much I've left out a d column which is similarly duplicated but the solution should be versatile enough to accommodate this.

Upvotes: 2

Views: 477

Answers (3)

BENY
BENY

Reputation: 323226

df.set_index(['a','b']).stack().reset_index().drop('level_2',1).rename(columns={0:'c'})
Out[754]: 
   a  b  c
0  1  2  3
1  1  2  4
2  1  2  5
3  6  7  8
4  6  7  9
5  6  7  0

Upvotes: 1

andrew_reece
andrew_reece

Reputation: 21264

Use melt:

pd.melt(df, id_vars=['a','b'], value_name='c').drop('variable',1)

Output:

   a  b  c
0  1  2  3
1  6  7  8
2  1  2  4
3  6  7  9
4  1  2  5
5  6  7  0

It's a little tricky to build a general case for multiple variables. Here's a solution that's based on your case of having dupes for both c and d. It still uses melt(), but then renames all the c variants to c and same for d. Then it basically separates the c and d segments and merges them. It's a little clunky.

df['d'] = df.c
df['d.1'] = df['c.1']
df['d.2'] = df['c.2']

df2 = pd.melt(df, id_vars=['a','b'])
df2.variable = df2.variable.str.replace('^(c|d).*', '\\1')
dupe_vars = ['c','d']

(df2.loc[df2.variable.eq('c')]
    .rename(columns={'value':'c'})
    .drop('variable',1)
    .reset_index(drop=True).merge(
        df2.loc[df2.variable.eq('d')]
           .rename(columns={'value':'d'})
           .drop('variable',1)
           .reset_index(drop=True), 
        left_index=True, right_index=True, on=['a','b']))

   a  b  c  d
0  1  2  3  3
1  6  7  8  8
2  1  2  4  4
3  6  7  9  9
4  1  2  5  5
5  6  7  0  0

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

In [426]: pd.lreshape(df, {'c':df.columns[df.columns.str.match('^c\.?\d?')]})
Out[426]:
   a  b  c
0  1  2  3
1  6  7  8
2  1  2  4
3  6  7  9
4  1  2  5
5  6  7  0

Upvotes: 3

Related Questions