Reputation: 241
I have data in a csv which has some columns duplicated, and I want to create multiple rows which repeats some of the columns. I'm trying to do this using pandas in Python.
Take the following as an MWE. I import my data using df = pandas.read_csv('my.csv')
, and have something that looks like the following:
a b c c.1 c.2
1 2 3 4 5
6 7 8 9 0
As can be seen columns a and b are unique but c has multiple values. I want to split up the c values into separate rows, duplicating a and b, i.e. achieve the following:
a b c
1 2 3
1 2 4
1 2 5
6 7 8
6 7 9
6 7 0
In the interest of not having to type too much I've left out a d column which is similarly duplicated but the solution should be versatile enough to accommodate this.
Upvotes: 2
Views: 477
Reputation: 323226
df.set_index(['a','b']).stack().reset_index().drop('level_2',1).rename(columns={0:'c'})
Out[754]:
a b c
0 1 2 3
1 1 2 4
2 1 2 5
3 6 7 8
4 6 7 9
5 6 7 0
Upvotes: 1
Reputation: 21264
Use melt
:
pd.melt(df, id_vars=['a','b'], value_name='c').drop('variable',1)
Output:
a b c
0 1 2 3
1 6 7 8
2 1 2 4
3 6 7 9
4 1 2 5
5 6 7 0
It's a little tricky to build a general case for multiple variables. Here's a solution that's based on your case of having dupes for both c
and d
. It still uses melt()
, but then renames all the c
variants to c
and same for d
. Then it basically separates the c
and d
segments and merges them. It's a little clunky.
df['d'] = df.c
df['d.1'] = df['c.1']
df['d.2'] = df['c.2']
df2 = pd.melt(df, id_vars=['a','b'])
df2.variable = df2.variable.str.replace('^(c|d).*', '\\1')
dupe_vars = ['c','d']
(df2.loc[df2.variable.eq('c')]
.rename(columns={'value':'c'})
.drop('variable',1)
.reset_index(drop=True).merge(
df2.loc[df2.variable.eq('d')]
.rename(columns={'value':'d'})
.drop('variable',1)
.reset_index(drop=True),
left_index=True, right_index=True, on=['a','b']))
a b c d
0 1 2 3 3
1 6 7 8 8
2 1 2 4 4
3 6 7 9 9
4 1 2 5 5
5 6 7 0 0
Upvotes: 2
Reputation: 210832
In [426]: pd.lreshape(df, {'c':df.columns[df.columns.str.match('^c\.?\d?')]})
Out[426]:
a b c
0 1 2 3
1 6 7 8
2 1 2 4
3 6 7 9
4 1 2 5
5 6 7 0
Upvotes: 3