Brian Postow

Reputation: 12217

Duplicating a single row in a dataframe

I want to write a function:

def dupRow(df, val):

which takes a DataFrame and a value. It finds the row whose 'val' column equals that value and duplicates just that row. So, for example, if df=

  val  data1  data2  data3
0   a      3      1      9
1   b     89      2      8
2   c      7      3      7
3   d      0      4      6

then dupRow(df, 'c') returns:

  val  data1  data2  data3
0   a      3      1      9
1   b     89      2      8
2   c      7      3      7
3   c      7      3      7
4   d      0      4      6

The duplicated row can go at the bottom, since I can reorder rows when I'm done. It's just easier to see this way.

I've seen a bunch of things using np.repeat, but I can't figure out how to get it to do that only once, for a single row, rather than on the entire index...

Upvotes: 1

Views: 1211

Answers (5)

Brian Postow

Reputation: 12217

I've accepted another answer, but because my case was slightly different, I figured I'd post my actual solution here.

In the actual case, I have a large DataFrame, and rather than duplicating all of the rows with val=='c', I want to duplicate ONE row from every set of rows that share a val. So if there are 3 rows with val=='c', I duplicate the first one. What I ended up doing is:

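# Helper column: stays 1 for untouched rows, set to 0 for rows whose val is repeated
data['nonDup'] = 1
# Number of rows for each value of indexCol
counts = data.groupby(indexCol).count()['nonDup']
# Values that occur more than once
dups = [c for c in counts.index if counts[c] > 1]

for d in dups:
    dupData = data[data[indexCol]==d]
    dupInds = dupData.index.tolist()
    # Append a copy of the first row of the group (DataFrame.append needs pandas < 2.0)
    data = data.append(data.loc[dupInds[0]], ignore_index=True)
    # Flag the original rows of this group
    data.loc[dupInds, 'nonDup'] = 0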

I think that "ignore_index=True" is easier than going through the reindexing...
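
A vectorized sketch of the same idea, assuming the goal is only to append one copy of the first row of every repeated group. The name dup_first_of_repeated and the sample data are made up for illustration, the nonDup bookkeeping column is left out, and pd.concat stands in for DataFrame.append (which was removed in pandas 2.0):

```python
import pandas as pd

def dup_first_of_repeated(data, index_col):
    """Append one copy of the first row of every group whose index_col value repeats."""
    # True for every row whose value occurs more than once
    repeated = data[index_col].duplicated(keep=False)
    # True for the first occurrence of each value
    first_seen = ~data[index_col].duplicated(keep="first")
    # pd.concat replaces DataFrame.append; ignore_index renumbers the result
    return pd.concat([data, data[repeated & first_seen]], ignore_index=True)

# Hypothetical sample data: 'c' occurs three times, 'b' twice, 'a' once
data = pd.DataFrame({"val": ["a", "c", "c", "c", "b", "b"],
                     "x": [1, 2, 3, 4, 5, 6]})
result = dup_first_of_repeated(data, "val")
```

This avoids the Python-level loop, which may matter on a large frame, but it is a different implementation from the one above rather than a drop-in replacement.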

Upvotes: 0

Corralien

Reputation: 120559

You can use Index.repeat:

def dupRow(df, val):
    return df.reindex(df.index.repeat(df['val'].eq(val).astype(int).add(1)))

>>> dupRow(df, 'c')
  val  data1  data2  data3
0   a      3      1      9
1   b     89      2      8
2   c      7      3      7
2   c      7      3      7
3   d      0      4      6
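
To make the one-liner easier to follow, here is a step-by-step sketch of what it computes on the question's example frame. The intermediate names counts and repeated_index are introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"val": ["a", "b", "c", "d"],
                   "data1": [3, 89, 7, 0],
                   "data2": [1, 2, 3, 4],
                   "data3": [9, 8, 7, 6]})

# 1 for ordinary rows, 2 for the row to duplicate
counts = df["val"].eq("c").astype(int).add(1)

# Each index label is repeated as many times as its count says
repeated_index = df.index.repeat(counts)

# reindex pulls the rows for those labels, so label 2 appears twice
out = df.reindex(repeated_index)
```

Note that the duplicated row keeps its original label; adding .reset_index(drop=True) should give the 0..4 numbering shown in the question.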

For reference, here is a timing comparison of the answers. On a frame this small, the differences are probably not significant:

def dupRow_zabop(df, val):
    return df[df.val!=val].append([df[df.val==val]]*2)

def dupRow_not_speshal(df, val):
    return df.append(df[df["val"].eq(val)]).sort_values("val").reset_index(drop=True)

def dupRow_beny(df, val):
    return df.reindex(df.index.append(df.index[df.val==val])).sort_index()

def dupRow_corralien(df, val):
    return df.reindex(df.index.repeat(df['val'].eq(val).astype(int).add(1)))

%timeit dupRow_zabop(df, 'c')
744 µs ± 3.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit dupRow_not_speshal(df, 'c')
714 µs ± 3.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit dupRow_beny(df, 'c')
393 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit dupRow_corralien(df, 'c')
345 µs ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In these timings, the append-based versions take roughly twice as long as the reindex-based ones, so append appears to be the more expensive operation.

Upvotes: 2

BENY

Reputation: 323396

Usually we do this with reindex:

row = 'c'
out = df.reindex(df.index.append(df.index[df.val==row])).sort_index()
Out[27]: 
  val  data1  data2  data3
0   a      3      1      9
1   b     89      2      8
2   c      7      3      7
2   c      7      3      7
3   d      0      4      6
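
A short sketch of the intermediate steps, using the question's frame. The name extended is introduced here only for illustration, and the final reset_index is an optional extra, assuming you want the 0..4 numbering from the question:

```python
import pandas as pd

df = pd.DataFrame({"val": ["a", "b", "c", "d"],
                   "data1": [3, 89, 7, 0],
                   "data2": [1, 2, 3, 4],
                   "data3": [9, 8, 7, 6]})
row = "c"

# Original labels followed by the label(s) of the matching row: 0, 1, 2, 3, 2
extended = df.index.append(df.index[df["val"] == row])

# reindex fetches a row per label, sort_index moves the copy next to the original
out = df.reindex(extended).sort_index()

# Optional: renumber the rows consecutively
out_renumbered = out.reset_index(drop=True)
```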

Upvotes: 3

zabop

Reputation: 7932

You can do:

def dupRow(df, val):
    return df[df.val!=val].append([df[df.val==val]]*2)

Example:

import pandas as pd

data = {'val': [1,2,3,4],
        'col2': ['a','b','c','d']}
df = pd.DataFrame(data)

df is:

   val col2
0    1    a
1    2    b
2    3    c
3    4    d

dupRow(df,3) returns:

   val col2
0    1    a
1    2    b
3    4    d
2    3    c
2    3    c

Upvotes: 1

not_speshal

Reputation: 23166

IIUC, try:

def dupRow(df, val):
    return df.append(df[df["val"].eq(val)]).sort_values("val").reset_index(drop=True)

>>> dupRow(df, 'c')
  val  data1  data2  data3
0   a      3      1      9
1   b     89      2      8
2   c      7      3      7
3   c      7      3      7
4   d      0      4      6
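
DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on a current install the function above raises an AttributeError. A minimal sketch of an equivalent using pd.concat follows. The name dup_row_concat is made up for illustration, and it sorts by index rather than by "val", assuming the original row order should be preserved:

```python
import pandas as pd

def dup_row_concat(df, val):
    """Return a copy of df with the rows where df['val'] == val duplicated."""
    matches = df[df["val"].eq(val)]
    # Stack the matching rows under the original frame, then sort by label
    # so each copy sits next to its original, and renumber the rows
    return pd.concat([df, matches]).sort_index().reset_index(drop=True)

df = pd.DataFrame({"val": ["a", "b", "c", "d"],
                   "data1": [3, 89, 7, 0],
                   "data2": [1, 2, 3, 4],
                   "data3": [9, 8, 7, 6]})
out = dup_row_concat(df, "c")
```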

Upvotes: 3
