codeninja

Reputation: 379

Repeating elements in a dataframe

Hi all, I have the following dataframe:

A | B | C
1   2   3 
2   3   4 
3   4   5
4   5   6

I am trying to repeat only the last two rows of the data, so that it looks like this:

A | B | C
1   2   3 
2   3   4 
3   4   5
3   4   5
4   5   6
4   5   6

I have tried using append, concat and repeat to no avail.

repeated = lambda x:x.repeat(2)
df.append(df[-2:].apply(repeated),ignore_index=True)

This returns the following dataframe, which is incorrect:

A | B | C
1   2   3 
2   3   4 
3   4   5
4   5   6
3   4   5
3   4   5
4   5   6
4   5   6

Upvotes: 3

Views: 329

Answers (3)

piRSquared

Reputation: 294218

I'm partial to manipulating the index into the pattern we are aiming for, then asking the dataframe to take the new form.

Option 1
Use pd.DataFrame.reindex

df.reindex(df.index[:-2].append(df.index[-2:].repeat(2)))

   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
2  3  4  5
3  4  5  6
3  4  5  6

Same thing in multiple lines

i = df.index
idx = i[:-2].append(i[-2:].repeat(2))
df.reindex(idx)

Could also use loc

i = df.index
idx = i[:-2].append(i[-2:].repeat(2))
df.loc[idx]
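
For anyone who wants to reuse this, here is a minimal sketch that wraps the same index manipulation in a helper. The name repeat_last_rows and its parameters n and times are my own, not part of pandas. It assumes n is at least 1 and that the index has no duplicate labels to begin with, since reindex refuses to work on an index that already contains duplicates.

```python
import pandas as pd


def repeat_last_rows(df, n=2, times=2):
    """Return df with each of its last n rows appearing `times` times.

    Hypothetical helper: assumes n >= 1 and a unique index.
    """
    i = df.index
    # keep the leading labels once, repeat each of the trailing n labels
    idx = i[:-n].append(i[-n:].repeat(times))
    return df.reindex(idx)


df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5], "C": [3, 4, 5, 6]})
out = repeat_last_rows(df)
print(out)

# the repeated labels stay in the index; drop them if a fresh 0..n-1 index is wanted
print(out.reset_index(drop=True))
```

The result keeps the duplicated labels 2 and 3 in its index, which is why the reset_index(drop=True) call is shown as an optional last step.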

Option 2
Reconstruct from values. Only do this if all dtypes are the same.

i = np.arange(len(df))
idx = np.append(i[:-2], i[-2:].repeat(2))
pd.DataFrame(df.values[idx], df.index[idx])

   0  1  2
0  1  2  3
1  2  3  4
2  3  4  5
2  3  4  5
3  4  5  6
3  4  5  6
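
To illustrate the dtype caveat, here is a small sketch with a made-up frame called mixed (not the one from the question) that has one integer column and one string column. Going through .values collapses everything into a single array, so the integer column should no longer come back as an integer dtype, while iloc keeps each column's dtype. Passing columns= to the constructor also keeps the column names, which the call above drops.

```python
import numpy as np
import pandas as pd

# hypothetical example data with two different dtypes
mixed = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["w", "x", "y", "z"]})

i = np.arange(len(mixed))
idx = np.append(i[:-2], i[-2:].repeat(2))

# .values on mixed columns is one object array, so column A loses its integer dtype
rebuilt = pd.DataFrame(mixed.values[idx], index=mixed.index[idx], columns=mixed.columns)
print(rebuilt.dtypes)

# positional selection keeps the per-column dtypes
taken = mixed.iloc[idx]
print(taken.dtypes)
```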

Option 3
You can also pass an np.array of positions to iloc

i = np.arange(len(df))
idx = np.append(i[:-2], i[-2:].repeat(2))
df.iloc[idx]

   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
2  3  4  5
3  4  5  6
3  4  5  6
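
A variation on the same idea, offered as a sketch rather than something from the original answer: np.repeat also accepts one repeat count per element, so you can build a counts array (the name counts is mine) and let it describe how often each row should appear.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5], "C": [3, 4, 5, 6]})

# one repeat count per row: 1 everywhere, 2 for the last two rows
counts = np.ones(len(df), dtype=int)
counts[-2:] = 2

# positions 0, 1, 2, 2, 3, 3
idx = np.repeat(np.arange(len(df)), counts)
out = df.iloc[idx]
print(out)
```

This makes it easy to change which rows repeat, or how many times, by editing counts alone.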

Upvotes: 2

Scott Boston

Reputation: 153460

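Use pd.concat and index slicing with .iloc, then sort so that each repeated row sits next to its original (sorting by 'A' works here because column A is already in ascending order):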

pd.concat([df,df.iloc[-2:]]).sort_values(by='A')

Output:

   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
2  3  4  5
3  4  5  6
3  4  5  6
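
If column A is not sorted in your real data, sorting by it would shuffle the rows. A possible alternative, assuming the index is unique and already in the order you want (as with the default 0..n-1 index), is to sort on the index instead. The frame below is made-up data with A deliberately out of order.

```python
import pandas as pd

# hypothetical data: A is unsorted, so sort_values(by='A') would reorder the rows
df = pd.DataFrame({"A": [4, 1, 3, 2], "B": [5, 2, 4, 3], "C": [6, 3, 5, 4]})

# a stable sort on the index puts each duplicate right after its original
out = pd.concat([df, df.iloc[-2:]]).sort_index(kind="mergesort")
print(out)
```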

Upvotes: 2

jezrael

Reputation: 862511

You can use numpy.repeat to repeat the last two index labels and select those rows with loc to create df1. Then drop the last 2 rows from the original with iloc and append df1 to what is left:

df1 = df.loc[np.repeat(df.index[-2:].values, 2)]
print (df1)
   A  B  C
2  3  4  5
2  3  4  5
3  4  5  6
3  4  5  6

print (df.iloc[:-2])
   A  B  C
0  1  2  3
1  2  3  4

df = df.iloc[:-2].append(df1,ignore_index=True)
print (df)
   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
3  3  4  5
4  4  5  6
5  4  5  6
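
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent version the last step needs pd.concat instead. A minimal sketch of the same steps, using out for the result rather than overwriting df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5], "C": [3, 4, 5, 6]})

# the last two rows, each taken twice
df1 = df.loc[np.repeat(df.index[-2:].values, 2)]

# pd.concat takes a list of frames; ignore_index=True renumbers the rows 0..5
out = pd.concat([df.iloc[:-2], df1], ignore_index=True)
print(out)
```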

If you want to keep your own code, use iloc to drop the last 2 rows from the original before appending the repeated ones:

repeated = lambda x:x.repeat(2)
df = df.iloc[:-2].append(df.iloc[-2:].apply(repeated),ignore_index=True)
print (df)
   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
3  3  4  5
4  4  5  6
5  4  5  6

Upvotes: 2
