Reputation: 1367
How can I concat a Series of shape (4,) to a df of shape (1,4) and obtain a df of shape (2,4) without converting the Series to a df first? I am trying to insert a Series as the top row of a df.
For example:
import pandas as pd
mydict1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
mydict2 = [{'a': 5, 'b': 6, 'c': 7, 'd': 8}]
# 1x4 dataframes
df1 = pd.DataFrame(mydict1)
df2 = pd.DataFrame(mydict2)
# series1.shape: (4,)
series1 = df1.iloc[0]
# df3.shape: (1,4)
df3 = df1.iloc[[0]]
# 5x5 df. With a new row and column representing the indexes of each. If anything, I'd expect a 4x4 df here, not a 5x5.
dfDfSeriesAxis0 = pd.concat([df2, df1.iloc[0]], axis=0)
# 5x5 df. I would think this is different from the above method with axis=0, but it appears to be identical
dfDfSeriesAxis1 = pd.concat([df2, df1.iloc[0]], axis=1)
# 5x5 df
dfSeriesDfAxis0 = pd.concat([df1.iloc[0], df2])
# 5x5 df
dfSeriesDfAxis1 = pd.concat([df1.iloc[0], df2], axis=1)
# This achieves the result I want (2x4 df) but must convert to a df before concat.
dfDf1Df2Axis0 = pd.concat([df1.iloc[[0]], df2])
# Concats to a 2x4 df, but in the wrong order
dfDf2Df1Axis0 = pd.concat([df2,df1.iloc[[0]]])
# Concats along incorrect axis and I end up with a 1x8 df
dfDf1Df2Axis1 = pd.concat([df1.iloc[[0]], df2], axis=1)
# Appends along correct axis and I end up with a 2x4 df. Why does appending work as expected but concat does not?
dfAppendSeries = df1.append(df2.iloc[-1])
# Appends along correct axis and I end up with a 2x4 df
dfAppendDf = df1.append(df2)
It appears iloc[0] returns a Series while iloc[[0]] returns a dataframe. Furthermore, iloc[0:1] appears to return the same dataframe that iloc[[0]] returns.
My main source of confusion is why dfAppendSeries = df1.append(df2.iloc[-1]) results in the expected 2x4 df, whereas dfDfSeriesAxis0 = pd.concat([df2, df1.iloc[0]], axis=0) results in a 5x5 df. I really can't imagine how the resulting df from dfDfSeriesAxis0 = pd.concat([df2, df1.iloc[0]], axis=0) would be useful under any circumstance.
Is there a way to make the returned object from df1.iloc[0] compatible to concat with df2 without making it a dataframe itself? By that I mean giving it the appropriate shape to concat with a (1,4) df and get a df of shape (2,4). I tried transposing series1, but this appears to have no effect on the shape.
Although not explicitly stated in this context, according to the docs I would expect to be able to do this:
Returns
object, type of objs
When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned.
Upvotes: 0
Views: 948
Reputation: 41387
As discussed in the comments, you can concat using a double transpose:
pd.concat([series1, df2.T], axis=1).T.reset_index(drop=True)
#    a  b  c  d
# 0  1  2  3  4
# 1  5  6  7  8
However, note that it's much faster to prepend the Series via list insertion:
%%timeit
data = df2.values.tolist()
data.insert(0, series1.tolist())
pd.DataFrame(data, columns=df2.columns)
# 367 µs ± 37.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compare that with expanding the dataframe through concat, which is slower, especially if you're planning to prepend often:
%%timeit
pd.concat([series1, df2.T], axis=1).T.reset_index(drop=True)
# 1.36 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
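If you prepend in more than one place, the list-insertion approach can be wrapped in a small function. This is only a sketch: prepend_row is a hypothetical name, not a pandas API, and it assumes the Series index holds the same labels as the dataframe columns. Going through .values.tolist() may also change dtypes when the columns have mixed types.

```python
import pandas as pd


def prepend_row(df, row):
    """Hypothetical helper: return a new DataFrame with `row` as its first row.

    Assumes `row` is a Series whose index contains every column label of `df`.
    """
    data = df.values.tolist()
    # reindex puts the Series values in the same order as df's columns
    data.insert(0, row.reindex(df.columns).tolist())
    return pd.DataFrame(data, columns=df.columns)


df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3, 'd': 4}])
df2 = pd.DataFrame([{'a': 5, 'b': 6, 'c': 7, 'd': 8}])
series1 = df1.iloc[0]

result = prepend_row(df2, series1)
print(result.shape)
```

The result is rebuilt from plain lists, so it gets a fresh default index and no reset_index call is needed.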
Upvotes: 1
Reputation: 1348
The reason is that when .iloc is given a single integer, as in df.iloc[0], it returns a Series, but when it is given a list it returns a data frame. The extra brackets make pandas treat the single integer zero as a list of integers containing only zero. The pd.concat() function returns a Series if it is passed only Series, but if any of the objects it is passed is a data frame it will always return a data frame. In the case of df1.iloc[0] this puts pandas in the position of making a data frame from a Series. It uses the column letters as the row labels and enters the Series data vertically in a single column named 0 (the name of the Series, taken from the row label). When df2 is concatenated with the converted Series it keeps its own row and column labels, which do not match those of the converted Series, so the result is padded with NaN wherever the labels do not overlap: df2 has no column 0, and the converted Series has no columns a through d.
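The behaviour described above can be checked with a short sketch that uses the data from the question. The variable names are mine and only for illustration, and the last step does convert the Series to a data frame, so it shows why the shapes matter rather than avoiding the conversion.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3, 'd': 4}])
df2 = pd.DataFrame([{'a': 5, 'b': 6, 'c': 7, 'd': 8}])

row_as_series = df1.iloc[0]    # single integer -> Series, shape (4,)
row_as_frame = df1.iloc[[0]]   # list of integers -> DataFrame, shape (1, 4)

# Roughly what concat has to do with the Series: stand it up as one column
# whose name is the Series name (0) and whose row labels are a, b, c, d.
series_as_column = row_as_series.to_frame()   # shape (4, 1)

# Rows 0, a, b, c, d and columns a, b, c, d, 0: nothing lines up.
mismatched = pd.concat([df2, row_as_series], axis=0)

# Laying the Series out as a single row first gives matching column labels.
stacked = pd.concat([row_as_series.to_frame().T, df2], ignore_index=True)

print(row_as_series.shape, row_as_frame.shape)
print(series_as_column.shape, mismatched.shape, stacked.shape)
```

Transposing the Series itself does nothing because it is one-dimensional; it only becomes a row once it is a data frame.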
EDIT:
This code should get you a (2,4) Data Frame with minimal effort:
import pandas as pd
mydict1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
mydict2 = [{'a': 5, 'b': 6, 'c': 7, 'd': 8}]
df1 = pd.DataFrame(mydict1)
df2 = pd.DataFrame(mydict2)
DF1=df1.melt().set_index('variable')
DF2=df2.melt().set_index('variable')
DF1.insert(1,'col_name',DF2['value'],True)
DF1  # 4x2 data frame
answer = DF1.T  # 2x4 data frame
answer
I hope this helps.
Upvotes: 1