Software Prophets
Software Prophets

Reputation: 2976

Pandas Concat Different Sized DataFrame to End of Column

Note: Contrived example. Please don't hate on forecasting and I don't need advice on it. This is strictly a Pandas how-to question.

Example - One Solution

I have two different sized DataFrames, one representing sales and one representing a forecast.

sales = pd.DataFrame({'sales':[5,3,5,6,4,4,5,6,7,5]})
forecast = pd.DataFrame({'forecast':[5,5.5,6,5]})

The forecast needs to be with the latest sales, which is at the end of the list of sales numbers [5, 6, 7, 5]. Other times, I might want it at other locations (please don't ask why, I just need it this way).

This works:

df = pd.concat([sales, forecast], ignore_index=True, axis=1)
df.columns = ['sales', 'forecast']  # Not necessary, making next command pretty
df.forecast = df.forecast.shift(len(sales) - len(forecast))

This gives me the desired outcome:

enter image description here

Question

What I want to know is: Can I concatenate to the end of the sales data without performing the additional shift (the last command)? I'd like to do this in one step instead of two. concat or something similar is fine, but I'd like to skip the shift.

I'm not hung up on having two lines of code. That's okay. I want a solution with the maximum possible performance. My application is sensitive to every millisecond we throw at it on account of huge volumes.

Upvotes: 2

Views: 2305

Answers (1)

Cleb
Cleb

Reputation: 25997

Not sure if that is much faster but you could do

sales = pd.DataFrame({'sales':[5,3,5,6,4,4,5,6,7,5]})
forecast = pd.DataFrame({'forecast':[5,5.5,6,5]})

forecast.index = sales.index[-forecast.shape[0]:]

which gives

   forecast
6       5.0
7       5.5
8       6.0
9       5.0

and then simply

pd.concat([sales, forecast], axis=1)

yielding the desired outcome:

   sales  forecast
0      5       NaN
1      3       NaN
2      5       NaN
3      6       NaN
4      4       NaN
5      4       NaN
6      5       5.0
7      6       5.5
8      7       6.0
9      5       5.0

A one-line solution using the same idea, as mentioned by @Dark in the comments, would be:

pd.concat([sales, forecast.set_axis(sales.index[-len(forecast):], inplace=False)], axis=1)

giving the same output.

Upvotes: 4

Related Questions