Uvar

Reputation: 3462

Is there an elegant way to concatenate DataFrames by their fixed position in a list?

Arguably I could improve the function's design in the first place, but currently I have a function that returns a tuple (or list) of DataFrames, one per data stream. The function is called several times, and at the end each data stream needs to be concatenated separately. For now there are three streams, but the solution should scale to more.

import pandas as pd

def spit_out_three_dfs():
    df1 = pd.DataFrame({"Timestamp": ["x", "y"], "Data": [1,1]})
    df2 = pd.DataFrame({"Timestamp": ["x", "y"], "Data": [2,2]})
    df3 = pd.DataFrame({"Timestamp": ["x", "y"], "Data": [3,3]})
    return df1, df2, df3

df1_concat, df2_concat, df3_concat = map(lambda x: pd.concat(x), [spit_out_three_dfs() for i in range(3)])

returns three identical DataFrames, each of which looks like this:

df1_concat
Out[14]: 
  Timestamp  Data
0         x     1
1         y     1
0         x     2
1         y     2
0         x     3
1         y     3

instead of the desired result:

df1_concat
Out[14]: 
  Timestamp  Data
0         x     1
1         y     1
0         x     1
1         y     1
0         x     1
1         y     1

with data stream 2 in df2_concat, and so on. Specifically, I am looking for a way to concatenate all three streams in one go.

One thing I can do is merge the streams and filter/query them later, but this runs into memory problems.

Another option is a nested list comprehension that picks the x-th element each time, but that exhausts the generator, so the entire operation has to be run again for the next element. For three data streams this is manageable, but still horribly inefficient.
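A minimal sketch of the generator problem described above, assuming the per-call results are produced lazily by a generator. `stream_results` is a hypothetical name, and plain strings stand in for the DataFrames. Once the comprehension for the first stream has consumed the generator, the comprehension for the second stream gets nothing.

```python
def stream_results(n_calls):
    # Hypothetical stand-in for repeated calls to spit_out_three_dfs():
    # each iteration yields one tuple with one item per data stream.
    for i in range(n_calls):
        yield (f"s0_{i}", f"s1_{i}", f"s2_{i}")

results = stream_results(3)

# Picking the 0th element of every tuple consumes the whole generator...
stream0 = [t[0] for t in results]
# ...so picking the 1st element afterwards finds nothing left to iterate.
stream1 = [t[1] for t in results]

print(stream0)  # ['s0_0', 's0_1', 's0_2']
print(stream1)  # []
```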

Is there a pandas wizard out there who can point me in the right direction?

Upvotes: 0

Views: 20

Answers (1)

David

Reputation: 463

If I understand correctly, you want to concatenate data coming from different streams that are initially stored in a structure such as:

[ (df_stm0_0, df_stm1_0, ...), (df_stm0_1, df_stm1_1, ...), ...]

If that's the case, I believe you're applying the concatenation function at the wrong level: you should first use the zip iterator to regroup the DataFrames into lists like:

[[df_stm0_0, df_stm0_1, ...], [df_stm1_0, df_stm1_1, ...], ... ]
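The regrouping described above is a transpose of the nested structure, which is what `zip(*...)` does. A minimal sketch with placeholder strings standing in for the DataFrames (the names only mirror the notation above and are illustrative):

```python
# Outer list: one tuple per function call; inner tuple: one item per stream.
per_call = [
    ("df_stm0_0", "df_stm1_0", "df_stm2_0"),
    ("df_stm0_1", "df_stm1_1", "df_stm2_1"),
]

# The * unpacks the outer list into separate arguments, so zip pairs up
# the i-th item of every tuple, producing one group per stream.
per_stream = [list(group) for group in zip(*per_call)]

print(per_stream)
# [['df_stm0_0', 'df_stm0_1'], ['df_stm1_0', 'df_stm1_1'], ['df_stm2_0', 'df_stm2_1']]
```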

Your assignment line should then look like this:

df1_concat, df2_concat, df3_concat = map(pd.concat, zip(*[spit_out_three_dfs() for i in range(3)]))
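A self-contained sketch of this approach, reusing the toy `spit_out_three_dfs` from the question. It passes `pd.concat` itself to `map`, so it is called once on each per-stream tuple that `zip(*...)` produces. The list of results is read only once by the `*` unpacking, so the same line also works if the per-call results come from a generator.

```python
import pandas as pd

def spit_out_three_dfs():
    # Same toy function as in the question: one DataFrame per data stream.
    df1 = pd.DataFrame({"Timestamp": ["x", "y"], "Data": [1, 1]})
    df2 = pd.DataFrame({"Timestamp": ["x", "y"], "Data": [2, 2]})
    df3 = pd.DataFrame({"Timestamp": ["x", "y"], "Data": [3, 3]})
    return df1, df2, df3

# Call the function three times; each result is a (stream1, stream2, stream3) tuple.
results = [spit_out_three_dfs() for _ in range(3)]

# zip(*results) regroups the DataFrames by position (one tuple per stream),
# and map applies pd.concat to each of those tuples.
df1_concat, df2_concat, df3_concat = map(pd.concat, zip(*results))

print(df1_concat["Data"].tolist())  # [1, 1, 1, 1, 1, 1]
print(df2_concat["Data"].tolist())  # [2, 2, 2, 2, 2, 2]
```

Because `zip(*results)` yields one group per stream, the idea scales to any number of streams with `[pd.concat(group) for group in zip(*results)]` instead of unpacking into fixed names. If a fresh 0..n-1 index is preferred over the repeated 0, 1 index, `pd.concat(group, ignore_index=True)` provides that.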

Upvotes: 1
