Javier Lopez Tomas

Reputation: 2342

Concat multiple dataframes within a list

I have several dataframes in a list, obtained with np.array_split, and I want to concat some of them into a single dataframe. In this example, I want to concat the 3 dataframes contained in b (all but the 2nd one, which is the element a[1] in the original list):

import numpy as np
import pandas as pd

df = pd.DataFrame({'country':['a','b','c','d'],
  'gdp':[1,2,3,4],
  'iso':['x','y','z','w']})

a = np.array_split(df,4)  # list of four one-row dataframes
i = 1
b = a[:i]+a[i+1:]  # every piece except a[i]

desired_final_df = pd.DataFrame({'country':['a','c','d'],
  'gdp':[1,3,4],
  'iso':['x','z','w']})

I have tried to create an empty df and then append the elements of b to it in a loop, but without success:

CV = pd.DataFrame()
CV = [CV.append[(b[i])] for i in b] #try1
CV = [CV.append(b[i]) for i in b] #try2
CV = pd.DataFrame([CV.append[(b[i])] for i in b]) #try3

for i in b:
 CV.append(b) #try4

I have reached a solution that works, but it is not efficient:

CV = pd.DataFrame()
CV = [CV.append(b) for i in b][0]

In this case, the list comprehension builds the same dataframe with all the rows three times, and I keep only the first one. However, in my real case, where I have big datasets, building the same dataframe three times would take much more computation time.

How could I do that without repeating operations?

Upvotes: 0

Views: 85

Answers (2)

sentence

Reputation: 8903

To concatenate multiple DFs and reset the index, use pandas.concat:

pd.concat(b, ignore_index=True)

output

  country  gdp iso
0       a    1   x
1       c    3   z
2       d    4   w
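
For reference, the one-liner above can be tried end to end with the data from the question. This is only a minimal sketch: the names `parts` and `result` are made up for illustration, and the one-row pieces are built with `iloc` slices as a stand-in for `np.array_split(df, 4)`.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["a", "b", "c", "d"],
    "gdp": [1, 2, 3, 4],
    "iso": ["x", "y", "z", "w"],
})

# One single-row DataFrame per row of df (assumption: this stands in
# for the list returned by np.array_split(df, 4) in the question).
parts = [df.iloc[k:k + 1] for k in range(len(df))]

# Drop the piece at position i, keep the rest.
i = 1
b = parts[:i] + parts[i + 1:]

# A single concat call joins every piece at once; ignore_index=True
# renumbers the rows 0..n-1 instead of keeping the original labels.
result = pd.concat(b, ignore_index=True)
print(result)
```

Because the whole list is passed in one call, the combined frame is built once rather than once per element.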

Upvotes: 1

jfaccioni

Reputation: 7509

According to the docs, DataFrame.append does not work in place the way list.append does. It returns a new DataFrame instead, so assigning that returned object back to your variable should be enough for what you need:

df = pd.DataFrame()
for next_df in list_of_dfs:
    df = df.append(next_df)

You may want to use the keyword argument ignore_index=True in the append call so that the indices become continuous, instead of starting from 0 for each appended DataFrame (assuming that the index of the DataFrames in the list all start from 0).
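
To make the index behaviour concrete, here is a small sketch of what `ignore_index=True` changes. It uses `pd.concat` rather than `DataFrame.append`, since `append` was deprecated in pandas 1.4 and removed in pandas 2.0; the frames `first` and `second` and the result names are invented for illustration.

```python
import pandas as pd

# Two hypothetical frames whose indices both start from 0.
first = pd.DataFrame({"gdp": [1, 2]})
second = pd.DataFrame({"gdp": [3, 4]})

# Without ignore_index, each frame keeps its own row labels.
stacked = pd.concat([first, second])
print(list(stacked.index))     # [0, 1, 0, 1]

# With ignore_index=True, the rows are renumbered continuously.
renumbered = pd.concat([first, second], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```

Duplicate labels like those in `stacked` are legal in pandas, but label-based lookups such as `stacked.loc[0]` then return more than one row, which is usually why a continuous index is preferred.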

Upvotes: 2
