Reputation: 4197
I have written code to append several dummy DataFrames into one. After appending, the expected "DataFrame.shape" is (9, 3), but my code produces an unexpected result of (6, 3). How can I fix the error in my code?
import pandas as pd
a = [[1,2,4],[1,3,4],[2,3,4]]
b = [[1,1,1],[1,6,4],[2,9,4]]
c = [[1,3,4],[1,1,4],[2,0,4]]
d = [[1,1,4],[1,3,4],[2,0,4]]
df1 = pd.DataFrame(a,columns=["a","b","c"])
df2 = pd.DataFrame(b,columns=["a","b","c"])
df3 = pd.DataFrame(c,columns=["a","b","c"])
for df in (df1, df2, df3):
    df = df.append(df, ignore_index=True)
print df
I don't want to use "pd.concat" because then I would have to hold all the DataFrames in memory, and my real data set contains hundreds of very large DataFrames. I just want code that opens one CSV file at a time inside a loop and updates the final DataFrame as the loop progresses.
thanks
Upvotes: 2
Views: 5306
Reputation: 394061
Firstly, use concat to concatenate a bunch of dfs; it's quicker:
In [308]:
df = pd.concat([df1,df2,df3], ignore_index=True)
df
Out[308]:
a b c
0 1 2 4
1 1 3 4
2 2 3 4
3 1 1 1
4 1 6 4
5 2 9 4
6 1 3 4
7 1 1 4
8 2 0 4
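Since the question mentions reading CSV files one at a time, here is a minimal sketch of feeding concat from a loop over CSV sources. The name csv_sources and the in-memory CSV text are hypothetical stand-ins for real file paths, used only to keep the example self-contained.

```python
import io
import pandas as pd

# Hypothetical stand-ins for CSV files on disk; in real use these would be file paths.
csv_sources = [
    "a,b,c\n1,2,4\n1,3,4\n2,3,4\n",
    "a,b,c\n1,1,1\n1,6,4\n2,9,4\n",
    "a,b,c\n1,3,4\n1,1,4\n2,0,4\n",
]

# The generator yields one DataFrame per source; pd.concat consumes it
# and builds the combined result in a single step.
frames = (pd.read_csv(io.StringIO(text)) for text in csv_sources)
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)
```

Note that this does not lower peak memory: concat still holds every frame before it builds the result. Appending inside a loop is not cheaper, though, because each append copies the whole accumulated frame again.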
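Secondly, you're reusing the name df as the loop variable, so each iteration overwrites your result with the current frame appended to itself, which is why you end up with (6, 3). If you did this, it would work: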
In [307]:
a = [[1,2,4],[1,3,4],[2,3,4]]
b = [[1,1,1],[1,6,4],[2,9,4]]
c = [[1,3,4],[1,1,4],[2,0,4]]
d = [[1,1,4],[1,3,4],[2,0,4]]
df1 = pd.DataFrame(a,columns=["a","b","c"])
df2 = pd.DataFrame(b,columns=["a","b","c"])
df3 = pd.DataFrame(c,columns=["a","b","c"])
df = pd.DataFrame()
for d in (df1, df2, df3):
    df = df.append(d, ignore_index=True)
df
Out[307]:
a b c
0 1 2 4
1 1 3 4
2 2 3 4
3 1 1 1
4 1 6 4
5 2 9 4
6 1 3 4
7 1 1 4
8 2 0 4
Here I changed the loop variable to d and declared an empty df outside the loop:
df = pd.DataFrame()
for d in (df1, df2, df3):
    df = df.append(d, ignore_index=True)
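One caveat for readers on newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the loop above only runs on older versions. A rough equivalent of the same accumulate-in-a-loop pattern, using pd.concat instead, might look like this (the names result and frame are illustrative, not from the original answer):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 4], [1, 3, 4], [2, 3, 4]], columns=["a", "b", "c"])
df2 = pd.DataFrame([[1, 1, 1], [1, 6, 4], [2, 9, 4]], columns=["a", "b", "c"])
df3 = pd.DataFrame([[1, 3, 4], [1, 1, 4], [2, 0, 4]], columns=["a", "b", "c"])

# Start with no result; the first frame becomes the accumulator as-is.
result = None
for frame in (df1, df2, df3):
    if result is None:
        result = frame
    else:
        # Use a distinct accumulator name so the loop variable never overwrites it.
        result = pd.concat([result, frame], ignore_index=True)

print(result.shape)
```

As with append, each pass copies the accumulated rows, so for many frames a single concat over the whole collection is likely to be faster.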
Upvotes: 1