Reputation: 2365
I have data frame series which looks like this:
a b r
1 43 630 587
d b c
1 34 30 87
I want to create a new dataframe which looks like:
a b r d c
43 630 587 0 0
0 30 0 34 87
I have used the code:
appended_data= pd.concat(appended_data, axis=0)
where the list appended_data contains the individual dataframe series as elements.
Earlier when I used it with other dataset it didnt throw any error but with the new dataset its showing ValueError: Plan shapes are not aligned
Note: Earlier dataset also had duplicate columns and it worked fine then and I also updated pandas.These were the solutions I found online.
full code:
dir_list = [benign_freq_dir,malign_freq_dir]
appended_data = []
for l in dir_list:
for root, dirs, files in os.walk(l):
#print(root)
for name in files:
file = open(root+"/"+name,'r')
print(name)
print("\n")
df = pd.read_csv(file,header=None,error_bad_lines=False) #In windows and python3 always pass file object not the path directly in pd.read_csv
#print(df)
df = df.rename(columns={0: 'col'})
#print(df)
df = pd.DataFrame(df.col.str.split(' ',1).tolist(), columns = ['col1','col2']).T.reset_index(drop=True)
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
print(df)
appended_data.append(df)
if l==benign_freq_dir:
df['class']=0
else:
df['class']=1
#for l in appended_data:
# print(l)
# print(type(l))
appended_data= pd.concat(appended_data, axis=0,sort=False)
edit:
output for:
for dfx in appended_data:
print(dfx.head(2).to_dict())
Upvotes: 0
Views: 11753
Reputation: 6179
You will need an outer join for that.
import pandas as pd
df1 = pd.DataFrame({
'a': [43],
'b': [630],
'r': [587]
})
df2 = pd.DataFrame({
'd': [34],
'b': [30],
'c': [87]
})
df3 = df1.merge(df2, how='outer').fillna(0)
print(df3)
Yields what you need.
a b r d c
0 43.0 630 587.0 0.0 0.0
1 0.0 30 0.0 34.0 87.0
Docs on pd.merge
Docs on outer join
EDIT:
OP, pd.concat
should work as expected and Anton has proven that.
Since pd.merge
was my answer, I have to stick with that.
Some pseudocode if you want to merge a list of dataframes.
def merge(lst, df=None):
if df is None:
df = lst.pop()
to_be_merged = lst.pop()
merged = df.merge(to_be_merged, how='outer')
if lst:
return merge(lst, merged)
return merged.fillna(0)
df = merge(list_of_dfs)
That way you will know instantly which df is at fault because clearly there's a problem with your data. Catch the exception and use .describe()
and .info()
to debug this issue.
Upvotes: 3
Reputation: 18906
You can use pd.concat
. You should however pass both dataframes.
pd.concat([df1,df2], axis=0, sort=False).fillna(0) #.astype(int) for ints
# a b r d c
#0 43.0 630 587.0 0.0 0.0
#0 0.0 30 0.0 34.0 87.0
Sample data from Tom Wojcik.
Upvotes: 2