Reputation: 1075
I have two data frames:
import pandas as pd
import numpy as np
sgRNA = pd.Series(["ABL1_sgABL1_130854834","ABL1_sgABL1_130862824","ABL1_sgABL1_130872883","ABL1_sgABL1_130884018"])
sequence = pd.Series(["CTTAGGCTATAATCACAATG","GGTTCATCATCATTCAACGG","TCAGTGATGATATAGAACGG","TTGCTCCCTCGAAAAGAGCG"])
df1=pd.DataFrame(sgRNA,columns=["sgRNA"])
df1["sequence"]=sequence
df2=pd.DataFrame(columns=["column"],
index=np.arange(len(df1) * 2))
I want to add values from both columns from df1 to df2 every other row, like this:
ABL1_sgABL1_130854834
CTTAGGCTATAATCACAATG
ABL1_sgABL1_130862824
GGTTCATCATCATTCAACGG
ABL1_sgABL1_130872883
TCAGTGATGATATAGAACGG
ABL1_sgABL1_130884018
TTGCTCCCTCGAAAAGAGCG
To do this for df1["sgRNA"], I used this code:
df2.iloc[0::2, :]=df1["sgRNA"]
But I get this error:
ValueError: could not broadcast input array from shape (4,) into shape (4,1)
What am I doing wrong?
Upvotes: 2
Views: 304
Reputation: 1888
Besides Andrej Kesely's superior solution, to answer the question of what went wrong in the code: it's a minor dimension mismatch. df1["sgRNA"] is a Series (one-dimensional), while df2.iloc[0::2, :] is a DataFrame (two-dimensional), which is why the error reports the shapes (4,) and (4,1).
The fix is to make the df2 side one-dimensional as well by selecting its one and only column by position, instead of slicing across all columns (even though there is just one):
df2.iloc[0::2, 0] = df1["sgRNA"]
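Extending that fix to both columns, here is a minimal sketch on a shortened two-row version of the question's data. The `.to_numpy()` calls are my own addition, not part of the original answer: they hand pandas a plain array so the assignment is purely positional and does not depend on how a given pandas version treats the Series index.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's data (two rows instead of four).
df1 = pd.DataFrame({
    "sgRNA": ["ABL1_sgABL1_130854834", "ABL1_sgABL1_130862824"],
    "sequence": ["CTTAGGCTATAATCACAATG", "GGTTCATCATCATTCAACGG"],
})
df2 = pd.DataFrame(columns=["column"], index=np.arange(len(df1) * 2))

# Select column 0 as a scalar position so the target is one-dimensional.
# .to_numpy() is an assumption added here to keep the assignment positional.
df2.iloc[0::2, 0] = df1["sgRNA"].to_numpy()      # even rows: 0, 2, ...
df2.iloc[1::2, 0] = df1["sequence"].to_numpy()   # odd rows: 1, 3, ...

interleaved = df2["column"].tolist()
print(df2)
```

Each sgRNA name should end up directly above its sequence, in the same order as in df1.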
Upvotes: 2
Reputation: 195573
I think you're looking for DataFrame.stack():
df2["column"] = df1.stack().reset_index(drop=True)
print(df2)
Prints:
                  column
0  ABL1_sgABL1_130854834
1   CTTAGGCTATAATCACAATG
2  ABL1_sgABL1_130862824
3   GGTTCATCATCATTCAACGG
4  ABL1_sgABL1_130872883
5   TCAGTGATGATATAGAACGG
6  ABL1_sgABL1_130884018
7   TTGCTCCCTCGAAAAGAGCG
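To see why stack() produces the interleaved order, the sketch below looks at the intermediate result on a shortened two-row version of the data. The comparison with `to_numpy().ravel()` is my own illustration rather than part of the answer; it assumes the frame has no missing values.

```python
import pandas as pd

# Shortened copy of the question's data (two rows instead of four).
df1 = pd.DataFrame({
    "sgRNA": ["ABL1_sgABL1_130854834", "ABL1_sgABL1_130862824"],
    "sequence": ["CTTAGGCTATAATCACAATG", "GGTTCATCATCATTCAACGG"],
})

# stack() moves the column labels into an inner index level, so the values
# come out row by row: (0, sgRNA), (0, sequence), (1, sgRNA), (1, sequence).
stacked = df1.stack()
stacked_index = list(stacked.index)

# Dropping the (row, column) MultiIndex leaves a plain 0..n-1 index,
# which lines up with df2's index when assigned as a column.
flat = stacked.reset_index(drop=True)

# For data without missing values, this is the same order as flattening
# the underlying 2-D array row by row.
row_major = df1.to_numpy().ravel().tolist()
print(flat)
```

Because the flattened Series has the index 0 to 2 * len(df1) - 1, assigning it to df2["column"] aligns row for row with the df2 built in the question.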
Upvotes: 4