Reputation: 49
I have limited my requirements to 5 columns and 3 rows to keep the explanation simple. My column header arrives as a single string, and each row also arrives as a string. I want all the rows added to a DataFrame. Here is what I have tried:
import pandas as pd
Column_Header = "Col1,Col2,Col3,Col4,Col5"  # We have up to 500 columns
df = pd.DataFrame(columns=Column_Header.split(","))
# We will get up to 100000 rows from a server response
Row1 = "Val11,Val12,Val13,Val14,Val15"
Row2 = "Val21,Val22,Val23,Val124,Val25"
Row3 = "Val31,Val32,Val33,Val34,Val35"
df_temp = pd.DataFrame(data = Row1.split(",") , columns = Column_Header.split(","))
pd.concat(df,df_temp)
print(pd)
Upvotes: 0
Views: 2340
Reputation: 402844
If this is a viable option, it would be simpler to leave all the data munging to pd.read_csv. Convert all your strings to a single multiline string, and pass it through a StringIO buffer to read_csv.
import io
data = '\n'.join([Column_Header, Row1, Row2, Row3])
df = pd.read_csv(io.StringIO(data))
df
    Col1   Col2   Col3    Col4   Col5
0  Val11  Val12  Val13   Val14  Val15
1  Val21  Val22  Val23  Val124  Val25
2  Val31  Val32  Val33   Val34  Val35
If you're on Python 2.x, io.StringIO only accepts unicode strings, so for plain str data use the cStringIO module instead. To keep the code above unchanged, import it as:
import cStringIO as io
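One thing to keep in mind with this approach is that read_csv infers column types, so numeric-looking values come back as numbers rather than strings. Here is a minimal sketch, assuming you want every value kept as text. The sample header and rows are made up for illustration and are not from the question.

```python
import io
import pandas as pd

# Hypothetical sample data with numeric-looking values
column_header = "Col1,Col2,Col3"
rows = ["1,2.5,abc", "4,5.5,def"]

data = "\n".join([column_header] + rows)

# Default behaviour: read_csv infers a type for each column
inferred = pd.read_csv(io.StringIO(data))

# dtype=str keeps every column as text, matching what str.split would give
as_text = pd.read_csv(io.StringIO(data), dtype=str)
```

Whether inference is welcome depends on what you do with the values afterwards. If the server may send mixed content in one column, reading everything as text and converting later is usually the safer choice.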
Upvotes: 1
Reputation: 863216
The best and fastest approach is to build a list of all the rows with a list comprehension and call the DataFrame constructor only once:
import pandas as pd
Column_Header = "Col1,Col2,Col3,Col4,Col5"
Row1 = "Val11,Val12,Val13,Val14,Val15"
Row2 = "Val21,Val22,Val23,Val124,Val25"
Row3 = "Val31,Val32,Val33,Val34,Val35"
rows = [Row1,Row2,Row3]
L = [x.split(',') for x in rows]
print (L)
[['Val11', 'Val12', 'Val13', 'Val14', 'Val15'],
['Val21', 'Val22', 'Val23', 'Val124', 'Val25'],
['Val31', 'Val32', 'Val33', 'Val34', 'Val35']]
df = pd.DataFrame(data = L , columns = Column_Header.split(","))
print (df)
    Col1   Col2   Col3    Col4   Col5
0  Val11  Val12  Val13   Val14  Val15
1  Val21  Val22  Val23  Val124  Val25
2  Val31  Val32  Val33   Val34  Val35
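Note that a plain split(',') would mis-split any value that itself contains a comma inside quotes. The question does not say whether that can happen, so this is only a sketch, assuming the server uses standard CSV quoting. The sample rows are hypothetical. It keeps the same single-constructor-call idea but parses with the standard csv module.

```python
import csv
import pandas as pd

# Hypothetical sample data; the first row has a quoted value containing a comma
column_header = "Col1,Col2,Col3"
rows = ['Val11,"Val12,extra",Val13', "Val21,Val22,Val23"]

# csv.reader accepts any iterable of strings and honours quoting
parsed = list(csv.reader(rows))
columns = next(csv.reader([column_header]))

# Still only one DataFrame constructor call for all rows
df = pd.DataFrame(parsed, columns=columns)
```

If the values never contain quoted commas, the list comprehension with split(',') above is simpler and does the same job.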
Upvotes: 2