Reputation: 320
How do I create a dataframe from a string that look like this (part of the string)
,file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,
Trying to make a dataframe that look like this
Upvotes: 1
Views: 550
Reputation: 7863
In the code below s
is the string:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(s)).dropna(axis=1)
df.rename(columns={df.columns[0]: ""}, inplace=True)
By the way, if the string comes from a csv file then it is simpler to read the file directly using pd.read_csv
.
Edit: This code will create a multiindex of columns:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(s), header = None).dropna(how="all", axis=1).T
df[0] = df.loc[1, 0]
df = df.set_index([0, 1]).T
Upvotes: 2
Reputation: 8508
Looks like you want a multi-level dataframe from the string. Here's how I would do it.
'\r\n'
. Then for each value, split by
','
x data
and y data
). To ensure you have key:[list of values], use the
dict.setdefault(key,[]).append(value)
. This will ensure the data
is created as a `key:[list of values]' dictionary.Putting all this together, the code is:
import pandas as pd
text = ',file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,'
line_text = [txt.split(',') for txt in text.split('\r\n')]
dct = {}
for x,y,z in line_text[2:]:
dct.setdefault(line_text[1][0], []).append(x)
dct.setdefault(line_text[1][1], []).append(y)
df = pd.DataFrame(dct)
df.columns = pd.MultiIndex.from_tuples([(line_text[0][i],line_text[1][i]) for i in [0,1]])
print (df)
Output of this will be:
file_05
x data y data
0 -970.0 -34.12164
1 -959.0 -32.37526
2 -949.0 -30.360199
3 -938.0 -28.74816
4 -929.0 -27.53912
5 -920.0 -25.92707
6 -911.0 -24.31503
7 -900.0 -23.64334
8 -891.0 -22.29997
Upvotes: 1
Reputation: 311
from pandas import DataFrame
# s is raw datas
s = ",file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,"
# convert raw data to a table
table = [i.split(',') for i in s.split("\r\n")]
table = [i[:2] for i in table]
# table is like
"""
[['', 'file_05'],
['x data', 'y data'],
['-970.0', '-34.12164'],
['-959.0', '-32.37526'],
['-949.0', '-30.360199'],
...
['-891.0', '-22.29997']]
"""
# save to output.csv file
import csv
with open('output.csv', 'w', newline='') as csvfile:
writer = csv.writer(csvfile)
writer.writerows(table)
# Save to DataFrame df
from pandas import DataFrame
df = DataFrame (table[2:],columns=table[1][:2])
print(df)
Upvotes: 0