wkwkwk
wkwkwk

Reputation: 320

Create dataframe from a string Python

How do I create a dataframe from a string that look like this (part of the string)

,file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,

Trying to make a dataframe that look like this

enter image description here

Upvotes: 1

Views: 550

Answers (3)

bb1
bb1

Reputation: 7863

In the code below s is the string:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(s)).dropna(axis=1)
df.rename(columns={df.columns[0]: ""}, inplace=True)

By the way, if the string comes from a csv file then it is simpler to read the file directly using pd.read_csv.

Edit: This code will create a multiindex of columns:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(s), header = None).dropna(how="all", axis=1).T
df[0] = df.loc[1, 0]
df = df.set_index([0, 1]).T

Upvotes: 2

Joe Ferndz
Joe Ferndz

Reputation: 8508

Looks like you want a multi-level dataframe from the string. Here's how I would do it.

  1. Step 1: Split the string by '\r\n'. Then for each value, split by ','
  2. Step 2: The above step will create a list of list. Element #0 has 4 items and element #1 has 2 items. The rest have 3 items each and is the actual data
  3. Step 3: Convert the data into a dictionary from element #3 onwards. Use values in element #2 as keys for the dictionary (namely x data and y data). To ensure you have key:[list of values], use the dict.setdefault(key,[]).append(value). This will ensure the data is created as a `key:[list of values]' dictionary.
  4. Step 4: Create a normal dataframe using the dictionary as all the values are stored as key and values in the dictionary.
  5. Step 5: Now that you have the dictionary, you want to create the MultiIndex. Convert the column to MultiIndex.

Putting all this together, the code is:

import pandas as pd

text = ',file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,'

line_text = [txt.split(',') for txt in text.split('\r\n')]

dct = {}

for x,y,z in line_text[2:]:
    dct.setdefault(line_text[1][0], []).append(x)
    dct.setdefault(line_text[1][1], []).append(y)

df = pd.DataFrame(dct)

df.columns = pd.MultiIndex.from_tuples([(line_text[0][i],line_text[1][i]) for i in [0,1]])

print (df)

Output of this will be:

              file_05
   x data      y data
0  -970.0   -34.12164
1  -959.0   -32.37526
2  -949.0  -30.360199
3  -938.0   -28.74816
4  -929.0   -27.53912
5  -920.0   -25.92707
6  -911.0   -24.31503
7  -900.0   -23.64334
8  -891.0   -22.29997

Upvotes: 1

William Mou
William Mou

Reputation: 311

  1. You should convert your raw data to a table with python.
  2. Save to csv file by import csv package with python.


from pandas import DataFrame

# s is raw datas
s = ",file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,"

# convert raw data to a table
table = [i.split(',') for i in s.split("\r\n")]
table = [i[:2] for i in table]

# table is like
"""
[['', 'file_05'],
 ['x data', 'y data'],
 ['-970.0', '-34.12164'],
 ['-959.0', '-32.37526'],
 ['-949.0', '-30.360199'],
  ...
 ['-891.0', '-22.29997']]
 """

# save to output.csv file 
import csv
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(table)
    
# Save to DataFrame df
from pandas import DataFrame
df = DataFrame (table[2:],columns=table[1][:2])
print(df)

Upvotes: 0

Related Questions