Mark K
Mark K

Reputation: 9348

To remove spaces from df.to_markdown()

By reading text files into data frame, I want to add a new column based on the values in the existing columns, i.e. adding the numeric values from 'Stock' and 'Delivery'.

The problem is, the original data (from data supplier), was generated from "df.to_markdowns()".

Seems I can't remove the white spaces.

ds = pd.read_csv("C:\\TEMP\\ff.txt", sep="|", header = 0, skipinitialspace=True)
ds.columns = ds.columns.str.strip()

df['new'] = ds['Stock'] + ds['Delivery']

print (df)

enter image description here

What would be the way to handle such case?

By the way, this simulates the text file creation from "df.to_markdown()"

import pandas as pd

data = {'Price': [59,98,79],
'Stock': [53,60,60],
'Delivery': [11,7,6]}
df = pd.DataFrame(data)

with open("C:\\TEMP\\ff.txt", 'a') as outfile:
    outfile.write(df.to_markdown() + "\n")

outfile.close

Upvotes: 0

Views: 692

Answers (2)

Kurt Kline
Kurt Kline

Reputation: 2069

This should do what you need.

ds = pd.read_csv(
    "C:\\TEMP\\ff.txt",
    sep="|",
    skiprows=[1],
    skipinitialspace=True
)
ds.columns = ds.columns.str.strip()
ds = ds.loc[:, ["Price", "Stock", "Delivery"]]
ds['new'] = ds['Stock'] + ds['Delivery']

print(ds)

output

   Price  Stock  Delivery  new
0     59     53        11   64
1     98     60         7   67
2     79     60         6   66

skiprows=[1] skips the row at index 1, which is the row with the --------:

With this row removed from the dataframe, pandas automatically interprets the Price, Stock, and Delivery columns as integers, which allows the statement ds['new'] = ds['Stock'] + ds['Delivery'] to work as expected.

Upvotes: 1

Lukas S
Lukas S

Reputation: 3593

This works on the example you have provided:

pd.read_csv("~/Downloads/ff.txt", sep=r"\s*\|\s*", engine="python", skiprows=[1])[["Price", "Stock", "Delivery"]]

If you want something else I suggest you provide an example for it.

Upvotes: 1

Related Questions