Reputation: 9348
By reading text files into data frame, I want to add a new column based on the values in the existing columns, i.e. adding the numeric values from 'Stock' and 'Delivery'.
The problem is, the original data (from data supplier), was generated from "df.to_markdowns()".
Seems I can't remove the white spaces.
ds = pd.read_csv("C:\\TEMP\\ff.txt", sep="|", header = 0, skipinitialspace=True)
ds.columns = ds.columns.str.strip()
df['new'] = ds['Stock'] + ds['Delivery']
print (df)
What would be the way to handle such case?
By the way, this simulates the text file creation from "df.to_markdown()"
import pandas as pd
data = {'Price': [59,98,79],
'Stock': [53,60,60],
'Delivery': [11,7,6]}
df = pd.DataFrame(data)
with open("C:\\TEMP\\ff.txt", 'a') as outfile:
outfile.write(df.to_markdown() + "\n")
outfile.close
Upvotes: 0
Views: 692
Reputation: 2069
This should do what you need.
ds = pd.read_csv(
"C:\\TEMP\\ff.txt",
sep="|",
skiprows=[1],
skipinitialspace=True
)
ds.columns = ds.columns.str.strip()
ds = ds.loc[:, ["Price", "Stock", "Delivery"]]
ds['new'] = ds['Stock'] + ds['Delivery']
print(ds)
output
Price Stock Delivery new
0 59 53 11 64
1 98 60 7 67
2 79 60 6 66
skiprows=[1]
skips the row at index 1, which is the row with the --------:
With this row removed from the dataframe, pandas automatically interprets the Price
, Stock
, and Delivery
columns as integers, which allows the statement ds['new'] = ds['Stock'] + ds['Delivery']
to work as expected.
Upvotes: 1
Reputation: 3593
This works on the example you have provided:
pd.read_csv("~/Downloads/ff.txt", sep=r"\s*\|\s*", engine="python", skiprows=[1])[["Price", "Stock", "Delivery"]]
If you want something else I suggest you provide an example for it.
Upvotes: 1