Ignoring unnecessary pipes while reading pipe-delimited data using pandas

Question

So I have a .txt file as below:

IndentNo|Date|Short_Desc|PurchaseGroup|IndentType
100223|23.09.2020|"6"3 bendend pipe dia 4.5 m"|GRP_1||
100223|14.05.2021|"IM_13#22D FEMALE PLUG|6#|GRP_2||

I am converting the above into a pandas dataframe. So I am using the following:

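import os
import pandas as pd

lines = []
# dir_path_input (input folder) and f (file name) are defined elsewhere
with open(os.path.join(dir_path_input, f), 'r', encoding='utf-8') as f_m:
    for l in f_m:
        l = l.replace('\n', '')   # drop the trailing newline
        f_r_s = l.split('|')      # split on every pipe, including stray ones
        lines.append(f_r_s)

df = pd.DataFrame(lines)
df.columns = df.iloc[0]           # first row holds the column names
df = df.drop(df.index[0])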

Using the above method, I am getting the dataframe as follows:

IndentNo  Date       Short_Desc                  PurchaseGroup  IndentType
100223  23.09.2020  "6"3 bendend pipe dia 4.5 m"   GRP_1
100223  14.05.2021  "IM_13#22D FEMALE PLUG           6#          GRP_2 ##<----Wrong entry.

As you can see, the last row is split incorrectly: the pipe inside Short_Desc is treated as a delimiter, so the remaining values shift one column to the right. The last row should look like this:

IndentNo  Date       Short_Desc                  PurchaseGroup  IndentType
100223  23.09.2020  "6"3 bendend pipe dia 4.5 m"   GRP_1
100223  14.05.2021  "IM_13#22D FEMALE PLUG|6#      GRP_2

Is there a better way to read pipe-delimited files? Or can the above code snippet be modified to produce the desired output?
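
One possible direction, sketched here under assumptions taken from the sample only: stray pipes occur only inside Short_Desc, and every data row ends with one extra trailing pipe after IndentType. With that layout, the first two fields can be split off from the left and the last three from the right, so whatever remains in the middle is the description. The function name parse_pipe_lines is hypothetical.

```python
import io


def parse_pipe_lines(handle):
    """Split each row from both ends so extra pipes stay inside Short_Desc.

    Assumes (from the sample data only): 2 clean fields on the left,
    and PurchaseGroup, IndentType plus one trailing empty field on the right.
    """
    header = next(handle).rstrip("\n").split("|")
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # First two fields from the left; everything else stays in `rest`.
        indent_no, date, rest = line.split("|", 2)
        # Last three pieces from the right; the final one is the empty
        # string produced by the trailing pipe and is discarded.
        short_desc, group, indent_type, _trailing = rest.rsplit("|", 3)
        rows.append([indent_no, date, short_desc, group, indent_type])
    return header, rows


# The sample rows from the question, used in place of the real file.
sample = io.StringIO(
    'IndentNo|Date|Short_Desc|PurchaseGroup|IndentType\n'
    '100223|23.09.2020|"6"3 bendend pipe dia 4.5 m"|GRP_1||\n'
    '100223|14.05.2021|"IM_13#22D FEMALE PLUG|6#|GRP_2||\n'
)
header, rows = parse_pipe_lines(sample)
```

The result can then be passed to pd.DataFrame(rows, columns=header). A row that does not match the assumed layout raises a ValueError during unpacking, which at least makes malformed lines visible instead of silently shifting columns.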
