Nafisa Anjum Samia
Nafisa Anjum Samia

Reputation: 113

Facing issues while creating a dataframe from a list of string

I have a list with a following format:

lst = ['"column1","column2","column3","column4","column5","column6","column7"',
 '"A",2022/03/11 00:03:08,"55","01","Bob, Pit","Bob",""',
 '"B",2021/04/11 09:13:06,"","","Niel","Arm","02"']

I want to convert it to a dataframe like the following: enter image description here

I tried a lot of ways, but failed to create it. Sometimes I am getting preceding zero error for the "01", or sometimes I got problem with "," in one column.

Here is my simple code:

dt = pd.DataFrame(lst, columns=["data"])

dt = dt["data"].str.split(',', expand=True)
new_header = dt.iloc[0] #grab the first row for the header
dt = dt[1:] #take the data less the header row
dt.columns = new_header #set the header row as the df header
dt

enter image description here

But there are two problems, I dont want to add any double quotes in the final dataframe or in the csv file, insted of "A", I only want A, and another problem is I want Bob, Pit in one column.

I dont know if there is any easier way to do that, please let me know.

Thanks in advance.

Upvotes: 0

Views: 51

Answers (1)

mozway
mozway

Reputation: 260300

I would use io.StringIO and the DataFrame constructor:

from io import StringIO

lst = ['"column1","column2","column3","column4","column5","column6","column7"',
       '"A",2022/03/11 00:03:08,"55","01","Bob, Pit","Bob",""',
       '"B",2021/04/11 09:13:06,"","","Niel","Arm","02"']

df = pd.read_csv(StringIO('\n'.join(lst)), sep=',')

Output:

  column1              column2  column3  column4   column5 column6  column7
0       A  2022/03/11 00:03:08     55.0      1.0  Bob, Pit     Bob      NaN
1       B  2021/04/11 09:13:06      NaN      NaN      Niel     Arm      2.0

Upvotes: 2

Related Questions