Reputation: 113
I have a list with a following format:
lst = ['"column1","column2","column3","column4","column5","column6","column7"',
'"A",2022/03/11 00:03:08,"55","01","Bob, Pit","Bob",""',
'"B",2021/04/11 09:13:06,"","","Niel","Arm","02"']
I want to convert it to a dataframe like the following:
I tried a lot of ways, but failed to create it. Sometimes I am getting preceding zero error for the "01", or sometimes I got problem with "," in one column.
Here is my simple code:
dt = pd.DataFrame(lst, columns=["data"])
dt = dt["data"].str.split(',', expand=True)
new_header = dt.iloc[0] #grab the first row for the header
dt = dt[1:] #take the data less the header row
dt.columns = new_header #set the header row as the df header
dt
But there are two problems, I dont want to add any double quotes in the final dataframe or in the csv file, insted of "A", I only want A, and another problem is I want Bob, Pit in one column.
I dont know if there is any easier way to do that, please let me know.
Thanks in advance.
Upvotes: 0
Views: 51
Reputation: 260300
I would use io.StringIO
and the DataFrame
constructor:
from io import StringIO
lst = ['"column1","column2","column3","column4","column5","column6","column7"',
'"A",2022/03/11 00:03:08,"55","01","Bob, Pit","Bob",""',
'"B",2021/04/11 09:13:06,"","","Niel","Arm","02"']
df = pd.read_csv(StringIO('\n'.join(lst)), sep=',')
Output:
column1 column2 column3 column4 column5 column6 column7
0 A 2022/03/11 00:03:08 55.0 1.0 Bob, Pit Bob NaN
1 B 2021/04/11 09:13:06 NaN NaN Niel Arm 2.0
Upvotes: 2