Sara_Iva

Reputation: 19

Convert an array of variable length strings to dataframe

I have an array in which each row holds a variable number of strings, and each string carries its own column label (for example 'Column1: A'). The set of columns differs from row to row, and the total number of columns may be large. How do I convert this format to a DataFrame? For example:

Input:

import numpy as np

# dtype=object is required because the rows have different lengths
rows = np.array([['Hour: 1', 'Name: EVENT_1', 'Column1: A'],
      ['Hour: 2', 'Name: EVENT_2', 'Column1: B','Column2: BCX'],
      ['Hour: 2', 'Name: EVENT_2', 'Column1: C','Column3: BCY','Column4: BCE'],
      ['Hour: 4', 'Name: EVENT_4','Column1: D',  'Column3: BCZ'],
      ['Hour: 5','Name: EVENT_5','Column2: BC', 'Column99: BCR' ,'Column100: BCA']
     ], dtype=object)

Expected output:

Hour |  Name     |Column1|Column2|Column3|Column4...Column99|Column100
1    |  EVENT_1  |  AA   | BCX   |       |       ...        | 
2    |  EVENT_2  |  BQ   |       |       | BCE    ...       | 
3    |  EVENT_3  |  CW   |       | BCY   |       ...        | 
4    |  EVENT_4  |  DF   |       | BCZ   |       ...        |   
5    |  EVENT_5  |       | BC    |       |       ...    BCR |   BCA

Upvotes: 1

Views: 130

Answers (1)

Ernesto

Reputation: 609

Hello, and welcome to Stack Overflow.

As @jirassimok mentioned, you need to iterate over the rows and build a dictionary for each one. Here is a piece of code that could help:

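import pandas as pd

# List to collect one dictionary per row
new_array = []
# For every row
for r in rows:
    # Dictionary mapping column name -> value for this row
    new_row = {}
    # For every "column: value" string in the row
    for c_v in r:
        # Split only on the first ': ' so values containing ': ' stay intact
        column, value = c_v.split(': ', 1)
        new_row[column] = value
    new_array.append(new_row)
# Columns missing from a row are filled with NaN
df = pd.DataFrame(new_array)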

If you prefer something more compact, you could collapse the above into a single line using a dictionary comprehension nested inside a list comprehension:

pd.DataFrame([{x.split(': ')[0]:x.split(': ')[1] for x in r} for r in rows]) 
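
For reference, here is a minimal end-to-end sketch using the sample data from the question. It assumes the rows are kept as plain Python lists rather than a NumPy array, and the name `records` is only illustrative. It also converts `Hour` to integers and replaces the missing values with empty strings, which may or may not be what you want:

```python
import pandas as pd

# Sample rows from the question, kept as plain Python lists (an assumption)
rows = [
    ['Hour: 1', 'Name: EVENT_1', 'Column1: A'],
    ['Hour: 2', 'Name: EVENT_2', 'Column1: B', 'Column2: BCX'],
    ['Hour: 2', 'Name: EVENT_2', 'Column1: C', 'Column3: BCY', 'Column4: BCE'],
    ['Hour: 4', 'Name: EVENT_4', 'Column1: D', 'Column3: BCZ'],
    ['Hour: 5', 'Name: EVENT_5', 'Column2: BC', 'Column99: BCR', 'Column100: BCA'],
]

# One dictionary per row; split only on the first ': ' in each string
records = [dict(item.split(': ', 1) for item in row) for row in rows]

# pandas takes the union of all keys as columns and fills the gaps with NaN
df = pd.DataFrame(records)

# Optional clean-up: numeric hours and blanks instead of NaN
df['Hour'] = df['Hour'].astype(int)
df = df.fillna('')

print(df)
```

Building the list of dictionaries first and calling `pd.DataFrame` once tends to be simpler than growing a DataFrame row by row, since pandas works out the full set of columns in a single pass.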

Upvotes: 1
