Reputation: 19
I have an array in which each row is a variable-length list of strings, and each string carries its own column label. The set of columns varies from row to row and can be large. How do I convert this format to a DataFrame? For example:
Input:
# dtype=object is needed because the rows have different lengths
np.array([['Hour: 1', 'Name: EVENT_1', 'Column1: A'],
          ['Hour: 2', 'Name: EVENT_2', 'Column1: B', 'Column2: BCX'],
          ['Hour: 2', 'Name: EVENT_2', 'Column1: C', 'Column3: BCY', 'Column4: BCE'],
          ['Hour: 4', 'Name: EVENT_4', 'Column1: D', 'Column3: BCZ'],
          ['Hour: 5', 'Name: EVENT_5', 'Column2: BC', 'Column99: BCR', 'Column100: BCA']
         ], dtype=object)
expected output:
Hour | Name |Column1|Column2|Column3|Column4...Column99|Column100
1 | EVENT_1 | AA | BCX | | ... |
2 | EVENT_2 | BQ | | | BCE ... |
3 | EVENT_3 | CW | | BCY | ... |
4 | EVENT_4 | DF | | BCZ | ... |
5 | EVENT_5 | | BC | | ... BCR | BCA
Upvotes: 1
Views: 130
Reputation: 609
Hello and welcome to Stack Overflow.
As @jirassimok mentioned, you need to iterate over the rows and build a dictionary for each one. Here is a piece of code that could help:
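import pandas as pd

# rows is the array (or list of lists) from the question
# List to collect one dictionary per row
new_array = []
for r in rows:
    # Dictionary mapping column label to value for this row
    new_row = {}
    for c_v in r:
        # Split each "column: value" string once into label and value
        column, value = c_v.split(': ', 1)
        new_row[column] = value
    new_array.append(new_row)
# Columns missing from a row are filled with NaN
pd.DataFrame(new_array)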
If you really want something more compact, you could collapse the loops into a single line using a dictionary comprehension inside a list comprehension:
pd.DataFrame([{x.split(': ')[0]:x.split(': ')[1] for x in r} for r in rows])
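For completeness, here is a self-contained sketch of the same idea run against the sample data from the question. The data is kept as a plain list of lists rather than a NumPy array, the helper name rows_to_frame is only illustrative, and the fillna('') step is an assumption about how the empty cells should be displayed.

```python
import pandas as pd

# Sample data copied from the question, kept as a plain list of lists
rows = [
    ['Hour: 1', 'Name: EVENT_1', 'Column1: A'],
    ['Hour: 2', 'Name: EVENT_2', 'Column1: B', 'Column2: BCX'],
    ['Hour: 2', 'Name: EVENT_2', 'Column1: C', 'Column3: BCY', 'Column4: BCE'],
    ['Hour: 4', 'Name: EVENT_4', 'Column1: D', 'Column3: BCZ'],
    ['Hour: 5', 'Name: EVENT_5', 'Column2: BC', 'Column99: BCR', 'Column100: BCA'],
]


def rows_to_frame(rows):
    """Illustrative helper: turn rows of 'label: value' strings into a DataFrame."""
    # Each row becomes a dict; split only on the first ': ' so that
    # values containing ': ' themselves are kept whole
    records = [dict(item.split(': ', 1) for item in row) for row in rows]
    # pandas aligns the dicts by key and fills missing columns with NaN
    return pd.DataFrame(records)


df = rows_to_frame(rows)

# Assumption: blank cells are preferred over NaN for display
filled = df.fillna('')
print(filled)
```

All values come out as strings, so a column such as Hour would still need an explicit conversion (for example with astype(int)) if numeric behaviour is wanted.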
Upvotes: 1