Reputation: 191

reading a csv file into pandas with one column is like a list, create new rows

i have a csv file in the following form.

id  results_numbers results                                                  creation_time
9680    2           [(9394, u'lesbyfaye'), (999, u'Kayts & Koilsby')]        11/10/14 0:23
9690    3           [(5968, u'Jacobsonl'), (47, u'SarHix'), (8825, u'joy')]  12/10/14 0:10

and i would like to read this into pandas and covert to the following:

id     results_numbers  new_id name              creation_time
9680    2               9394   lesbyfaye         11/10/14 0:23
9680    3                999   Kayts & Koilsby   11/10/14 0:23
9690    3               5968   Jacobsonl         12/10/14 0:10
9690    3                 47   SarHix            12/10/14 0:10
9690    3               8825   joy               12/10/14 0:10

Upvotes: 1

Answers (2)

MaxU - stand with Ukraine

Reputation: 210832

you can try do it also without looping:

Original DF:

In [184]: df
Out[184]:
   creation_time    id                                         results  \
0  11/10/14 0:23  9680     [(9394, lesbyfaye), (999, Kayts & Koilsby)]
1  12/10/14 0:10  9690  [(5968, Jacobsonl), (47, SarHix), (8825, joy)]

   results_number
0               2
1               3

Solution:

In [189]: tmp = (pd.DataFrame.from_dict(df.results.to_dict(), orient='index')
   .....:          .stack()
   .....:          .reset_index(level=1, drop=True)
   .....:       )

In [190]: idx = tmp.index

In [191]: new = (pd.DataFrame(tmp.tolist(), columns=['new_id','name'], index=idx)
   .....:          .join(df.drop(['results'], axis=1))
   .....:       )

Result:

In [192]: new
Out[192]:
   new_id             name  creation_time    id  results_number
0    9394        lesbyfaye  11/10/14 0:23  9680               2
0     999  Kayts & Koilsby  11/10/14 0:23  9680               2
1    5968        Jacobsonl  12/10/14 0:10  9690               3
1      47           SarHix  12/10/14 0:10  9690               3
1    8825              joy  12/10/14 0:10  9690               3

Upvotes: 0

Alexander

Reputation: 109528

assuming you can read the dataframe:

df = pd.DataFrame({'id': [9680, 9690], 'results_number': [2, 3], 'results': [[(9394, u'lesbyfaye'), (999, u'Kayts & Koilsby')], [(5968, u'Jacobsonl'), (47, u'SarHix'), (8825, u'joy')]], 'creation_time': ["11/10/14 0:23", "12/10/14 0:10"]})

>>>> pd.DataFrame([[row.id, row.results_number, tup[0], tup[1], row.creation_time] 
                   for _, row in df.iterrows() 
                   for tup in row.results], 
                  columns=['id', 'results_numbers', 'new_id', 'name', 'creation_time'])

     id  results_numbers  new_id             name  creation_time
0  9680                2    9394        lesbyfaye  11/10/14 0:23
1  9680                2     999  Kayts & Koilsby  11/10/14 0:23
2  9690                3    5968        Jacobsonl  12/10/14 0:10
3  9690                3      47           SarHix  12/10/14 0:10
4  9690                3    8825              joy  12/10/14 0:10

EDIT

If you have malformed data, try this:

good_data = []
bad_data = []
for _, row in df.iterrows():
    for n, tup in enumerate(row.results):
        if len(tup) == 2:
            good_data.append([row.id, row.results_number, tup[0], tup[1], row.creation_time])
        else:
            bad_data.append(n, tup)

Upvotes: 1

reading a csv file into pandas with one column is like a list, create new rows

Answers (2)

Related Questions