Sheldon
Sheldon

Reputation: 604

Convert csv file with python

I'm new into python, does somebody have an idea what would be a good approach? I could just script it, but it's probably faster to use a package.

I have this .csv file (gigabytes large):

name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10 (should ignore duplicates, or non complete (A,B,C) entries
A,   4, 12 (should be sorted by time, this entry should be at the end, after time==11)
B,   5, 12
C,   6, 12
B,   7, 11 (order of A,B,C might be different)
C,   8, 11
A,   9, 11

convert it to a new .csv file containing:

time,   A,  B,  C
10, 1,  2,  3
11, 9,  7,  8
12, 4,  5,  6

Upvotes: 2

Views: 88

Answers (2)

Divyang Vashi
Divyang Vashi

Reputation: 143

Since I can't comment I would like to add to @jezrael answer that you would also want to drop incomplete or NaN values. By using df.dropna

import numpy as np
import pandas as pd
A = 'a'
B = 'b'
C = 'c'
df = pd.DataFrame([[A,   1, 10],
                [B,   2, 10],
                [C,   3, 10],
                [C,   3, 10],
                [A,   4, 12],
                [B,   5, 12],
                [C,   6, 12],
                [B,   7, 11],
                [C,   8, 11],
                [A,   9, 11],
                [np.nan, 10, 0]], columns = ["name","value", "time"])
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
df = df.pivot('time','name','value')
print(df)

Upvotes: 2

jezrael
jezrael

Reputation: 862511

I think need drop_duplicates with pivot:

df = df.drop_duplicates().pivot('time','name','value')
print (df)
name  A  B  C
time         
10    1  2  3
11    9  7  8
12    4  5  6

Upvotes: 6

Related Questions