Reputation: 469

Parsing specific columns of CSV in python

so I have this CSV and I would like to do the following:

Original data:

Parsed Data:

So, to put in words, if a column has commas then I want to create a new column with only one value and delete the column which has multiple values.

For example: N2 has I1, I3 and I4. Hence the new data gets 3 columns, containing one value only.

I want to make it dynamic in such a way that all the permutations are reflected. Like in the case of N3 that has 2 places and 2 items.

I am trying to use python's pandas to do this. Some help would be appreciated.

Upvotes: 1

Answers (4)

Serge Ballesta

Reputation: 148880

You can avoid the use of pandas. If you want to stick with the standard csv module, you simply have to split each field on comma (',') and then iterate on the splitted elements.

Code could be assuming the input delimiter is a semicolon (;) (I cannot know what it is except it cannot be a comma):

with open('input.csv', newline='') as fd, open('output.csv', 'w', newline='') as fdout:
    rd = csv.DictReader(fd, delimiter=';')
    wr = csv.writer(fdout)
    _ = wr.writerow(rd.fieldnames)
    for row in rd:
       for i in row['Item'].split(','):
           i = i.strip()
           if len(i) != 0:
               for p in row['Place'].split(','):
                   p = p.strip()
                   if len(p) != 0:
                       for n in row['Name'].split(','):
                           n = n.strip()
                           if len(n) != 0:
                               wr.writerow((n,p,i))

Output is:

Name,Place,Item
N1,P1,I1
N2,P2,I1
N2,P2,I3
N2,P2,I4
N3,P2,I2
N3,P3,I2
N3,P2,I5
N3,P3,I5

Upvotes: 0

Alec Thomas

Reputation: 186

Here is another option:

df['Place'] = df['Place'].str.split(',')
df['Item'] = df['Item'].str.split(',')

exploded = pd.DataFrame([
    a + [p, t] for *a, P, T in df.values
    for p in P for t in T
], columns=df.columns)

And the output:

  Name Place Item
0   N1    P1   I1
1   N2    P2   I1
2   N2    P2   I3
3   N2    P2   I4
4   N3    P2   I2
5   N3    P2   I5
6   N3    P3   I2
7   N3    P3   I5

Upvotes: 1

Mohit Musaddi

Reputation: 143

You can use iterrows() :

df = pd.DataFrame({'Name': ['N1', 'N2', 'N3'], 'Place':['P1', 'P2','P2,P3'], 'Item':['I1,', 'I1,I3,I4', 'I2,I5']})

result = pd.DataFrame()
new_result = pd.DataFrame()

df['Place'] = df['Place'].apply(lambda x: x.strip(','))
df['Item'] = df['Item'].apply(lambda x: x.strip(','))

for a,b  in df.iterrows():
    curr_row = df.iloc[a]
    temp  = ((curr_row['Place'].split(',')))
    for x in temp:
        curr_row['Place'] = x
        result = result.append(curr_row, ignore_index=True)

for a,b  in result.iterrows():
    curr_row = result.iloc[a]
    temp  = ((curr_row['Item'].split(',')))
    for x in temp:
        curr_row['Item'] = x
        new_result = new_result.append(curr_row, ignore_index=True)

Output:

  Name Place Item
0   N1    P1   I1
1   N2    P2   I1
2   N2    P2   I3
3   N2    P2   I4
4   N3    P2   I2
5   N3    P2   I5
6   N3    P3   I2
7   N3    P3   I5

This is the simplest way you can achieve your desired output.

Upvotes: 0

fuglede

Reputation: 18201

You are effectively attempting to take the Cartesian product of each row, then binding the result back into a DataFrame. As such, you could use itertools and do something like

from itertools import chain, product
df_lists = df.applymap(lambda s: s.split(','))
pd.DataFrame(chain.from_iterable(df_lists.apply(lambda row: product(*row), axis=1)), columns=df.columns)

With your example input:

In [334]: df
Out[334]:
  Name  Place      Item
0   N1     P1        I1
1   N2     P2  I1,I3,I4
2   N3  P2,P3     I2,I5

In [336]: df_lists = df.applymap(lambda s: s.split(','))

In [337]: pd.DataFrame(chain.from_iterable(df_lists.apply(lambda row: product(*row), axis=1)), columns=df.columns)
Out[337]:
  Name Place Item
0   N1    P1   I1
1   N2    P2   I1
2   N2    P2   I3
3   N2    P2   I4
4   N3    P2   I2
5   N3    P2   I5
6   N3    P3   I2
7   N3    P3   I5

Upvotes: 0

Parsing specific columns of CSV in python

Answers (4)

Related Questions