everestial

Reputation: 7255

How to update column values and their corresponding id column based on the values of the column?

I have a pandas DataFrame with the following structure:

mcve_data =

alfa   alfa_id     beta    beta_id
a,c    7           c,de    8
c,d    7           d,f     9
l,mnk  8           c,d     9
j,k    8           d,e     9
tk,l   8           n,k     11

Final expected output:

alfa   alfa_id     beta    beta_id
a,c    7           .      .
c,d    7           d,f     9
.      .           c,d     9
j,k    8           d,e     9
.      .           n,k     11

I tried to write a function like the following, but it doesn't work properly:

def check_and_convert(mcve_data):
    labels = (l, l + id) for l in mcve_data.columns.values

    def convert(lines):
        for l,id in labels:
            if len(l) > 3:
                l = '.'
                id = '.'
        return l, id

    # ...then write this back to the file.

Any suggestions?

Upvotes: 1

Views: 76

Answers (2)

plasmon360

Reputation: 4199

You could use a for loop with iterrows(); see below.

import pandas as pd
from io import StringIO

s = """alfa   alfa_id     beta    beta_id
a,c    7           c,de    8
c,d    7           d,f     9
l,mnk  8           c,d     9
j,k    8           d,e     9
tk,l   8          n,k     11
"""

# Read every column as a string so that '.' can be written into the id columns
df = pd.read_csv(StringIO(s), sep=r'\s+', dtype=str)

# Create a list of keys and the matching id columns based on the '_id' suffix

keys = [i for i in df.columns if 'id' not in i]
key_ids = [i+'_id' for i in keys]

for index, row in df.iterrows():
    for k,kid in zip(keys, key_ids):
        parts = row[k].split(',')
        # Blank out the pair if it has more than 3 parts or any part is longer than one character
        if len(parts) > 3 or any(len(i) > 1 for i in parts):
            df.at[index, kid] = '.'
            df.at[index, k] = '.'


print(df)

This results in:

  alfa alfa_id beta beta_id
0  a,c       7    .       .
1  c,d       7  d,f       9
2    .       .  c,d       9
3  j,k       8  d,e       9
4    .       .  n,k      11

Upvotes: 1

gereleth

Reputation: 2482

You could also skip the inner for loop by using the str accessor to check the length of every value in a column at once:

keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    df.loc[df[k].str.len()>3,[k,k+'_id']] = '.'
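
For anyone who wants to run this on its own, here is a minimal self-contained sketch of the same idea on the sample data from the question. It assumes all columns are read as strings (so that '.' can be stored in the id columns) and that every value column has a partner named with an '_id' suffix; the variable name too_long is just for illustration.

```python
from io import StringIO

import pandas as pd

s = """alfa   alfa_id     beta    beta_id
a,c    7           c,de    8
c,d    7           d,f     9
l,mnk  8           c,d     9
j,k    8           d,e     9
tk,l   8           n,k     11
"""

# Assumption: read everything as strings so '.' fits into the id columns.
df = pd.read_csv(StringIO(s), sep=r"\s+", dtype=str)

# Value columns are those without the '_id' suffix.
keys = [k for k in df.columns if not k.endswith("_id")]

for k in keys:
    # Boolean mask: True where the whole string is longer than 3 characters.
    too_long = df[k].str.len() > 3
    # Overwrite both the value and its id in the flagged rows.
    df.loc[too_long, [k, k + "_id"]] = "."

print(df)
```

Note that this checks the length of the whole string (as in the question's len(l) > 3), not the length of each comma-separated part, so the rule would need adjusting for values with more than two parts.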

Upvotes: 2
