python pandas if-statement for-loop dataframe

Reputation: 7255

How to update column values and their corresponding index based on the values of the column?

I have a pandas DataFrame with the following structure:

mcve_data =

alfa   alfa_id     beta    beta_id
a,c    7           c,de    8
c,d    7           d,f     9
l,mnk  8           c,d     9
j,k    8           d,e     9
tk,l   8           n,k     11

I want to loop over each row, reading the values from each key column (alfa and beta) and its key-index column (alfa_id, beta_id).
If a key's value is more than 3 in length, or if any of its comma-separated values is more than 1 character long, I want both the key value and its key index converted to a period (.).

Final expected output:

alfa   alfa_id     beta    beta_id
a,c    7           .      .
c,d    7           d,f     9
.      .           c,d     9
j,k    8           d,e     9
.      .           n,k     11

I wanted to write a function something like (but it hasn't worked properly):

def check_and_convert(mcve_data):
    labels = (l, l + id) for l in mcve_data.columns.values

    def convert(lines):
        for l,id in labels:
            if len(l) > 3:
                l = '.'
                id = '.'
        return l, id

        write this back to the file.

Any suggestions?

Upvotes: 1

Answers (2)

plasmon360

Reputation: 4199

You could use a for loop with iterrows(); see below.

import pandas as pd
from io import StringIO  # Python 3; in Python 2 this was `from StringIO import StringIO`

s = """alfa   alfa_id     beta    beta_id
a,c    7           c,de    8
c,d    7           d,f     9
l,mnk  8           c,d     9
j,k    8           d,e     9
tk,l   8          n,k     11
"""

# Read the whitespace-separated text, keeping every column as a string
df = pd.read_csv(StringIO(s), sep=r'\s+', dtype={'alfa': str, 'alfa_id': str,
                                                 'beta': str, 'beta_id': str})

# Create a list of keys and key indexes based on the '_id' distinction

keys = [i for i in df.columns if 'id' not in i]
key_ids = [i + '_id' for i in keys]

for index, row in df.iterrows():
    for k, kid in zip(keys, key_ids):
        values = row[k].split(',')
        # Mask the pair if there are more than 3 values or any value is longer than 1 character
        if len(values) > 3 or any(len(v) > 1 for v in values):
            # df.at replaces set_value, which was removed in pandas 1.0
            df.at[index, kid] = '.'
            df.at[index, k] = '.'


print(df)

results in

  alfa alfa_id beta beta_id
0  a,c       7    .       .
1  c,d       7  d,f       9
2    .       .  c,d       9
3  j,k       8  d,e       9
4    .       .  n,k      11
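The question also asks about writing the result back to a file, which the loop above does not cover. A minimal sketch, assuming tab-separated output is acceptable; the file name `mcve_out.txt` and the temporary directory are placeholders, not something from the question:

```python
import os
import tempfile

import pandas as pd

# A small frame shaped like the result above (all values kept as strings).
result = pd.DataFrame({
    'alfa': ['a,c', '.'],
    'alfa_id': ['7', '.'],
    'beta': ['.', 'c,d'],
    'beta_id': ['.', '9'],
})

# Hypothetical output location; replace with the real path.
out_path = os.path.join(tempfile.gettempdir(), 'mcve_out.txt')

# index=False drops the row labels so the file has the same layout as the input.
result.to_csv(out_path, sep='\t', index=False)

# Read the file back as strings to check the round trip.
round_trip = pd.read_csv(out_path, sep='\t', dtype=str)
```

Reading with `dtype=str` keeps the id columns as text, which matters here because they now mix digits and periods.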

Upvotes: 1

gereleth

Reputation: 2482

You could also skip the row-by-row loop by using the str accessor to check the length of every value in a column at once:

keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    df.loc[df[k].str.len()>3,[k,k+'_id']] = '.'
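Note that `str.len() > 3` measures the character length of the whole string, comma included. That matches the sample data, but a cell such as `a,b,c` (5 characters) would also be masked. If the rule should instead be "more than 3 values, or any value longer than 1 character", as in the other answer, a per-column mask might look like the sketch below. The helper name `needs_masking` is made up for illustration, and the frame is rebuilt by hand with all columns as strings:

```python
import pandas as pd

# Sample data from the question, with every column stored as strings.
df = pd.DataFrame({
    'alfa': ['a,c', 'c,d', 'l,mnk', 'j,k', 'tk,l'],
    'alfa_id': ['7', '7', '8', '8', '8'],
    'beta': ['c,de', 'd,f', 'c,d', 'd,e', 'n,k'],
    'beta_id': ['8', '9', '9', '9', '11'],
})


def needs_masking(cell):
    """Return True if the cell has more than 3 values or any value longer than 1 character."""
    parts = cell.split(',')
    return len(parts) > 3 or any(len(p) > 1 for p in parts)


keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    # Boolean Series: True for rows whose key cell breaks the rule.
    mask = df[k].map(needs_masking)
    # Overwrite the key and its matching id column in one assignment.
    df.loc[mask, [k, k + '_id']] = '.'
```

This still loops over the key columns only, so the cost per column is one pass over its cells rather than a Python-level loop over every row of the frame.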

Upvotes: 2
