Reputation: 7255
I have a pandas DataFrame with the following structure:
mcve_data =
alfa    alfa_id    beta    beta_id
a,c     7          c,de    8
c,d     7          d,f     9
l,mnk   8          c,d     9
j,k     8          d,e     9
tk,l    8          n,k     11
Each key (alfa and beta) has a matching key_index (alfa_id and beta_id). If a key is more than 3 in length, or if any of its values are more than 1 character long, I want both the key values and the key index to be converted to a period (.).
Final expected output:
alfa    alfa_id    beta    beta_id
a,c     7          .       .
c,d     7          d,f     9
.       .          c,d     9
j,k     8          d,e     9
.       .          n,k     11
I wanted to write a function something like this (but it hasn't worked properly):
def check_and_convert(mcve_data):
    labels = (l, l + id) for l in mcve_data.columns.values
    def convert(lines):
        for l,id in labels:
            if len(l) > 3:
                l = '.'
                id = '.'
        return l, id
Then I want to write this back to the file.
Any suggestions?
Upvotes: 1
Views: 76
Reputation: 4199
You could use a for loop with iterrows(); see below.
import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`
s = """alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
"""
# Read every column as a string so that '.' can be assigned to the id columns
df = pd.read_table(StringIO(s), sep=r'\s+', dtype={'alfa': str, 'alfa_id': str,
                                                   'beta': str, 'beta_id': str})
# I create a list of keys and key indexes based on the '_id' distinction
keys = [i for i in df.columns if 'id' not in i]
key_ids = [i + '_id' for i in keys]
for index, row in df.iterrows():
    for k, kid in zip(keys, key_ids):
        values = row[k].split(',')
        # Mask the pair if there are more than 3 values or any value is longer than 1 character
        if len(values) > 3 or any(len(v) > 1 for v in values):
            # df.at replaces df.set_value, which was removed from newer pandas versions
            df.at[index, kid] = '.'
            df.at[index, k] = '.'
print(df)
results in
  alfa alfa_id beta beta_id
0  a,c       7    .       .
1  c,d       7  d,f       9
2    .       .  c,d       9
3  j,k       8  d,e       9
4    .       .  n,k      11
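If you want this wrapped in a function and written back out, as the question asks, a minimal sketch could look like the one below. `check_and_convert` is the name used in the question; the helper `is_invalid`, the tab-separated output and the in-memory buffer standing in for the real file are my own assumptions, so adapt them to your actual file.

```python
import io
import pandas as pd

def is_invalid(value):
    """Return True for more than 3 comma-separated items, or any item longer than 1 character."""
    items = value.split(',')
    return len(items) > 3 or any(len(item) > 1 for item in items)

def check_and_convert(df):
    """Return a copy of df where invalid keys and their '_id' partners are '.'."""
    out = df.astype(str)  # astype returns a new frame, so the input is left untouched
    keys = [c for c in out.columns if not c.endswith('_id')]
    for k in keys:
        mask = out[k].map(is_invalid)        # one boolean per row
        out.loc[mask, [k, k + '_id']] = '.'  # mask the key and its id together
    return out

s = """alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+', dtype=str)
result = check_and_convert(df)

# Writing back: an in-memory buffer stands in for the real file here (assumption).
buffer = io.StringIO()
result.to_csv(buffer, sep='\t', index=False)
print(buffer.getvalue())
```

To write to disk instead, pass a file path to `to_csv` in place of the buffer.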
Upvotes: 1
Reputation: 2482
You could also skip the row-by-row loop by using the str accessor to check the length of every value in a column at once:
keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    df.loc[df[k].str.len() > 3, [k, k + '_id']] = '.'
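Note that `str.len()` measures the whole string, commas included. That gives the expected result for the sample data, but a value such as `a,b,c` would also be masked. If the two conditions from the question need to be checked literally, something along these lines might work. It is only a sketch: the names `too_many` and `too_long` are mine, and it assumes every column was read as a string.

```python
import io
import pandas as pd

s = """alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+', dtype=str)

keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    # More than 3 comma-separated values means at least 3 commas
    too_many = df[k].str.count(',') > 2
    # Two or more consecutive non-comma characters means some value is longer than 1 character
    too_long = df[k].str.contains(r'[^,]{2,}')
    df.loc[too_many | too_long, [k, k + '_id']] = '.'

print(df)
```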
Upvotes: 2