Reputation: 169
pandas df with 11 columns need to modify first 3 columns using regex and add a new column with this modified column and us this for downstream concatenation, something like this I need to keep the element as is of these columns and make it a unique string
column1 column2 column3 column4 ...column 11
need to do this new_col = column1:column2-column3(column4)
and make this new column,
column1 column2 column3 newcol column4 ...column 11
I could do this using simple python one line, not sure what is the syntax for pandas
l = cols[0] + ":" + cols[1] + "-" + cols[2] + "(" + cols[5] + ")"
Upvotes: 0
Views: 885
Reputation: 18625
Based on an answer that was recently deleted, this works fine:
df1 = pd.DataFrame({
'chrom': ['a', 'b', 'c'],
'start': ['d', 'e', 'f'],
'end': ['g', 'h', 'i'],
'strand': ['j', 'k', 'l']}
)
df1['unique_col'] = df1.chrom + ':' + df1.start + '-' + df1.end + '(' + df1.strand + ')'
It sounds like your original dataframe may not contain strings. If it contains numbers, you need something like this:
df1 = pd.DataFrame({
'chrom': [1.0, 2.0],
'start': [3.0, 4.0],
'end': [5.0, 6.0],
'strand': [7.0, 8.0]}
)
df1['unique_col'] = (
df1.chrom.astype(str) + ':'
+ df1.start.astype(str) + '-' + df1.end.astype(str)
+ '(' + df1.strand.astype(str) + ')'
)
Upvotes: 1
Reputation: 294488
consider the dataframe df
np.random.seed([3,1415])
df = pd.DataFrame(np.random.choice(a, (5, 10))).add_prefix('col ')
print(df)
col 0 col 1 col 2 col 3 col 4 col 5 col 6 col 7 col 8 col 9
0 Q L C K P X N L N T
1 I X A W Y M W A C A
2 U Z H T N S M E D T
3 N W H X N U F D X F
4 Z L Y H M G E H W S
Construct a custom format
function
f = lambda row: '{col 1}:{col 2}-{col 3}({col 4})'.format(**row)
And apply to df
df.astype(str).apply(f, 1)
0 L:C-K(P)
1 W:A-C(A)
2 W:H-X(N)
3 E:H-W(S)
4 Y:E-P(N)
dtype: object
Add a new column with assign
df.assign(New=df.astype(str).apply(f, 1))
# assign in place with
# df['New'] = df.astype(str).apply(f, 1)
col 0 col 1 col 2 col 3 col 4 col 5 col 6 col 7 col 8 col 9 New
0 Q L C K P X N L N T L:C-K(P)
1 I X A W Y M W A C A X:A-W(Y)
2 U Z H T N S M E D T Z:H-T(N)
3 N W H X N U F D X F W:H-X(N)
4 Z L Y H M G E H W S L:Y-H(M)
Or you can wrap this up into another function that operates on pd.Series
. This requires that you pass the columns in the correct order.
def u(a, b, c, d):
return a + ':' + b + '-' + c + '(' + d + ')'
df.assign(New=u(df['col 1'], df['col 2'], df['col 3'], df['col 4']))
# assign in place with
# df['New'] = u(df['col 1'], df['col 2'], df['col 3'], df['col 4'])
col 0 col 1 col 2 col 3 col 4 col 5 col 6 col 7 col 8 col 9 New
0 Q L C K P X N L N T L:C-K(P)
1 I X A W Y M W A C A X:A-W(Y)
2 U Z H T N S M E D T Z:H-T(N)
3 N W H X N U F D X F W:H-X(N)
4 Z L Y H M G E H W S L:Y-H(M)
Upvotes: 2
Reputation: 16099
You should just be able to do it with the same syntax you posted as long as all of the columns contain strings.
You can also use the Series.str.cat
method.
df['new_col'] = cols[0].str.cat(':' + cols[1] + '-' + cols[2] + '(' + cols[5]+ ')')
Upvotes: 3