Reputation: 23
I need to iterate through the columns of each row of a DataFrame to find the first cell in that row whose string is fully capitalized. I need to repeat this for every row, and finally output a DataFrame with a single column in which each row holds the corresponding first capitalized string.
For example, this could be the input DataFrame:
+-----+--------+--------+--------+------+
| 0   | 1      | 2      | 3      | 4    |
+-----+--------+--------+--------+------+
| a   | Amount | SEQ    | LTOTAL | None |
| BBc | LCALC  | None   | None   | None |
| c   | LCALC  | None   | None   | None |
| Dea | RYR    | LTOTAL | None   | None |
+-----+--------+--------+--------+------+
And I would need the output to be the following, in a separate dataframe:
+-------+
| SEQ   |
| LCALC |
| LCALC |
| RYR   |
+-------+
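To reproduce the example, the input and the expected output can be built as below. This is a sketch that assumes the empty cells are real Python None values rather than the string 'None'; the default column labels 0 to 4 match the table.

```python
import pandas as pd

# Example input from the question. Assumption: None marks an empty cell
# (a real None value, not the string 'None').
df = pd.DataFrame([
    ['a',   'Amount', 'SEQ',    'LTOTAL', None],
    ['BBc', 'LCALC',  None,     None,     None],
    ['c',   'LCALC',  None,     None,     None],
    ['Dea', 'RYR',    'LTOTAL', None,     None],
])

# Expected result: one column holding the first all-caps value of each row.
expected = pd.DataFrame({'col': ['SEQ', 'LCALC', 'LCALC', 'RYR']})

print(df)
print(expected)
```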
Upvotes: 1
Views: 216
Reputation: 862511
To check all columns, test each value with isupper and replace non-matching values with NaNs; the missing values can then be back-filled along each row, so the first upper-case value lands in the first column, which is selected with iloc:
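#isinstance skips None/NaN cells, which have no isupper method
#(in pandas >= 2.1 applymap is deprecated in favour of DataFrame.map)
df = df.where(df.applymap(lambda x: isinstance(x, str) and x.isupper())).bfill(axis=1).iloc[:, 0].to_frame('col')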
print (df)
     col
0    SEQ
1  LCALC
2  LCALC
3    RYR
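To see what each step of that one-liner does, here is a sketch that runs it on the example data and keeps the intermediate frames. It assumes the empty cells are real None values, which is why the mask checks isinstance(x, str) before calling isupper; it builds the mask with Series.map per column so it works whether or not DataFrame.map is available.

```python
import pandas as pd

df = pd.DataFrame([
    ['a',   'Amount', 'SEQ',    'LTOTAL', None],
    ['BBc', 'LCALC',  None,     None,     None],
    ['c',   'LCALC',  None,     None,     None],
    ['Dea', 'RYR',    'LTOTAL', None,     None],
])

# Step 1: boolean mask, True where the cell is an all-caps string.
mask = df.apply(lambda col: col.map(lambda x: isinstance(x, str) and x.isupper()))

# Step 2: keep the matching cells, turn everything else into NaN.
kept = df.where(mask)

# Step 3: back-fill along each row, so the first non-NaN value
# (the first all-caps string) is pulled into column 0.
filled = kept.bfill(axis=1)

# Step 4: column 0 now holds the answer for every row.
result = filled.iloc[:, 0].to_frame('col')
print(result)
```

A row with no all-caps value at all would end up as NaN in col, because nothing is back-filled into its column 0.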
EDIT:
Create df1 with one column per match position, so the first column holds the first upper-case value of each row, the second column the second one, and so on:
#reshape by stack and drop None and NaN values
#(explicit dropna in case this pandas version's stack keeps them),
#remove second level of MultiIndex
s = df.stack().dropna().reset_index(level=1, drop=True)
#filter only upper values, convert to DataFrame
df1 = s[s.str.isupper()].rename_axis('idx').reset_index(name='val')
#create counter column numbering the first, second, ... upper value per row
df1['g'] = df1.groupby('idx').cumcount()
#reshape by pivot (keyword arguments are required since pandas 2.0)
#and reindex to add back rows without any upper value
df1 = df1.pivot(index='idx', columns='g', values='val').reindex(df.index)
print (df1)
g      0       1
0    SEQ  LTOTAL
1  LCALC     NaN
2  LCALC     NaN
3    RYR  LTOTAL
first = df1[0].to_frame('col')
second = df1[1].to_frame('col')
print (first)
     col
0    SEQ
1  LCALC
2  LCALC
3    RYR
print (second)
      col
0  LTOTAL
1     NaN
2     NaN
3  LTOTAL
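The same stack/cumcount idea can be wrapped in a small helper that returns only the n-th all-caps value per row, without building the whole pivot. The name nth_upper is hypothetical, and like the EDIT it assumes every non-missing cell is a string (s.str.isupper would give NaN for non-strings).

```python
import pandas as pd

def nth_upper(df, n):
    """Hypothetical helper: n-th (0-based) all-caps value of each row, as a 'col' frame."""
    # stack to one long Series; the row label stays as index level 0
    s = df.stack().dropna().reset_index(level=1, drop=True)
    # keep only the all-caps strings, in left-to-right column order
    s = s[s.str.isupper()]
    # number the matches within each original row: 0, 1, 2, ...
    pos = s.groupby(level=0).cumcount()
    # keep the n-th match per row; reindex brings back rows without one as NaN
    return s[(pos == n).to_numpy()].reindex(df.index).to_frame('col')

df = pd.DataFrame([
    ['a',   'Amount', 'SEQ',    'LTOTAL', None],
    ['BBc', 'LCALC',  None,     None,     None],
    ['c',   'LCALC',  None,     None,     None],
    ['Dea', 'RYR',    'LTOTAL', None,     None],
])

first = nth_upper(df, 0)
second = nth_upper(df, 1)
print(first)
print(second)
```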
Upvotes: 6
Reputation: 599
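def get_first_upper(row):
    #scan the row left to right and return the first all-caps string
    for val in row:
        if isinstance(val, str) and val.isupper():
            return val
    return None  #no upper-case value in this row

#axis=1 passes each row (not each column) to the function
new_df = df.apply(get_first_upper, axis=1).to_frame('col')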
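A quick check on the example data shows why the axis matters: with axis=1 apply hands each row to the function, while the default axis=0 hands it each column and so answers a different question.

```python
import pandas as pd

df = pd.DataFrame([
    ['a',   'Amount', 'SEQ',    'LTOTAL', None],
    ['BBc', 'LCALC',  None,     None,     None],
    ['c',   'LCALC',  None,     None,     None],
    ['Dea', 'RYR',    'LTOTAL', None,     None],
])

def get_first_upper(row):
    # first all-caps string in the row, skipping None/NaN cells
    for val in row:
        if isinstance(val, str) and val.isupper():
            return val
    return None

# axis=1: each call receives one row of df
by_row = df.apply(get_first_upper, axis=1)
# default axis=0: each call receives one column instead
by_column = df.apply(get_first_upper)

print(by_row.tolist())  # ['SEQ', 'LCALC', 'LCALC', 'RYR']
print(by_column)
```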
Upvotes: 0
Reputation: 1900
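You can iterate over the rows with iterrows:
import numpy as np
import pandas as pd

column = []
for _, row in df.iterrows():
    for item in row:
        #skip None/NaN cells, which have no isupper method
        if isinstance(item, str) and item.isupper():
            column.append(item)
            break
    else:
        #for-else: runs only if the inner loop found no upper-case value
        column.append(np.nan)
new_df = pd.DataFrame(column)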
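The else clause in that loop belongs to the inner for, not to the if: Python runs it only when the loop finishes without hitting break. A minimal stand-alone illustration (the function name first_upper is hypothetical):

```python
def first_upper(items):
    """Hypothetical helper: first all-caps string in items, or None."""
    for item in items:
        if isinstance(item, str) and item.isupper():
            found = item
            break
    else:
        # reached only when the loop ran to completion without break
        found = None
    return found

print(first_upper(['a', 'Amount', 'SEQ', 'LTOTAL']))  # SEQ
print(first_upper(['c', None]))                        # None
```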
Upvotes: 0
Reputation: 1521
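Use the following code to iterate through each row and break at the first all-caps cell in that row:
import pandas as pd

l = []
for index, row in df.iterrows():
    for i in row:
        #str has isupper(), not isuppercase(); None/NaN cells are skipped
        if isinstance(i, str) and i.isupper():
            l.append(i)
            break
#note: rows with no all-caps cell add nothing, so l can be shorter than df
new_df = pd.DataFrame(l)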
Upvotes: 0