Matt McCormick: Python Sets to Align CSV Columns

From recent work, I've found Python set operations to be easier than using Excel VLOOKUP or MATCH formulas. Here are a few uses:

Visually align two columns from a CSV file, adding the phrase "--MISSING--" into columns where values do not match.


SOMEFILE = '/path/to/some/file.csv'


with open(SOMEFILE, 'r') as csvfile:

                filereader = csv.reader(csvfile, delimiter=',',quotechar='"')

                for row in filereader:

                        set_old.add(row[0])  #column 0

set_new.add(row[1]) #column 1


total = sorted(set_old|set_new)

list1 = [x if x in set_old else "--MISSING--" for x in total]

list2 = [x if x in set_new else "--MISSING--" for x in total]



for i in range(0, len(list1)):

                print i, "\t", list1[i], "\t", list2[i]

You can follow this up and pipe the output to "column -t" command in linux to produce an easy to read list.


python script.py | column -t

For example

Column0 Column1
1 2
2 3

would become:

Column0 Column1
1 --Missing--
2 2
--Missing-- 3

It is very easy to pull keys from an existing array or dictionary and perform set operations on them such as comparing lists of users in access or configuration files.

for user in new_users.difference(old_users):
         print user

Here are some other interesting set functions:

A = set(['1','2','3','4'])
B = set(['3','4','5','6'])
C = set(['1','2','3','4','5','6','7','8','9','10'])
D = set(['1','2'])

A.issubset(B)
False

A.issubset(C)
True

A.issuperset(B)
False

A.issuperset(D)
True

A.intersection(B)
['3','4']

A.union(B)
['1','2','3','4','5','6']

A.difference(B)
['1','2']

A.symmetric_difference.B
['1','2','5','6']

Have you used Python sets in Geospatial applications? Let me know how in the comments.

https://docs.python.org/2/library/sets.html

Matt McCormick

Thursday, November 20, 2014

Python Sets to Align CSV Columns

No comments:

Post a Comment