Wednesday, March 13, 2013

comm: another way to diff

I recently needed to compare two versions of the same file.
This is usually what you use <enter your favorite diff tool here> for.
However, these files had the same entries (lines) in different places, and most diff tools are not smart enough to cope with that. What I wanted was to eliminate those entries and compare only the relevant changes.

These specific files had some irrelevant lines (read: comments), which (conveniently enough) begin with a '#' character. So first I stripped the comments from both files (shown here for one of them):
grep -v '^#' .config > .config_no_comments

Now that we have two files that contain only the lines we're interested in, we can use comm to compare them.
comm works on sorted files only, so we need to sort each of our two files first:
sort -o .config_no_comments_sorted .config_no_comments
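
Putting the two preparation steps together for both files, a minimal sketch might look like this. The names old.config and new.config and their contents are placeholders I made up so the snippet runs on its own. Setting LC_ALL=C is my own addition: comm expects its input in the same collation order that sort used, and pinning the locale is one simple way to guarantee that.

```shell
#!/bin/sh
# Hypothetical sample inputs (not the original files).
workdir=$(mktemp -d)
printf '%s\n' '# old comment' 'CONFIG_B=y' 'CONFIG_A=y' > "$workdir/old.config"
printf '%s\n' 'CONFIG_A=y' '# new comment' 'CONFIG_C=m' > "$workdir/new.config"

# Use one locale everywhere so sort and comm agree on ordering.
LC_ALL=C
export LC_ALL

for f in old new; do
    # Drop lines that start with '#', then sort what is left.
    grep -v '^#' "$workdir/$f.config" | sort > "$workdir/$f.sorted"
done

# Keep the results in variables so they can be inspected after cleanup.
old_sorted=$(cat "$workdir/old.sorted")
new_sorted=$(cat "$workdir/new.sorted")
printf '%s\n' "$old_sorted"
printf '%s\n' "$new_sorted"
rm -rf "$workdir"
```

Piping grep straight into sort avoids the intermediate "no comments" file. Whether you keep that file around is a matter of taste.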

Now, suppose I have those two sorted files and want to see only the lines unique to file1 (not present in file2). I would run:
comm -23 file1 file2

If I want to see only the lines unique to file2 (not present in file1), I would run:
comm -13 file1 file2

And this will output only the lines that appear in both files:
comm -12 file1 file2
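
To see how the three invocations relate, here is a small self-contained sketch. By default comm prints three columns (lines only in the first file, lines only in the second, lines in both), and each of -1, -2 and -3 suppresses the column with that number. The fruit names are made-up sample data, and the files are created already sorted.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL
workdir=$(mktemp -d)

# Two tiny, already sorted sample files (hypothetical contents).
printf '%s\n' 'apple' 'banana' 'cherry' > "$workdir/file1"
printf '%s\n' 'banana' 'cherry' 'date' > "$workdir/file2"

# -23: hide columns 2 and 3, leaving lines unique to file1.
only_in_file1=$(comm -23 "$workdir/file1" "$workdir/file2")
# -13: hide columns 1 and 3, leaving lines unique to file2.
only_in_file2=$(comm -13 "$workdir/file1" "$workdir/file2")
# -12: hide columns 1 and 2, leaving lines common to both.
in_both=$(comm -12 "$workdir/file1" "$workdir/file2")

echo "only in file1: $only_in_file1"
echo "only in file2: $only_in_file2"
echo "in both:"
echo "$in_both"
rm -rf "$workdir"
```

With this sample data, apple is unique to file1, date is unique to file2, and banana and cherry appear in both.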


RTFM (man comm) for more options.
