Tech Tips

How to extract images and text from a PDF

1. Install the poppler-utils package, which provides both pdfimages and pdftotext.

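2. To extract the embedded images:

$ pdfimages -j <file.pdf> <to_dir>/<prefix>

The last argument is a file-name prefix, not a directory, and the directory must already exist. Images are written as <prefix>-000.jpg, <prefix>-001.ppm, and so on. With -j, JPEG (DCT) images are saved unchanged as .jpg files, while everything else is converted to PPM/PBM. Newer poppler versions also accept -all, which keeps JPEG, JPEG 2000, JBIG2 and CCITT images in their native formats and writes the rest as PNG.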

3. To extract text:

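$ pdftotext <file.pdf> [<file.txt>]

If the output file is omitted, the text goes to file.txt next to the PDF; give - instead to print it on standard output. Add -layout to keep the original physical layout of each page.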
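If you do this often, the two steps can be wrapped in one shell function. This is a minimal sketch that assumes pdfimages and pdftotext from poppler-utils are on the PATH; the function name extract_pdf, the image prefix img and the output name text.txt are illustrative choices, not anything poppler defines.

```shell
#!/bin/sh
# extract_pdf: hypothetical helper that saves the embedded images and the
# text of one PDF into a single output directory.
# Usage: extract_pdf file.pdf output_dir
# The body runs in a subshell ( ... ), so its variables stay local.
extract_pdf() (
    if [ "$#" -ne 2 ]; then
        echo "usage: extract_pdf file.pdf output_dir" >&2
        exit 2
    fi
    pdf=$1
    outdir=$2
    if [ ! -f "$pdf" ]; then
        echo "extract_pdf: no such file: $pdf" >&2
        exit 1
    fi
    # pdfimages does not create the directory itself.
    mkdir -p "$outdir" || exit 1
    # The second argument is a file-name prefix, so images are written as
    # $outdir/img-000.jpg, $outdir/img-001.ppm, and so on.
    pdfimages -j "$pdf" "$outdir/img" || exit 1
    # Put the text next to the images rather than beside the PDF.
    pdftotext "$pdf" "$outdir/text.txt"
)
```

For example, extract_pdf report.pdf report_out (both names hypothetical) leaves the images and report_out/text.txt in one place.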


How to determine which repo a RHEL/CentOS package belongs to

I thought this command should show me which repo the package ‘htop’ belongs to, but I was wrong:

[root@server ~]# yum info htop
Loaded plugins: rhnplugin, security
Installed Packages
Name : htop
Arch : x86_64
Version : 0.8.3
Release : 1.el5
Size : 136 k
Repo : installed 
Summary : Interactive process viewer
License : GPL+
Description: htop is an interactive text-mode process viewer for...

Notice that the ‘Repo’ field merely says ‘installed’. Not very helpful here. I wonder why yum works this way.

To find out which repo ‘htop’ comes from, use this command instead:

[root@server ~]# yum provides `which htop`
Loaded plugins: rhnplugin, security
htop-0.8.3-1.el5.x86_64 : Interactive process viewer
Repo : epel
Matched from:
Filename : /usr/bin/htop

htop-0.8.3-1.el5.x86_64 : Interactive process viewer
Repo : installed
Matched from:
Other : Provides-match: /usr/bin/htop

See ‘epel’ there? Bingo!
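When checking several programs, the Repo lines can be pulled out of that output automatically. A minimal sketch, assuming the "Repo : name" layout shown above (other yum versions may format it differently); the function names repos_from_provides and which_repo are hypothetical, and which_repo uses the portable command -v in place of which.

```shell
#!/bin/sh
# repos_from_provides: read `yum provides` output on stdin and print each
# repository name once, skipping the "installed" pseudo-repo.
repos_from_provides() {
    awk '/^Repo[ \t]*:/ {
             sub(/^Repo[ \t]*:[ \t]*/, "")   # keep only the value
             sub(/[ \t]+$/, "")              # drop trailing blanks
             if ($0 != "installed") print
         }' | sort -u
}

# which_repo: hypothetical wrapper, e.g. `which_repo htop`.
which_repo() (
    path=$(command -v "$1") || { echo "which_repo: $1 not found" >&2; exit 1; }
    yum provides "$path" 2>/dev/null | repos_from_provides
)
```

With the htop example above, which_repo htop should print just epel.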


How to identify the package a particular file belongs to and verify its integrity

Let’s say we need to:

  1. Find out which package contains the file /bin/su in Linux
  2. Verify whether the file is untouched (exactly as shipped in the package) or has been modified in some way

For RPM (Fedora, Red Hat, CentOS):

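$ rpm -q -f /bin/su
coreutils-5.97-34.el5
$ rpm -V coreutils-5.97-34.el5

rpm -V prints nothing if every file in the package still matches the RPM database. The two steps can also be combined as rpm -V -f /bin/su.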
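When something has changed, rpm -V prints one line per affected file: a string of test flags (S, M, 5, D, L, U, G, T, plus P on newer rpm versions, with . for a passed test and ? for a test that could not be run), an optional attribute marker such as c for configuration files, and the path. A small decoder makes those lines easier to read. This is a sketch: the function name is hypothetical, and it assumes the path is the last space-separated field, so paths containing spaces are not handled.

```shell
#!/bin/sh
# explain_rpm_v_line: hypothetical helper that spells out the flags in one
# line of `rpm -V` output, e.g. "S.5....T.  c /etc/ssh/sshd_config".
explain_rpm_v_line() (
    line=$1
    flags=${line%% *}   # first field: the test flags
    path=${line##* }    # last field: the file name (no spaces assumed)
    if [ "$flags" = "missing" ]; then
        echo "$path: file is missing"
        exit 0
    fi
    rest=$flags
    while [ -n "$rest" ]; do
        tail=${rest#?}       # everything after the first character
        ch=${rest%"$tail"}   # the first character itself
        rest=$tail
        case $ch in
            S) echo "$path: size differs" ;;
            M) echo "$path: mode (permissions or file type) differs" ;;
            5) echo "$path: digest (checksum) differs" ;;
            D) echo "$path: device major/minor number differs" ;;
            L) echo "$path: symlink target differs" ;;
            U) echo "$path: user ownership differs" ;;
            G) echo "$path: group ownership differs" ;;
            T) echo "$path: modification time differs" ;;
            P) echo "$path: capabilities differ" ;;
            '?') echo "$path: a test could not be performed" ;;
        esac
    done
)
```

To decode a whole report: rpm -V coreutils-5.97-34.el5 | while IFS= read -r l; do explain_rpm_v_line "$l"; done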

For DEB (Debian, Ubuntu):

$ dpkg -S /bin/su
login: /bin/su
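$ debsums -s -a login

debsums is usually not installed by default (apt-get install debsums). With -s it prints only files that fail the check, so no output means the package is intact; -a also checks configuration files.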
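The RPM and DEB recipes can be combined into one function that uses whichever package manager is present. A minimal sketch: verify_file_owner is a hypothetical name, it takes only the first package that dpkg -S or rpm -q -f reports, and on Debian-based systems it assumes debsums is installed.

```shell
#!/bin/sh
# verify_file_owner: hypothetical helper that reports which package owns a
# file and then verifies that package with debsums or rpm -V.
verify_file_owner() (
    if [ "$#" -ne 1 ]; then
        echo "usage: verify_file_owner /path/to/file" >&2
        exit 2
    fi
    file=$1
    if command -v dpkg >/dev/null 2>&1; then
        # dpkg -S prints "package: /path"; keep the part before the first colon.
        pkg=$(dpkg -S "$file" 2>/dev/null | head -n 1 | cut -d: -f1)
        if [ -z "$pkg" ]; then
            echo "verify_file_owner: no package owns $file" >&2
            exit 1
        fi
        echo "$file belongs to $pkg"
        if ! command -v debsums >/dev/null 2>&1; then
            echo "verify_file_owner: debsums is not installed" >&2
            exit 1
        fi
        debsums -s -a "$pkg"    # silent when all checksums match
    elif command -v rpm >/dev/null 2>&1; then
        if ! pkg=$(rpm -q -f "$file" 2>/dev/null); then
            echo "verify_file_owner: no package owns $file" >&2
            exit 1
        fi
        pkg=$(printf '%s\n' "$pkg" | head -n 1)    # first owner only
        echo "$file belongs to $pkg"
        rpm -V "$pkg"    # silent when every file matches
    else
        echo "verify_file_owner: neither dpkg nor rpm found" >&2
        exit 1
    fi
)
```

Usage: verify_file_owner /bin/su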