First of all, it is important to note that this isn’t supposed to be a benchmark. The results of this test are worth what they are: they only show that PHP performed better than Perl for this particular program.
For my thesis, I need to analyse large sets of data. The data is stored in the DataSeries format, a format developed by HP Labs specifically for this kind of data. I need to do several things with the data, so I created a script to do some basic analysis.
I had several options: I could write a shell script, or use PHP, Perl or something else. I realized that writing a shell script for this would be very complex, so I weighed PHP against Perl. My feeling is that PHP is better suited for the Web and Perl for sysadmin tasks, parsing, etc. So I should have chosen Perl, but my knowledge of Perl is very basic and I would need to learn it first. Unfortunately I am running out of time, so I ended up choosing PHP, since writing the script in it would be very easy for me.
So I wrote a first version of the PHP script. The script is not optimized at all, but it works as expected. The problem with a large dataset is that parsing it takes ages, and my first run did take ages. I started wondering whether a Perl equivalent would be a lot faster than my script, so I asked a friend to write an equivalent script in Perl. He wrote it and I ran both scripts at the same time on the same server. My friend noticed later that he had left an extra instruction in the main loop that doesn’t exist in the PHP version. Anyway, both scripts were already running, so I didn’t abort the run. The results were quite surprising: the PHP version took 551m56.349s and the Perl equivalent took 712m16.792s.
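To put the two run times side by side, the snippet below converts them to seconds and computes the ratio. It is a small sketch that assumes the `XmY.YYYs` format printed by bash's `time`; the helper name `time_to_seconds` is hypothetical.

```python
import re


def time_to_seconds(value: str) -> float:
    """Convert a bash `time` value such as '551m56.349s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


php = time_to_seconds("551m56.349s")   # PHP run time quoted above
perl = time_to_seconds("712m16.792s")  # Perl run time quoted above

print(f"PHP:  {php:.3f} s")   # PHP:  33116.349 s
print(f"Perl: {perl:.3f} s")  # Perl: 42736.792 s
print(f"Perl/PHP ratio: {perl / php:.2f}")  # Perl/PHP ratio: 1.29
```

In other words, the Perl run took roughly 29% longer, with the caveats above: the Perl script had an extra instruction in its main loop, and both runs shared the same server at the same time.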
This is how I ran the scripts:
$ time nfsdsanalysis -Z common archive/lindump_total.ds | ./stats_basic.php > stats_basic.txt
$ time nfsdsanalysis -Z common archive/lindump_total.ds | perl stats_basic.pl > stats_basic1.txt
The file lindump_total.ds is an 80 GB file. The output of nfsdsanalysis (which is piped to the script) looks like this:
# Extent, type='Trace::NFS::common'
packet_at source source_port dest dest_port is_udp is_request nfs_version transaction_id op_id operation rpc_status payload_length record_id
1253831523212739 3a163121 790 01c633c7 2049 TCP request V3 21ff6e38 3 lookup null 56 0
1253831523212743 3a163121 790 01c633c7 2049 TCP request V3 21ff6e38 3 lookup null 56 1
1253831523212746 3a163121 897 01c633c7 2049 TCP request V3 2eff9a5e 1 getattr null 36 2
1253831523212748 3a163121 897 01c633c7 2049 TCP request V3 2eff9a5e 1 getattr null 36 3
1253831523214877 2a2622c2 2049 1a264421 790 TCP response V3 2ffdae28 3 lookup 0 216 4
1253831523214886 2a2622c2 2049 1a264421 897 TCP response V3 2ffca15e 1 getattr 0 88 5
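The scripts themselves are not reproduced here, but the parsing they all have to do can be sketched. The Python below is a minimal, hypothetical example that assumes the layout shown above: `#` extent lines, a `packet_at ...` header row, then whitespace-separated records. The statistics it computes (record count and total `payload_length` per `operation`) are illustrative only, not the ones the stats_basic scripts actually produce.

```python
import sys
from collections import Counter
from typing import Iterable, Tuple


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count records and sum payload_length per NFS operation."""
    counts: Counter = Counter()
    payload: Counter = Counter()
    header = None
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or "# Extent, type=..." marker
        if fields[0] == "packet_at":
            header = fields  # column names for the rows that follow
            continue
        if header is None or len(fields) != len(header):
            continue  # skip rows that do not match the header
        record = dict(zip(header, fields))
        op = record["operation"]
        counts[op] += 1
        payload[op] += int(record["payload_length"])
    return counts, payload


def main() -> None:
    # Print one tab-separated line per operation: name, records, payload bytes.
    counts, payload = summarize(sys.stdin)
    for op, n in counts.most_common():
        print(f"{op}\t{n}\t{payload[op]}")
```

To run it on the real data, save it as a script (for example, a hypothetical stats_sketch.py), add `if __name__ == "__main__": main()` at the bottom, and pipe the nfsdsanalysis output into it as in the commands above. Iterating over stdin line by line keeps memory use flat no matter how large the trace is.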
Some people asked me to run the scripts in isolation, i.e., not in parallel like last time. I got optimized versions from several people, and I even got versions in other languages, like Python and C.
Apparently, the Perl version was so slow due to a serious performance bug in list assignment. Thanks to Pedro Figueiredo for the tip. Just by installing Perl 5.10.1 I got a 37% performance improvement. Even though the improvement was significant, Perl still came in last.
Below you can see the results of the runs of the optimized scripts in the different languages, ordered by run time from fastest to slowest:
C Version (By Jose Celestino):
$ time nfsdsanalysis -Z common archive/lindump_total.ds | ./stats_basic > stats_basic4.txt
PHP Version (Optimized by Diogo Neves, and modified by me since there were several bugs):
$ time nfsdsanalysis -Z common archive/lindump_total.ds | ./stats_basic_optimized.php >stats_basic5.txt
Python Version (by Andre Cruz):
$ time nfsdsanalysis -Z common archive/lindump_total.ds | python stats_basic.py > stats_basic3.txt
Perl Version (Original by Carlos Pires, Optimized version by Joao Pedro):
$ time nfsdsanalysis -Z common archive/lindump_total.ds | perl stats_basic_optimized.pl > stats_basic2.txt
Some people have asked me why the user time is greater than the real time. The server where I ran these scripts has 8 cores, so the processes (and threads) in each pipeline run on several cores at once, and their combined CPU time adds up to more than the elapsed wall-clock time.
It really surprised me that Perl performed the worst; I wasn’t expecting that. I also ran PHP without APC and the results were similar.
The second semester consisted of three courses, two core courses and one free elective. I picked the following courses:
You can take a look at the syllabus for each course here. I can’t say for sure which semester was tougher, the first or this one. I guess this second semester involved more work, but by then you are more used to the pace, so you handle things better.
Distributed Systems is a free elective. We had the option to choose either Distributed Systems or Intrusion Tolerance, and I picked Distributed Systems. This is a pure distributed systems course. We covered RPC, High Availability, Clock Synchronization, Replication, Mutual Exclusion and Groups, Transactions, CORBA, J2EE, Distributed Filesystems, Distributed Shared Memory, Distributed Mutual Exclusion, Load Balancing, Security, Naming, Peer-to-Peer Systems and the Google File System. I had never taken a distributed systems course before; the closest was parallel computing during my undergrad. It was interesting to study these topics, especially because I have been working with distributed systems for over 10 years. For this course, we had only one big project for the whole semester: at the beginning of the semester you had to form a team and pick a project, build it throughout the semester, and present it to the class at the end. My project was dumpFS, a distributed storage solution written in Erlang. This class was a remote class, taught from Carnegie Mellon.
Network Security was also a fun course. I had the pleasure of attending this class taught by one of the best professors I have ever had in my entire life. In my opinion, it was the most difficult course of this semester, but professor Perrig really teaches you how to think in terms of security. You can memorize the entire textbook and still get an F on the exam; to get a good grade you need to analyze and think (very fast). We covered a lot of recent topics, such as the MD5-collision attack from earlier this year and the Pakistani YouTube BGP attack from last year. We had two mini-projects and one research project. The first mini-project was very fun: it consisted of writing a blind TCP reset attack and a DNS poisoning attack (in C). The second mini-project was not so fun: it consisted of implementing an Ad Hoc Routing Attack (DSR & Ariadne) using the ns-2 simulator (in C++). The problem is that we spent more time figuring out the obscure details of ns-2 than thinking about the attack itself. We complained about this in the course feedback and next year professor Perrig is going to skip this mini-project. See? Feedback is useful sometimes. The research project was supposed to aim high: the best project would be submitted to a conference. Mine was about secure DNS and you can check it out by clicking here. This class was a remote class, taught from Carnegie Mellon.
Secure Software Systems was an interesting class. We covered a lot of fun things, such as Buffer Overflows, Input Validation (DBMS, Web race conditions), Randomness and Determinism, Client-side Security, DoS, Auditing Tools, Buffer Overflow Defenses, Static Analysis, Attack Injection, Assurance & Certification, and Virtualization and Security. We had 3 projects. The first one consisted of finding buffer overflows in an SMTP server and exploiting one of its vulnerabilities, and of finding and fixing SQL injection and XSS flaws in a Web app. The second project involved fuzzing and the third one involved static analysis tools. This class was a local class, taught by a professor from FCUL.
Even though I am now more used to the pace and don’t stress out so easily, it is still not possible to have a social life. I was more relaxed in the first half of the semester, but the second half was crazy, especially the last couple of weeks.
In my third semester, i.e., the Summer semester, I am supposed to write my thesis. I am going to spend this semester in Pittsburgh, since my thesis advisor is Greg Ganger, the director of the Parallel Data Lab (PDL). My thesis is about storage. When I return, I will write something about it.
Enjoy the Summer!
When I went to school, my math teacher said, “If there’s an X in several different parts in the same equation, then all the Xs mean the same thing.” That’s how we can solve equations: if we know that X+Y=10 and X-Y=2, then X will be 6 and Y will be 4 in both equations.
But when I learned my first programming language, we were shown stuff like this:
X = X + 1
Everyone protested, saying “you can’t do that!”. But the teacher said we were wrong, and we had to unlearn what we learned in math class. X isn’t a math variable: it’s like a pigeon hole/little box…
In Erlang, variables are just like they are in math. When you associate a value with a variable, you’re making an assertion – a statement of fact. This variable has that value. And that’s that.
In “Armstrong, Joe; Programming Erlang; The Pragmatic Programmers; 2007”
By convention, this function increases at a rate equal to 9192631770 times the period of the radiation emitted by the transition between two hyperfine levels of the ground state atomic cesium 133, a time unit which people have agreed to call second,
In “Veríssimo P, Rodrigues L.; Distributed Systems for System Architects; KAP; 2001”
The researchers were able to create a rogue Certification Authority certificate. That means they have a valid CA certificate, i.e., they can create any certificate they want for any site. Nothing guarantees that a criminal organization wasn’t able to do the same, and if one was, it doesn’t really matter whether RapidSSL has stopped using MD5 or not. In theory, RapidSSL would need to revoke its Root Certificate to make sure the problem was solved. The problem is that revocation checking relies on each certificate containing a URL the browser can use to check whether the certificate was revoked. The researchers’ rogue CA certificate had very limited space, so it was impossible to include such a URL, which means that by default both Internet Explorer and Firefox are unable to find a revocation server to check that certificate against. Basically, it’s up to the browser vendors to solve the problem permanently, for example by no longer accepting certificates that use MD5.
SSL is subject to many types of attacks, especially Man-in-the-Middle attacks. Users usually ignore SSL warnings, so they will most likely not notice a Man-in-the-Middle attack. One way to be better protected is to install Perspectives, a Firefox plugin developed by a couple of grad students from Carnegie Mellon University, which monitors the certificates used by the sites you visit and warns you if a certificate has changed.
So let’s imagine you want to log in to your Homebanking to make some wire transfers (or to any other site that uses SSL). Here is a list of steps that will make your SSL browsing safer:
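The core idea described here, remembering a site's certificate and warning when it changes, can be sketched as a trust-on-first-use check. This is a simplified local model under stated assumptions, not how the plugin is implemented: Perspectives also compares what you see against observations from network notary servers, which this sketch does not attempt. The class name `CertificateMonitor` is hypothetical.

```python
import hashlib
from typing import Dict


class CertificateMonitor:
    """Hypothetical trust-on-first-use monitor (not Perspectives' actual code).

    Remembers the SHA-256 fingerprint of the first certificate seen for each
    host and reports whether later certificates match it.
    """

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # host -> hex SHA-256 of DER cert

    def check(self, host: str, der_cert: bytes) -> str:
        fingerprint = hashlib.sha256(der_cert).hexdigest()
        known = self._seen.get(host)
        if known is None:
            self._seen[host] = fingerprint  # first visit: trust and remember
            return "first-seen"
        if known == fingerprint:
            return "unchanged"
        # Keep the original fingerprint so the change keeps being flagged.
        return "CHANGED"
```

In practice the certificate bytes could be fetched with Python's `ssl.get_server_certificate((host, 443))`, which returns PEM text, and converted with `ssl.PEM_cert_to_DER_cert`. A `CHANGED` result is not proof of an attack (sites do renew certificates), which is why independent observers, as Perspectives uses, help tell the two cases apart.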
- Bookmark your Homebanking site. Double-check that the URL is correct.
- Install Perspectives.
- If your browser is running, quit it and start it again (so it’s a fresh run).
- Go to your bookmarks and click on the Homebanking bookmark. DO NOT load any other webpage first; make sure the Homebanking site is the first page loaded.
- Make sure Perspectives says the Homebanking site is safe.
- Now it is safer to use your Homebanking site. You can do whatever you need to do there.
Here’s a list of optional parameters:
If you want the local holiday for your city, please let me know and I’ll add it to the calendar.
Hint: In order to have your calendar always up-to-date you should not use the “ey” parameter, since its default is
Check out http://www.anti-captcha.com/ (or the Google translation here, since the original site is in Russian). It’s impressive that they even offer SLAs.
First of all, this is a Dual Program (MSIT-IS from CMU and Mestrado em Segurança Informatica from FCUL). The entire program is held at FCUL in Lisbon, although many of the courses are taught remotely from Carnegie Mellon. You also have the option to go to Pittsburgh for the Summer Semester to write your thesis.
The program started in the last week of August ’08. The Portuguese students were invited to go to Pittsburgh for an orientation session and the first week of classes. There, we got familiar with the campus and school procedures, and we had a taste of what it is like to be a student at CMU.
After that week we returned to Lisbon, where we attended classes at FCUL in a high-tech classroom set up for video-conferencing with Carnegie Mellon. All remote classes were live and interactive: students in Portugal could see students at CMU (and vice versa), interrupt the class and ask questions, etc. The experience was quite pleasant; it felt pretty much as if we were there.
As for the courses, the first semester consisted of three core courses:
You can take a look at the syllabus for each course here. All three courses were a lot of fun… and very intensive.
In Fundamentals of Embedded Systems we covered the ARM processor, the most popular 32-bit processor for embedded devices (the iPhone being one example). We started by learning ARM Assembly, then we learned how to optimize code (both in Assembly and C). The lab projects were group projects based on C/Assembly. For the lab projects we used an X-Board powered by an Intel XScale processor. As I understand it, next year they’ll be using Gumstix, which will certainly make the lab projects even more interesting. Then we learned about Exception Handling and SWIs; Linking and Loading; Flash memory; Memory-mapped I/O, Polling and Timers; Serial Communication; Interrupts; Processes and Scheduling; Concurrency; Synchronization and Deadlocks; Real-time (RT) operating systems & Scheduling algorithms; and Memory Protection and Virtual Memory. The last lab project was to implement parts of an operating system, like the scheduler, semaphores and timer (both for non-RT and RT). This class was a remote class, taught from Carnegie Mellon.
In Packet Switching and Computer Networks we were asked a question in the first lecture: Explain the Journey of a Packet. The goal of the course was to be able to answer that question. The topics covered were SONET; Transport Network Protection & Restoration (SONET Rings); Switching Architecture; Nonblocking Switch Architecture; Point-to-Multipoint Switch; ATM Overview; Network Traffic Control; MPLS; RSVP & RSVP-TE; and GMPLS. The lab projects were individual projects and Java based. Basically, we needed to implement some networking protocols, such as SONET setup and transport; Queueing/Scheduler algorithms; Label Switching; and Constraint-based routing. The last project was actually very interesting: we needed to design the core network for an ISP, “buy” the equipment, etc. Unfortunately the timing of the last project was very bad (it overlapped with finals week), so most of the students didn’t finish it. Besides the lab projects, we also had weekly paper reviews and problem sets. This class was also a remote class, taught from Carnegie Mellon.
In Introduction to Computer Security we covered Fundamental Security Concepts; Security Paradigms (like Cryptography Algorithms, Digital Cash, Authentication types, etc.); Models of Distributed Secure Computing (like Classes of Attacks, Authorization, Secure Channels, TCBs, Authentication, Firewalls, etc.); and Secure Systems and Platforms (like SSL, VPNs, PGP, IPSec, etc.). We had 3 group projects and 3 individual projects. For the group projects we needed to do things like hardening a Linux server, deploying a Firewall, and deploying an Intrusion Detection System. The individual projects were reading assignments where we had to read a given chapter from a book and write a critical report about it. Besides the projects, we also had weekly problem sets. This class was a local class, taught by a professor from FCUL.
To conclude, I’m loving the program so far. I was told at the beginning that I would need to dedicate 60 hours per week to this program, but it’s almost impossible to get good results with only 60 hours per week. Most of the students (including me) live for this program: from the moment we get up until we go to sleep, we do nothing but work for the program and eat. Forget your social life for 16 months if you plan to get this Master’s degree. But I can assure you that it is a lot of fun and rewarding.
For the next semester, which will start tomorrow, I’m enrolled in the courses Secure Software Systems (locally), Network Security (VC from CMU) and Distributed Systems (VC from CMU). I’ll write another review once this semester is over.
The fix ended up being this:
nuno@nuno-macbook:~/Library$ mv Lockdown Lockdown.orig
Hope it helps someone. Now I need to get back to work. See you before Xmas.